Servo Benchmarking Report (December 2024)
Delan Azabani edited this page 2024-12-17 17:56:36 +08:00

We've analysed the runtime performance of Servo and Chromium when loading three websites, as of the versions below:

The sites were as follows:

We found that servo2 now outperforms chromium in First Paint (FP) and First Contentful Paint (FCP) on two of the three sites, up from zero of the three for servo1.

Layout and overall rendering times for servo2 are now comparable with chromium's on two of the three sites, thanks to significant improvements in its LayoutPerform and ScriptParseHTML times. We believe the times for servo2 continue to lag behind chromium for www.amazon.com due to the lack of incremental layout in Servo.

All of the data used to write this report, including the study config, is in this file.

Contents

Results

Methodology

Caution: in general, results are not comparable across reports. This is especially true for this report, where we've made significant changes to our methodology, but it would be true even if there were no changes, because some conditions of the test environment are difficult to keep consistent between two sets of measurements taken a month apart.

This report uses a similar basic methodology to our previous report, but we've made several changes that should improve the overall quality of our data, including several improvements to Servo's trace events.

We now reboot the test machine before running benchmarks on any given day, since we've noticed that this has a significant effect on results, both reducing the times and reducing their variance. It's not entirely clear why this is the case, but we suspect the reasons may be related to the fact that this machine is also used as a general-purpose workstation, and (prior to this change) had very long uptimes of weeks to months at a time.

Despite our ongoing work to adopt tracing-based instrumentation, all of our Servo data was previously based on events emitted only by the old interval profiler (--profiler-trace-path=). We've since ported all of those events other than web metrics to tracing (#34238), with web metrics on the way too (#34373). Using tracing and Perfetto gives us several advantages over the old profiler, like structured metadata, powerful filtering, and better tooling for viewing traces, though there are currently some limitations:

  • Filters in SERVO_TRACING (#34236) can't match a span without also matching its descendants
  • tracing-perfetto does not use a monotonic clock for timestamps, and does not support backdating events
  • Like the old profiler, the tracing-based instrumentation has overhead on the order of microseconds
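On the clock point, Rust's standard library makes the monotonic versus wall-clock distinction directly, which is roughly the distinction a trace writer needs to respect. A minimal illustration (illustrative only, not Servo or tracing-perfetto code):

```rust
use std::time::{Instant, SystemTime};

fn main() {
    // Instant is monotonic: a later reading never compares as earlier,
    // which is what span durations and trace timestamps need.
    let start = Instant::now();
    let work: u64 = (0..1_000).sum();
    let elapsed = start.elapsed();
    println!("work = {work}, took {elapsed:?}");

    // SystemTime is wall-clock time: NTP can step it backwards, so
    // elapsed() is fallible, and timestamps derived from it can go
    // backwards between consecutive trace events.
    match SystemTime::now().elapsed() {
        Ok(d) => println!("wall-clock elapsed: {d:?}"),
        Err(e) => println!("clock went backwards by {:?}", e.duration()),
    }
}
```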

ScriptParseHTML (and ScriptParseXML) events are more useful now (#34273). We've excluded time spent doing reflow and running scripts while loading a page, and included time spent parsing markup passed to document.write(). Previously our times were often unreasonably high for sites doing significant layout or script work during initial page load.

ScriptEvaluate events are also more useful now (#34286). We've included time spent executing scripts in many situations we weren't counting before, like setTimeout(function), DOM event listeners, module scripts, and worker scripts. Previously our times were often unreasonably low for single-page apps that do most of their script work after page load.

The report now includes scatter plots, to make it possible to see whether large standard deviations (s= in the tables below) are noise or outliers. Further changes to summary statistics and how they are presented would be useful here, like quartiles, p95/p99 values (worst 5% or 1%), or confidence intervals.
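As a sketch of what such a summary statistic could look like, here is a nearest-rank percentile over a set of load-time samples (illustrative Rust, not part of the report tooling, which may prefer an interpolating definition):

```rust
/// Nearest-rank percentile: the smallest sample such that at least p
/// percent of all samples are at or below it.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty());
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.max(1) - 1]
}

fn main() {
    // Twenty synthetic load times in milliseconds.
    let mut times: Vec<f64> = (1..=20).map(|n| n as f64).collect();
    assert_eq!(percentile(&mut times, 50.0), 10.0); // median
    assert_eq!(percentile(&mut times, 95.0), 19.0); // p95 (worst 5%)
}
```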

Some of the limitations of our methodology still apply in this report:

  • The test scenarios in this report only cover cold page loads, without any caching or further user interaction
  • Times are not cut off after the page is “fully loaded”, which may distort results due to layout operations after page load
  • No support for Largest Contentful Paint (LCP) or interactivity metrics other than Time to Interactive (TTI)
  • Chromium's Rasterise phase is incomplete, since it only includes Layerize events

The first two limitations, cold-only page loads and fixed wait times, were impossible to address with our old tooling. With the tooling changes we've made for this report, resolving them is now within reach.

Tooling changes

In our previous report, we used a collection of shell scripts to keep track of all of the CPU configs, sites, and engines, but these scripts were pretty inflexible and a pain to reconfigure. Since then, we've improved our § Measurement and analysis procedures by replacing those scripts with a declarative “study” system.

We define our study in § Study config file, then the Rust tooling takes care of the rest. Moving this logic into Rust with a real config file has made several new features possible.

Each engine previously had to be run with the same arguments for the same amount of time, no matter the site under test. We can now set site- and engine-specific settings like the browser open time and extra engine arguments:

[engines]
"servo1" = { type = "Servo", path = "/path/to/servo1/servo" }

[sites."example.com"]
browser_open_time = 20
extra_engine_arguments.servo1 = ["--pref", "dom.svg.enabled"]

Control over the browser was previously limited to launch arguments, window management (via xdotool), and killing the process. We can now run Chromium via WebDriver, allowing us to configure things like the User-Agent and window.screen (for mobile sites), and to check that specific elements appear in the expected counts after the open time has elapsed:

[sites."example.com"]
user_agent = "Android"
screen_size = [320,568]
wait_for_selectors."nav a" = 3
wait_for_selectors."footer" = 1

In the future, we could also add support for running Servo via WebDriver, which would allow for some more powerful capabilities in the direction of Chromium's Web Page Replay system:

  • Exiting the browser after the page is “loaded”, rather than after a fixed time
  • Testing warm page loads after memory and/or disk caching
  • Testing other scenarios, like scrolling and interacting with the page

Test environment

Our test environment is as follows:

  • AMD 7950X (amd64)
  • NixOS 24.11.20241111.dc460ec running X11
  • Linux 6.11.7, linuxPackages_testing from NixOS (as above)
  • Servo is built with ./mach build --profile production-stripped --features tracing-perfetto
  • Chromium is google-chrome from NixOS (as above)

The workloads are run in a shell created as follows:

$ newgrp mitmproxy

$ nix-shell ~/path/to/servo/shell.nix --run zsh

$ nix-shell ~/path/to/perf-analysis-tools/shell.nix --run zsh

CPU isolation is handled in § Measurement and analysis procedures.

Measurement and analysis procedures

To ensure that the windows are kept offscreen, we use the following i3 config:

$ cat ~/.config/i3/config
for_window [instance="^servo$" class="^servo$"] floating enable
for_window [instance="^google-chrome [(]" class="^Google-chrome$"] floating enable
assign [instance="^servo$" class="^servo$"] 7
assign [instance="^google-chrome [(]" class="^Google-chrome$"] 7

To ensure that the benchmarking scripts can set up the CPU isolation automatically, we use the following sudoers(5) config:

$ cat /etc/sudoers
%wheel  ALL=(ALL:ALL)    NOPASSWD: /path/to/perf-analysis-tools/isolate-cpu-for-shell.sh

We run the benchmarks as follows:

$ cd perf-analysis-tools

$ cargo run -r -- collect studies/2024-12-11

We compute summaries for the data as follows, converting the Chromium traces from Perfetto format to JSON format as needed:

$ cargo run -r -- analyse studies/2024-12-11

We collate those summaries to generate tables and charts as follows:

$ cargo run -r -- report studies/2024-12-11 > studies/2024-12-11/report.html

Future work

So far, weve tried to answer two questions about the runtime performance of Servo and Chromium as engines.

How long does an engine take to load a page? Here we are currently limited to First Paint (FP) and First Contentful Paint (FCP), but adding DOMContentLoaded and load event times to our analysis would be a very useful next step. In the longer term, it would be good for Servo to implement some arguably more important web metrics like Largest Contentful Paint (LCP) and Total Blocking Time (TBT).

Why does the engine take that long? Here our approach has been to make a list of rendering phases we would expect to see in a browser engine, then find and/or implement tracing events to match them. But the devil is in the details: the list can be flawed, the events can be flawed, and even when the events are correct, there can be more than one reasonable way to measure something. We could complement our data with some other approaches that don't suffer from this problem to the same degree:

  • Sampling profiler and flamegraph — this is the most effective way to break down the percentage of time spent in script, layout, and so on, and it avoids distortion due to instrumentation overhead.

  • Process and thread times — the operating system knows exactly how much time each thread spent running, sleeping, waiting for I/O, in syscalls, etc, because the scheduler depends on this information. This is what you see when you time(1) a command. What if we could getrusage(2) the engine's processes and threads at key points in time, like navigation start and page load? Then we could subtract one measurement from the other and say “this engine took X ms to load this page, of which Y ms was spent doing actual work in userland”.
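One sketch of this idea on Linux, sidestepping the limitation that getrusage(2) only covers the calling process and its children, is to sample the utime and stime fields from /proc/<pid>/stat at the two points in time (field positions per proc(5); illustrative only, not part of our tooling):

```rust
use std::fs;

/// Read utime and stime (CPU ticks spent in user and kernel mode) for a
/// process from /proc/<pid>/stat. Sampling this at navigation start and
/// again at page load, then subtracting, gives the CPU work in between.
fn cpu_ticks(pid: u32) -> Option<(u64, u64)> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    // The comm field may itself contain spaces and parentheses, so skip
    // past the *last* ')' before counting whitespace-separated fields.
    let rest = stat.rsplit_once(')')?.1;
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // After the ')' the next field is the process state, so utime
    // (field 14 in proc(5) numbering) lands at index 11 and stime at 12.
    Some((fields.get(11)?.parse().ok()?, fields.get(12)?.parse().ok()?))
}

fn main() {
    if let Some((utime, stime)) = cpu_ticks(std::process::id()) {
        println!("user: {utime} ticks, kernel: {stime} ticks");
    }
}
```

Per-thread times are available the same way under /proc/<pid>/task/<tid>/stat.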

Study config file

study.toml
# How many times to run the browser in each sample.
sample_size = 30

# Command for traceconv. The example below is for NixOS.
traceconv_command = ["steam-run", "../../traceconv"]

# Command for setting up CPU isolation. Must accept the same arguments as isolate-cpu-for-shell.sh.
# isolate_cpu_command = ["true"]  # on platforms without CPU isolation support
isolate_cpu_command = ["sudo", "../../isolate-cpu-for-shell.sh"]  # on Linux

# Define your CPU configs here.
# - Syntax is `key = [list of CPUs]`
# - Dots in the key must be quoted
[cpu_configs]
4cpu = [12, 13, 14, 15]

# Define your sites here.
# - Syntax is `key = "url"`
# - Dots in the key must be quoted
# - If `url` has the root path (`/`), the trailing slash must be included
[sites]
"servo.org" = "https://servo.org/"
"zh.wikipedia.org" = "https://zh.wikipedia.org/wiki/Servo"
"www.amazon.com" = "https://www.amazon.com/dp/B07S9XZYN2"

# Sites can also have other settings, in the full table format.
# - `url` has the same meaning as the string value above
# - `browser_open_time` (optional) is in seconds
# - `user_agent` (optional) overrides the browser's default user agent
#   - Currently supported for `ChromeDriver`-type engines only
#   - For `Servo`-type engines, use `extra_engine_arguments.engine = ["--user-agent", "Android"]`
#   - For `Chromium`-type engines, use `extra_engine_arguments.engine = ["--user-agent=Android"]`
# - `screen_size` (optional) overrides the browser's reported screen size (not the viewport size!)
#   - Currently supported for `ChromeDriver`-type engines only
#   - For `Servo`-type engines, use `extra_engine_arguments.engine = ["--screen-size", "320x568"]`
#   - For `Chromium`-type engines, there is no way to do this
# - `wait_for_selectors` (optional) is a map from CSS selectors to expected element counts
#   - Currently supported for `ChromeDriver`-type engines only
#   - For `Servo`-type engines, there is no way to do this
#   - For `Chromium`-type engines, there is no way to do this
# - `extra_engine_arguments` (optional) is keyed on the engine key

# Define your engines here.
# - Syntax is `key = { type = "Servo|Chromium", path = "/path/to/browser" }`
# - Dots in the key must be quoted
# - `type` is one of the following:
#   - `Servo` uses benchmark-servo.sh
#   - `Chromium` uses benchmark-chromium.sh
#   - `ChromeDriver` uses ChromeDriver, a WebDriver-based approach
# - If `path` has no slashes, it represents a command in your PATH
# - `description` (optional) is shown in the report
[engines.servo1]
type = "Servo"
path = "/home/delan/code/servo/servo.20240812.ea5cf751696ec8c24e7303b042d534a32c2a9a24/servo"
description = 'servoshell <a href="https://github.com/servo/servo/commit/ea5cf751696ec8c24e7303b042d534a32c2a9a24">ea5cf751696ec</a> (2024-08-12) + <a href="https://github.com/servo/servo/compare/ea5cf751696ec8c24e7303b042d534a32c2a9a24...c25e4d37254e89642e91585cda1f231b34c47241">ea5cf751696ec...c25e4d37254e8</a> (<a href="https://github.com/servo/servo/pull/34569">#34569</a>)'

[engines.servo2]
type = "Servo"
path = "/home/delan/code/servo/servo.20241209.3f69ef2303dd227c49917c1691e841dca41a4ad2/servo"
description = 'servoshell <a href="https://github.com/servo/servo/commit/3f69ef2303dd227c49917c1691e841dca41a4ad2">3f69ef2303dd2</a> (2024-12-09) + <a href="https://github.com/servo/servo/commit/c61ab5bacf3da15a60045e60146f9d0ef4c636b0">c61ab5bacf3da</a> (<a href="https://github.com/servo/servo/pull/34373">#34373</a>)'

[engines.chromium]
type = "ChromeDriver"
path = "google-chrome-stable"
description = 'Google Chrome 130.0.6723.91 (Official Build), from NixOS 24.11.20241111.<a href="https://github.com/NixOS/nixpkgs/commit/dc460ec76cbff0e66e269457d7b728432263166c">dc460ec</a>'

Web metrics

FP (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   582.4ms    699.7ms    399.7ms
    min 333.6ms    29.63ms    364.1ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   451.3ms    1.274s     1.274s
    min 146.1ms    902.3ms    968.8ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   532.0ms    1.457s     150.3ms
    min 212.9ms    1.251s     134.5ms

FCP (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   582.4ms    699.7ms    399.7ms
    min 333.6ms    29.63ms    364.1ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   451.3ms    1.274s     1.274s
    min 146.1ms    902.3ms    968.8ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   532.0ms    1.457s     150.3ms
    min 212.9ms    1.251s     134.5ms

Raw tracing events

Compositing (real)

servo.org (4cpu)
        servo1     servo2
    μ   6.617ms    60.98ms
    min 3.930ms    57.05ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   88.50ms    84.63ms
    min 84.05ms    78.24ms

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   41.66ms    42.91ms
    min 38.90ms    40.29ms

LayoutPerform (real)

servo.org (4cpu)
        servo1     servo2
    μ   502.7ms    27.42ms
    min 6.096ms    22.01ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   6.372s     6.482s
    min 3.923s     5.938s

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   1.473s     243.0ms
    min 1.296s     187.0ms

ScriptEvaluate (real)

servo.org (4cpu)
        servo1     servo2
    μ   10.61ms    11.77ms
    min 0.000ns    10.50ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   5.522s     7.066s
    min 3.630s     5.576s

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   13.69ms    13.66ms
    min 12.97ms    13.03ms

ScriptParseHTML (real)

servo.org (4cpu)
        servo1     servo2
    μ   40.98ms    7.457ms
    min 234.5μs    5.882ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   862.0ms    550.3ms
    min 762.6ms    398.9ms

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   9.456ms    9.829ms
    min 9.162ms    9.305ms

EvaluateScript (real)

servo.org (4cpu)
        chromium
    μ   9.802ms
    min 7.994ms

www.amazon.com (4cpu)
        chromium
    μ   152.8ms
    min 138.1ms

zh.wikipedia.org (4cpu)
        chromium
    μ   9.455ms
    min 8.079ms

FunctionCall (real)

servo.org (4cpu)
        chromium
    μ   1.312ms
    min 858.0μs

www.amazon.com (4cpu)
        chromium
    μ   732.3ms
    min 610.1ms

zh.wikipedia.org (4cpu)
        chromium
    μ   105.3ms
    min 99.58ms

Layerize (real)

servo.org (4cpu)
        chromium
    μ   488.8μs
    min 337.0μs

www.amazon.com (4cpu)
        chromium
    μ   12.24ms
    min 10.50ms

zh.wikipedia.org (4cpu)
        chromium
    μ   3.140ms
    min 2.424ms

Layout (real)

servo.org (4cpu)
        chromium
    μ   49.02ms
    min 41.98ms

www.amazon.com (4cpu)
        chromium
    μ   145.0ms
    min 126.6ms

zh.wikipedia.org (4cpu)
        chromium
    μ   135.4ms
    min 123.0ms

Paint (real)

servo.org (4cpu)
        chromium
    μ   1.669ms
    min 1.185ms

www.amazon.com (4cpu)
        chromium
    μ   28.54ms
    min 21.34ms

zh.wikipedia.org (4cpu)
        chromium
    μ   14.72ms
    min 12.87ms

ParseHTML (real)

servo.org (4cpu)
        chromium
    μ   11.23ms
    min 8.923ms

www.amazon.com (4cpu)
        chromium
    μ   159.1ms
    min 142.0ms

zh.wikipedia.org (4cpu)
        chromium
    μ   5.900ms
    min 5.107ms

PrePaint (real)

servo.org (4cpu)
        chromium
    μ   1.190ms
    min 839.0μs

www.amazon.com (4cpu)
        chromium
    μ   28.32ms
    min 20.44ms

zh.wikipedia.org (4cpu)
        chromium
    μ   5.432ms
    min 4.411ms

TimerFire (real)

servo.org (4cpu)
        chromium
    μ   665.6μs
    min 453.0μs

www.amazon.com (4cpu)
        chromium
    μ   509.3ms
    min 464.6ms

zh.wikipedia.org (4cpu)
        chromium
    μ   26.72ms
    min 23.74ms

UpdateLayoutTree (real)

servo.org (4cpu)
        chromium
    μ   18.78ms
    min 15.29ms

www.amazon.com (4cpu)
        chromium
    μ   114.0ms
    min 102.2ms

zh.wikipedia.org (4cpu)
        chromium
    μ   20.97ms
    min 19.17ms

Parse (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   11.09ms    40.98ms    7.457ms
    min 8.810ms    234.5μs    5.882ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   159.1ms    862.0ms    550.3ms
    min 142.0ms    762.6ms    398.9ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   5.900ms    9.456ms    9.829ms
    min 5.107ms    9.162ms    9.305ms

Script (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   11.13ms    10.59ms    11.75ms
    min 9.204ms    0.000ns    10.48ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   936.1ms    5.483s     7.015s
    min 803.0ms    3.598s     5.554s

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   116.2ms    13.69ms    13.66ms
    min 109.8ms    12.97ms    13.03ms

Rendering phases model

Layout (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   70.66ms    502.7ms    27.42ms
    min 60.90ms    6.096ms    22.01ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   315.9ms    6.372s     6.482s
    min 273.3ms    3.923s     5.938s

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   176.5ms    1.473s     243.0ms
    min 162.1ms    1.296s     187.0ms

Rasterise (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   488.8μs    6.617ms    60.98ms
    min 337.0μs    3.930ms    57.05ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   12.24ms    88.50ms    84.63ms
    min 10.50ms    84.05ms    78.24ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   3.140ms    41.66ms    42.91ms
    min 2.424ms    38.90ms    40.29ms

Overall rendering time model

Renderer (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   84.89ms    520.4ms    102.8ms
    min 73.90ms    13.52ms    93.01ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   1.145s     8.247s     8.473s
    min 1.015s     5.285s     8.146s

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   295.6ms    1.505s     270.4ms
    min 275.6ms    1.321s     216.5ms