Servo Benchmarking Report (December 2024)
Delan Azabani edited this page 2024-12-17 17:56:36 +08:00

We've analysed the runtime performance of Servo and Chromium when loading three websites, as of the versions below:

The sites were as follows:

We found that servo2 now outperforms chromium in First Paint (FP) and First Contentful Paint (FCP) on two of the three sites, up from zero of the three for servo1.

Layout and overall rendering times for servo2 are now comparable with chromium's on two of the three sites, thanks to significant improvements in its LayoutPerform and ScriptParseHTML times. We believe the times for servo2 continue to lag behind chromium for www.amazon.com due to the lack of incremental layout in Servo.

All of the data used to write this report, including the study config, is in this file.

Contents

Results

Methodology

Caution: in general, results are not comparable across reports. This is especially true for this report, where we've made significant changes to our methodology, but it would be true even if there were no changes, because some conditions of the test environment are difficult to keep consistent between two sets of measurements taken a month apart.

This report uses a similar basic methodology to our previous report, but we've made several changes that should improve the overall quality of our data, including several improvements to Servo's trace events.

We now reboot the test machine before running benchmarks on any given day, since we've noticed that this has a significant effect on results, both reducing the times and reducing their variance. It's not entirely clear why this is the case, but we suspect the reasons may be related to the fact that this machine is also used as a general-purpose workstation, and (prior to this change) had very long uptimes of weeks to months at a time.

Despite our ongoing work to adopt tracing-based instrumentation, all of our Servo data was previously based on events emitted only by the old interval profiler (--profiler-trace-path=). We've since ported all of those events other than web metrics to tracing (#34238), with web metrics on the way too (#34373). Using tracing and Perfetto gives us several advantages over the old profiler, like structured metadata, powerful filtering, and better tooling for viewing traces, though there are currently some limitations:

  • Filters in SERVO_TRACING (#34236) can't match a span without also matching its descendants
  • tracing-perfetto does not use a monotonic clock for timestamps, and does not support backdating events
  • Like the old profiler, the tracing-based instrumentation has overhead on the order of microseconds
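On the clock point, Rust's standard library makes the monotonic versus wall-clock distinction directly, which is roughly the distinction a trace writer needs to respect. A minimal illustration (illustrative only, not Servo or tracing-perfetto code):

```rust
use std::time::{Instant, SystemTime};

fn main() {
    // Instant is monotonic: a later reading never compares as earlier,
    // which is what span durations and trace timestamps need.
    let start = Instant::now();
    let work: u64 = (0..1_000).sum();
    let elapsed = start.elapsed();
    println!("work = {work}, took {elapsed:?}");

    // SystemTime is wall-clock time: NTP can step it backwards, so
    // elapsed() is fallible, and timestamps derived from it can go
    // backwards between consecutive trace events.
    match SystemTime::now().elapsed() {
        Ok(d) => println!("wall-clock elapsed: {d:?}"),
        Err(e) => println!("clock went backwards by {:?}", e.duration()),
    }
}
```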

ScriptParseHTML (and ScriptParseXML) events are more useful now (#34273). We've excluded time spent doing reflow and running scripts while loading a page, and included time spent parsing markup passed to document.write(). Previously our times were often unreasonably high for sites doing significant layout or script work during initial page load.

ScriptEvaluate events are also more useful now (#34286). We've included time spent executing scripts in many situations we weren't counting before, like setTimeout(function), DOM event listeners, module scripts, and worker scripts. Previously our times were often unreasonably low for single-page apps that do most of their script work after page load.

The report now includes scatter plots, to make it possible to see whether large standard deviations (s= in the tables below) are noise or outliers. Further changes to summary statistics and how they are presented would be useful here, like quartiles, p95/p99 values (worst 5% or 1%), or confidence intervals.
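As a sketch of what such a summary statistic could look like, here is a nearest-rank percentile over a set of load-time samples (illustrative Rust, not part of the report tooling, which may prefer an interpolating definition):

```rust
/// Nearest-rank percentile: the smallest sample such that at least p
/// percent of all samples are at or below it.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty());
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.max(1) - 1]
}

fn main() {
    // Twenty synthetic load times in milliseconds.
    let mut times: Vec<f64> = (1..=20).map(|n| n as f64).collect();
    assert_eq!(percentile(&mut times, 50.0), 10.0); // median
    assert_eq!(percentile(&mut times, 95.0), 19.0); // p95 (worst 5%)
}
```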

Some of the limitations of our methodology still apply in this report:

  • The test scenarios in this report only cover cold page loads, without any caching or further user interaction
  • Times are not cut off after the page is “fully loaded”, which may distort results due to layout operations after page load
  • No support for Largest Contentful Paint (LCP) or interactivity metrics other than Time to Interactive (TTI)
  • Chromium's Rasterise phase is incomplete, since it only includes Layerize events

The first two limitations, cold-only page loads and fixed wait times, were impossible to address with our old tooling. With the tooling changes we've made for this report, resolving them is now within reach.

Tooling changes

In our previous report, we used a collection of shell scripts to keep track of all of the CPU configs, sites, and engines, but these scripts were pretty inflexible and a pain to reconfigure. Since then, we've improved our § Measurement and analysis procedures by replacing those scripts with a declarative “study” system.

We define our study in § Study config file, then the Rust tooling takes care of the rest. Moving this logic into Rust with a real config file has made several new features possible.

Each engine previously had to be run with the same arguments for the same amount of time, no matter the site under test. We can now set site- and engine-specific settings like the browser open time and extra engine arguments:

[engines]
"servo1" = { type = "Servo", path = "/path/to/servo1/servo" }

[sites."example.com"]
browser_open_time = 20
extra_engine_arguments.servo1 = ["--pref", "dom.svg.enabled"]

Control over the browser was previously limited to launch arguments, window management (via xdotool), and killing the process. We can now run Chromium via WebDriver, allowing us to configure things like the User-Agent and window.screen (for mobile sites), and to check that specific elements appear in the expected counts after the open time has elapsed:

[sites."example.com"]
user_agent = "Android"
screen_size = [320,568]
wait_for_selectors."nav a" = 3
wait_for_selectors."footer" = 1

In the future, we could also add support for running Servo via WebDriver, which would allow for some more powerful capabilities in the direction of Chromium's Web Page Replay system:

  • Exiting the browser after the page is “loaded”, rather than after a fixed time
  • Testing warm page loads after memory and/or disk caching
  • Testing other scenarios, like scrolling and interacting with the page

Test environment

Our test environment is as follows:

  • AMD 7950X (amd64)
  • NixOS 24.11.20241111.dc460ec running X11
  • Linux 6.11.7, linuxPackages_testing from NixOS (as above)
  • Servo is built with ./mach build --profile production-stripped --features tracing-perfetto
  • Chromium is google-chrome from NixOS (as above)

The workloads are run in a shell created as follows:

$ newgrp mitmproxy

$ nix-shell ~/path/to/servo/shell.nix --run zsh

$ nix-shell ~/path/to/perf-analysis-tools/shell.nix --run zsh

CPU isolation is handled in § Measurement and analysis procedures.

Measurement and analysis procedures

To ensure that the windows are kept offscreen, we use the following i3 config:

$ cat ~/.config/i3/config
for_window [instance="^servo$" class="^servo$"] floating enable
for_window [instance="^google-chrome [(]" class="^Google-chrome$"] floating enable
assign [instance="^servo$" class="^servo$"] 7
assign [instance="^google-chrome [(]" class="^Google-chrome$"] 7

To ensure that the benchmarking scripts can set up the CPU isolation automatically, we use the following sudoers(5) config:

$ cat /etc/sudoers
%wheel  ALL=(ALL:ALL)    NOPASSWD: /path/to/perf-analysis-tools/isolate-cpu-for-shell.sh

We run the benchmarks as follows:

$ cd perf-analysis-tools

$ cargo run -r -- collect studies/2024-12-11

We compute summaries for the data as follows, converting the Chromium traces from Perfetto format to JSON format as needed:

$ cargo run -r -- analyse studies/2024-12-11

We collate those summaries to generate tables and charts as follows:

$ cargo run -r -- report studies/2024-12-11 > studies/2024-12-11/report.html

Future work

So far, weve tried to answer two questions about the runtime performance of Servo and Chromium as engines.

How long does an engine take to load a page? Here we are currently limited to First Paint (FP) and First Contentful Paint (FCP), but adding DOMContentLoaded and load event times to our analysis would be a very useful next step. In the longer term, it would be good for Servo to implement some arguably more important web metrics like Largest Contentful Paint (LCP) and Total Blocking Time (TBT).

Why does the engine take that long? Here our approach has been to make a list of rendering phases we would expect to see in a browser engine, then find and/or implement tracing events to match them. But the devil is in the details: the list can be flawed, the events can be flawed, and even when the events are correct, there can be more than one reasonable way to measure something. We could complement our data with some other approaches that don't suffer from this problem to the same degree:

  • Sampling profiler and flamegraph — this is the most effective way to break down the percentage of time spent in script, layout, and so on, and it avoids distortion due to instrumentation overhead.

  • Process and thread times — the operating system knows exactly how much time each thread spent running, sleeping, waiting for I/O, in syscalls, etc, because the scheduler depends on this information. This is what you see when you time(1) a command. What if we could getrusage(2) the engine's processes and threads at key points in time, like navigation start and page load? Then we could subtract one measurement from the other and say “this engine took X ms to load this page, of which Y ms was spent doing actual work in userland”.
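One sketch of this idea on Linux, sidestepping the limitation that getrusage(2) only covers the calling process and its children, is to sample the utime and stime fields from /proc/<pid>/stat at the two points in time (field positions per proc(5); illustrative only, not part of our tooling):

```rust
use std::fs;

/// Read utime and stime (CPU ticks spent in user and kernel mode) for a
/// process from /proc/<pid>/stat. Sampling this at navigation start and
/// again at page load, then subtracting, gives the CPU work in between.
fn cpu_ticks(pid: u32) -> Option<(u64, u64)> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    // The comm field may itself contain spaces and parentheses, so skip
    // past the *last* ')' before counting whitespace-separated fields.
    let rest = stat.rsplit_once(')')?.1;
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // After the ')' the next field is the process state, so utime
    // (field 14 in proc(5) numbering) lands at index 11 and stime at 12.
    Some((fields.get(11)?.parse().ok()?, fields.get(12)?.parse().ok()?))
}

fn main() {
    if let Some((utime, stime)) = cpu_ticks(std::process::id()) {
        println!("user: {utime} ticks, kernel: {stime} ticks");
    }
}
```

Per-thread times are available the same way under /proc/<pid>/task/<tid>/stat.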

Study config file

study.toml
# How many times to run the browser in each sample.
sample_size = 30

# Command for traceconv. The example below is for NixOS.
traceconv_command = ["steam-run", "../../traceconv"]

# Command for setting up CPU isolation. Must accept the same arguments as isolate-cpu-for-shell.sh.
# isolate_cpu_command = ["true"]  # on platforms without CPU isolation support
isolate_cpu_command = ["sudo", "../../isolate-cpu-for-shell.sh"]  # on Linux

# Define your CPU configs here.
# - Syntax is `key = [list of CPUs]`
# - Dots in the key must be quoted
[cpu_configs]
4cpu = [12, 13, 14, 15]

# Define your sites here.
# - Syntax is `key = "url"`
# - Dots in the key must be quoted
# - If `url` has the root path (`/`), the trailing slash must be included
[sites]
"servo.org" = "https://servo.org/"
"zh.wikipedia.org" = "https://zh.wikipedia.org/wiki/Servo"
"www.amazon.com" = "https://www.amazon.com/dp/B07S9XZYN2"

# Sites can also have other settings, in the full table format.
# - `url` has the same meaning as the string value above
# - `browser_open_time` (optional) is in seconds
# - `user_agent` (optional) overrides the browser's default user agent
#   - Currently supported for `ChromeDriver`-type engines only
#   - For `Servo`-type engines, use `extra_engine_arguments.engine = ["--user-agent", "Android"]`
#   - For `Chromium`-type engines, use `extra_engine_arguments.engine = ["--user-agent=Android"]`
# - `screen_size` (optional) overrides the browser's reported screen size (not the viewport size!)
#   - Currently supported for `ChromeDriver`-type engines only
#   - For `Servo`-type engines, use `extra_engine_arguments.engine = ["--screen-size", "320x568"]`
#   - For `Chromium`-type engines, there is no way to do this
# - `wait_for_selectors` (optional) is a map from CSS selectors to expected element counts
#   - Currently supported for `ChromeDriver`-type engines only
#   - For `Servo`-type engines, there is no way to do this
#   - For `Chromium`-type engines, there is no way to do this
# - `extra_engine_arguments` (optional) is keyed on the engine key

# Define your engines here.
# - Syntax is `key = { type = "Servo|Chromium", path = "/path/to/browser" }`
# - Dots in the key must be quoted
# - `type` is one of the following:
#   - `Servo` uses benchmark-servo.sh
#   - `Chromium` uses benchmark-chromium.sh
#   - `ChromeDriver` uses ChromeDriver, a WebDriver-based approach
# - If `path` has no slashes, it represents a command in your PATH
# - `description` (optional) is shown in the report
[engines.servo1]
type = "Servo"
path = "/home/delan/code/servo/servo.20240812.ea5cf751696ec8c24e7303b042d534a32c2a9a24/servo"
description = 'servoshell <a href="https://github.com/servo/servo/commit/ea5cf751696ec8c24e7303b042d534a32c2a9a24">ea5cf751696ec</a> (2024-08-12) + <a href="https://github.com/servo/servo/compare/ea5cf751696ec8c24e7303b042d534a32c2a9a24...c25e4d37254e89642e91585cda1f231b34c47241">ea5cf751696ec...c25e4d37254e8</a> (<a href="https://github.com/servo/servo/pull/34569">#34569</a>)'

[engines.servo2]
type = "Servo"
path = "/home/delan/code/servo/servo.20241209.3f69ef2303dd227c49917c1691e841dca41a4ad2/servo"
description = 'servoshell <a href="https://github.com/servo/servo/commit/3f69ef2303dd227c49917c1691e841dca41a4ad2">3f69ef2303dd2</a> (2024-12-09) + <a href="https://github.com/servo/servo/commit/c61ab5bacf3da15a60045e60146f9d0ef4c636b0">c61ab5bacf3da</a> (<a href="https://github.com/servo/servo/pull/34373">#34373</a>)'

[engines.chromium]
type = "ChromeDriver"
path = "google-chrome-stable"
description = 'Google Chrome 130.0.6723.91 (Official Build), from NixOS 24.11.20241111.<a href="https://github.com/NixOS/nixpkgs/commit/dc460ec76cbff0e66e269457d7b728432263166c">dc460ec</a>'

Web metrics

FP (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   582.4ms    699.7ms    399.7ms
    min 333.6ms    29.63ms    364.1ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   451.3ms    1.274s     1.274s
    min 146.1ms    902.3ms    968.8ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   532.0ms    1.457s     150.3ms
    min 212.9ms    1.251s     134.5ms

FCP (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   582.4ms    699.7ms    399.7ms
    min 333.6ms    29.63ms    364.1ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   451.3ms    1.274s     1.274s
    min 146.1ms    902.3ms    968.8ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   532.0ms    1.457s     150.3ms
    min 212.9ms    1.251s     134.5ms

Raw tracing events

Compositing (real)

servo.org (4cpu)
        servo1     servo2
    μ   6.617ms    60.98ms
    min 3.930ms    57.05ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   88.50ms    84.63ms
    min 84.05ms    78.24ms

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   41.66ms    42.91ms
    min 38.90ms    40.29ms

LayoutPerform (real)

servo.org (4cpu)
        servo1     servo2
    μ   502.7ms    27.42ms
    min 6.096ms    22.01ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   6.372s     6.482s
    min 3.923s     5.938s

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   1.473s     243.0ms
    min 1.296s     187.0ms

ScriptEvaluate (real)

servo.org (4cpu)
        servo1     servo2
    μ   10.61ms    11.77ms
    min 0.000ns    10.50ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   5.522s     7.066s
    min 3.630s     5.576s

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   13.69ms    13.66ms
    min 12.97ms    13.03ms

ScriptParseHTML (real)

servo.org (4cpu)
        servo1     servo2
    μ   40.98ms    7.457ms
    min 234.5μs    5.882ms

www.amazon.com (4cpu)
        servo1     servo2
    μ   862.0ms    550.3ms
    min 762.6ms    398.9ms

zh.wikipedia.org (4cpu)
        servo1     servo2
    μ   9.456ms    9.829ms
    min 9.162ms    9.305ms

EvaluateScript (real)

servo.org (4cpu)
        chromium
    μ   9.802ms
    min 7.994ms

www.amazon.com (4cpu)
        chromium
    μ   152.8ms
    min 138.1ms

zh.wikipedia.org (4cpu)
        chromium
    μ   9.455ms
    min 8.079ms

FunctionCall (real)

servo.org (4cpu)
        chromium
    μ   1.312ms
    min 858.0μs

www.amazon.com (4cpu)
        chromium
    μ   732.3ms
    min 610.1ms

zh.wikipedia.org (4cpu)
        chromium
    μ   105.3ms
    min 99.58ms

Layerize (real)

servo.org (4cpu)
        chromium
    μ   488.8μs
    min 337.0μs

www.amazon.com (4cpu)
        chromium
    μ   12.24ms
    min 10.50ms

zh.wikipedia.org (4cpu)
        chromium
    μ   3.140ms
    min 2.424ms

Layout (real)

servo.org (4cpu)
        chromium
    μ   49.02ms
    min 41.98ms

www.amazon.com (4cpu)
        chromium
    μ   145.0ms
    min 126.6ms

zh.wikipedia.org (4cpu)
        chromium
    μ   135.4ms
    min 123.0ms

Paint (real)

servo.org (4cpu)
        chromium
    μ   1.669ms
    min 1.185ms

www.amazon.com (4cpu)
        chromium
    μ   28.54ms
    min 21.34ms

zh.wikipedia.org (4cpu)
        chromium
    μ   14.72ms
    min 12.87ms

ParseHTML (real)

servo.org (4cpu)
        chromium
    μ   11.23ms
    min 8.923ms

www.amazon.com (4cpu)
        chromium
    μ   159.1ms
    min 142.0ms

zh.wikipedia.org (4cpu)
        chromium
    μ   5.900ms
    min 5.107ms

PrePaint (real)

servo.org (4cpu)
        chromium
    μ   1.190ms
    min 839.0μs

www.amazon.com (4cpu)
        chromium
    μ   28.32ms
    min 20.44ms

zh.wikipedia.org (4cpu)
        chromium
    μ   5.432ms
    min 4.411ms

TimerFire (real)

servo.org (4cpu)
        chromium
    μ   665.6μs
    min 453.0μs

www.amazon.com (4cpu)
        chromium
    μ   509.3ms
    min 464.6ms

zh.wikipedia.org (4cpu)
        chromium
    μ   26.72ms
    min 23.74ms

UpdateLayoutTree (real)

servo.org (4cpu)
        chromium
    μ   18.78ms
    min 15.29ms

www.amazon.com (4cpu)
        chromium
    μ   114.0ms
    min 102.2ms

zh.wikipedia.org (4cpu)
        chromium
    μ   20.97ms
    min 19.17ms

Parse (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   11.09ms    40.98ms    7.457ms
    min 8.810ms    234.5μs    5.882ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   159.1ms    862.0ms    550.3ms
    min 142.0ms    762.6ms    398.9ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   5.900ms    9.456ms    9.829ms
    min 5.107ms    9.162ms    9.305ms

Script (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   11.13ms    10.59ms    11.75ms
    min 9.204ms    0.000ns    10.48ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   936.1ms    5.483s     7.015s
    min 803.0ms    3.598s     5.554s

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   116.2ms    13.69ms    13.66ms
    min 109.8ms    12.97ms    13.03ms

Rendering phases model

Layout (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   70.66ms    502.7ms    27.42ms
    min 60.90ms    6.096ms    22.01ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   315.9ms    6.372s     6.482s
    min 273.3ms    3.923s     5.938s

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   176.5ms    1.473s     243.0ms
    min 162.1ms    1.296s     187.0ms

Rasterise (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   488.8μs    6.617ms    60.98ms
    min 337.0μs    3.930ms    57.05ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   12.24ms    88.50ms    84.63ms
    min 10.50ms    84.05ms    78.24ms

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   3.140ms    41.66ms    42.91ms
    min 2.424ms    38.90ms    40.29ms

Overall rendering time model

Renderer (synthetic)

servo.org (4cpu)
        chromium   servo1     servo2
    μ   84.89ms    520.4ms    102.8ms
    min 73.90ms    13.52ms    93.01ms

www.amazon.com (4cpu)
        chromium   servo1     servo2
    μ   1.145s     8.247s     8.473s
    min 1.015s     5.285s     8.146s

zh.wikipedia.org (4cpu)
        chromium   servo1     servo2
    μ   295.6ms    1.505s     270.4ms
    min 275.6ms    1.321s     216.5ms