yjit-trace-exits appends a synthetic sample for the instruction being
exited, but we didn't increment the size of the stack. Fixing this count
correctly lets us successfully generate a flamegraph from the exits.
I also replaced the line number for instructions with 0, as I don't
think the previous value had meaning.
Co-authored-by: Adam Hess <HParker@github.com>
The stackprof-format raw samples are suffixed with a count (ie. how many
times did the previously recorded side-exit repeat). Previously we were
correctly using this value to increment the :samples count, but not the
:total_samples count or edges.
This made the stackprof aggregate results incorrect (though any
flamegraphs generated should have been correct, since those only rely on
raw samples).
* YJIT: Add --yjit-pause and RubyVM::YJIT.resume
This allows booting YJIT in a suspended state. We chose to add a new
command line option as opposed to simply allowing YJIT.resume to work
without any command line option because it allows for combining with
YJIT tuning command line options. It also simpifies implementation.
Paired with Kokubun and Maxime.
* Update yjit.rb
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Early total_exits condition.
Replace Array#sort_by/first(how_many) with Array#max_by(how_many).
Replace Array#map/max with Array#max_by, match print_counters style for longest_name_length.
Tested on production workloads at Shopify for > 1 year and proven
to be quite stable. Enabling YJIT at run-time is still guarded
behind the --yjit command-line option for now.
* Enable --yjit-stats for release builds
In order for people in the real world to report information about how their application runs with YJIT, we want to expose stats without requiring rebuilding ruby. We can do this without overhead, with the exception of count ratio in yjit, since this relies on the interpreter also counting instructions.
This change exposes those stats, while not showing ratio in yjit if we are not in a stats build.
* Update yjit.rb
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
While I was working on my RubyConf talk for tracing yjit exit locations
I realized that there were exits from the dump code included in the
stats data. For example I saw 224 interp leave exits for a simple script
that should have had 1 or 2. I realized that the dump code needs to be
called _after_ the stats are generated, otherwise the dump code will be
counted in the stats exits.
I've added a `_dump_locations` method to the `at_exit` for stats
generation to ensure that it runs last. I've updated the documentation
to add a note that if you call `dump_exit_locations` directly, your
stats will include the dump code exits as well.
In a small script the speed of this feature isn't really noticeable but
on Rails it's very noticeable how slow this can be. This PR aims to
speed up two parts of the functionality.
1) The Rust exit recording code
Instead of adding all samples as we see them to the yjit_raw_samples and
yjit_line_samples, we can increment the counter on the ones we've seen
before. This will be faster on traces where we are hitting the same
stack often. In a crude measurement of booting just the active record
base test (`test/cases/base_test.rb`) we found that this improved the
speed by 1 second.
This also results in a smaller marshal dump file which sped up the test
boot time by 4 seconds with trace exits on.
2) The Ruby parsing code
Previously we were allocating new arrays using `shift` and
`each_with_index`. This change avoids allocating new arrays by using an
index. This change saves us the most amount of time, gaining 11 seconds.
Before this change the test boot time took 62 seconds, after it took 47
seconds. This is still too long but it's a step closer to faster
functionality. Next we're going to tackle allowing you to collect trace
exits for a specific instruction. There is also some potential slowness
in the GC code that I'd like to take a second look at.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>