Commit graph

1114 commits

Author SHA1 Message Date
Aaron Patterson
8ac8225c50 Inline Class#new.
This commit inlines instructions for Class#new.  To make this work, we
added a new YARV instruction, `opt_new`.  `opt_new` checks whether or
not the `new` method is the default allocator method.  If it is, it
allocates the object, and pushes the instance on the stack.  If not, the
instruction jumps to the "slow path" method call instructions.
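
The fast-path/slow-path split can be modelled in plain Ruby. This is only a
sketch of the behaviour described above, not the VM's implementation:
`inlined_new`, `Point` and `Cached` are hypothetical names, and checking the
owner of `new` stands in for the instruction's check that `new` is still the
default allocator method.

```ruby
# Hypothetical Ruby-level model of the decision `opt_new` makes at a call site.
def inlined_new(klass, *args, **kwargs, &block)
  if klass.method(:new).owner == Class
    # Fast path: `new` is still Class#new, so allocate directly and
    # call `initialize` on the fresh instance.
    obj = klass.allocate
    obj.send(:initialize, *args, **kwargs, &block)
    obj
  else
    # Slow path: `new` was overridden, so make an ordinary method call.
    klass.new(*args, **kwargs, &block)
  end
end

class Point
  attr_reader :x, :y

  def initialize(x:, y:)
    @x = x
    @y = y
  end
end

class Cached
  INSTANCE = allocate

  # Overriding `new` forces the slow path in the model above.
  def self.new
    INSTANCE
  end
end

point = inlined_new(Point, x: 1, y: 2) # fast path
same  = inlined_new(Cached)            # slow path
```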

Old instructions:

```
> ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```

New instructions:

```
> ./miniruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 putnil
0003 swap
0004 opt_new                                <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block                 <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump                                   14
0011 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```

This commit speeds up basic object allocation (`Foo.new`) by 60%, but
classes that take keyword parameters see an even bigger benefit because
no hash is allocated when instantiating the object (3x to 6x faster).

Here is an example that uses `Hash.new(capacity: 0)`:

```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):      1.082 s ±  0.004 s    [User: 1.074 s, System: 0.008 s]
  Range (min … max):    1.076 s …  1.088 s    10 runs

Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):     627.9 ms ±   3.5 ms    [User: 622.7 ms, System: 4.8 ms]
  Range (min … max):   622.7 ms … 633.2 ms    10 runs

Summary
  ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
    1.72 ± 0.01 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
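
One way to observe the "no hash is allocated" claim is to count allocated
objects per call instead of timing them. The sketch below assumes
`GC.stat(:total_allocated_objects)` is a good enough proxy; `Config` and
`allocations_per_new` are hypothetical names, and the figure printed depends
on the Ruby build, so no expected output is shown.

```ruby
# A class whose initializer takes a keyword parameter.
class Config
  def initialize(size: 0)
    @size = size
  end
end

# Average number of objects allocated by one `Config.new(size: 0)` call.
def allocations_per_new(iterations = 100_000)
  before = GC.stat(:total_allocated_objects)
  i = 0
  while i < iterations
    Config.new(size: 0)
    i += 1
  end
  after = GC.stat(:total_allocated_objects)
  (after - before).fdiv(iterations)
end

puts allocations_per_new
```

Running the same script under both interpreters should show whether an extra
object (the keyword hash) is allocated per call.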

This commit changes the backtrace for `initialize`:

```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize
    puts caller
  end
end

def hello
  Foo.new
end

hello
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8:in 'Class#new'
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
aaron@tc ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-03-28T23:59:40Z inline-new c4157884e4) +PRISM [arm64-darwin24]
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
```

It also increases memory usage for calls to `new` by 112 bytes (656 vs. 544 in the output below):

```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def initialize
    puts caller
  end
end

def hello
  Foo.new
end

puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:hello)))
aaron@tc ~/g/ruby (inline-new)> make runruby
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems  ./test.rb
656
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
```

Thanks to @ko1 for coming up with this idea!

Co-Authored-By: John Hawthorn <john@hawthorn.email>
2025-04-25 13:46:05 -07:00
Takashi Kokubun
ae3d6a321b Fix yjit-bindgen 2025-04-18 21:53:01 +09:00
Takashi Kokubun
ae17323a65 Move a couple of bindgen targets to ZJIT bindgen
We filed https://github.com/Shopify/zjit/pull/65 and
https://github.com/Shopify/zjit/pull/64 concurrently.
2025-04-18 21:53:00 +09:00
Alan Wu
19e8e45f69 Rust tests: Load builtins (core library written in ruby)
The key here is calling rb_call_builtin_inits(), which, to stick to the public
API for robustness, is done by calling ruby_options().

Fixes: https://github.com/Shopify/zjit/issues/61
2025-04-18 21:53:00 +09:00
Max Bernstein
97f022b5e7 Print Ruby exception in test utils 2025-04-18 21:53:00 +09:00
Max Bernstein
ec41dffd05 Add compact Type lattice
This will be used for local type inference and potentially SCCP.
2025-04-18 21:52:59 +09:00
Takashi Kokubun
0a543daf15 Add zjit_* instructions to profile the interpreter (https://github.com/Shopify/zjit/pull/16)
* Add zjit_* instructions to profile the interpreter

* Rename FixnumPlus to FixnumAdd

* Update a comment about Invalidate

* Rename Guard to GuardType

* Rename Invalidate to PatchPoint

* Drop unneeded debug!()

* Plan on profiling the types

* Use the output of GuardType as type refined outputs
2025-04-18 21:52:59 +09:00
Alan Wu
e24be0b8d5 Upgrade bindgen, so it generates unsafe extern as 2024 expects 2025-04-18 21:52:59 +09:00
Alan Wu
4326b0cece boot_vm boots and runs 2025-04-18 21:52:57 +09:00
Alan Wu
14a4edaea6 bindgen works in --enable-zjit=dev mode. 2025-04-18 21:52:56 +09:00
Alan Wu
106b328117 make zjit-bindgen runs, but doesn't graft the right things yet 2025-04-18 21:52:56 +09:00
Takashi Kokubun
809b63c804 Fix bindgen 2025-04-18 21:52:56 +09:00
Takashi Kokubun
e6ffc141b1 Define ZJIT libs for non-gmake 2025-04-18 21:52:55 +09:00
Alan Wu
98790faae3 YJIT: Add Counter::invalidate_everything
When YJIT is forced to discard all the code, that's bad for
performance, so there should be an easy way to know about it.
2025-03-07 20:23:32 -05:00
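
For 98790faae3, a minimal sketch of how the new counter might be read from
Ruby. It assumes the counter is exposed through `RubyVM::YJIT.runtime_stats`
under the key `:invalidate_everything`; that key name is inferred from the
commit title and is not confirmed here. `yjit_invalidate_everything_count` is
a hypothetical helper.

```ruby
# Returns the counter value, or nil when YJIT is unavailable, disabled,
# or this build does not report the assumed key.
def yjit_invalidate_everything_count(key = :invalidate_everything)
  return nil unless defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?

  stats = RubyVM::YJIT.runtime_stats
  stats && stats[key]
end

count = yjit_invalidate_everything_count
puts(count.nil? ? "counter not available" : "invalidate_everything: #{count}")
```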
Takashi Kokubun
bb91c303ba YJIT: Rename get_temp_regs2() back to get_temp_regs() (#12866) 2025-03-06 10:52:49 -05:00
annichai-stripe
5085ec3ed9 Allow YJIT mem-size and call-threshold to be set at runtime via YJIT.enable() (#12505)
* first commit

* yjit.rb change

* revert formatting

* rename mem-size to exec-mem-size for correctness

* wip, move setting into rb_yjit_enable directly

* remove unused helper functions

* add in call threshold

* input validation with extensive eprintln

* delete test script

* exec-mem-size -> mem-size

* handle input validation with asserts

* add test cases related to input validation

* modify test cases

* move validation out of rs, into rb

* add comments

* remove trailing spaces

* remove logging

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

* remove helper fn

* Update test/ruby/test_yjit.rb

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

* trailing white space

---------

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2025-03-03 15:45:39 -05:00
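
For 5085ec3ed9, a hedged usage sketch. The keyword names `mem_size` and
`call_threshold`, and the assumption that `mem_size` is given in MiB, are
inferred from the commit title and the existing command-line options. The
helpers `yjit_enable_keywords` and `enable_yjit_with_limits` are hypothetical,
and the code only calls `enable` with these keywords when the running Ruby
reports that it accepts them.

```ruby
# Keyword names accepted by RubyVM::YJIT.enable on this Ruby, or [] when
# YJIT is not available.
def yjit_enable_keywords
  return [] unless defined?(RubyVM::YJIT) && RubyVM::YJIT.respond_to?(:enable)

  RubyVM::YJIT.method(:enable).parameters
              .select { |type, _name| type == :key || type == :keyreq }
              .map { |_type, name| name }
end

# Enable YJIT with runtime limits only when both keywords are supported.
# Returns false when they are not, otherwise whether YJIT is enabled.
def enable_yjit_with_limits(mem_size:, call_threshold:)
  return false unless (%i[mem_size call_threshold] - yjit_enable_keywords).empty?

  RubyVM::YJIT.enable(mem_size: mem_size, call_threshold: call_threshold)
  RubyVM::YJIT.enabled?
end

puts enable_yjit_with_limits(mem_size: 64, call_threshold: 30)
```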
Aaron Patterson
6b3a97d74b Remove undefined function from bindgen
`rb_get_iseq_body_total_calls` was removed in cd8d20cd1f, but it's still in the YJIT bindgen file.  This commit just removes it from bindgen.
2025-02-16 16:37:36 -05:00
Aaron Patterson
8cafa5b8ce Only count VM instructions in YJIT stats builds
The instruction counter is slowing multi-Ractor applications.  I had
changed it to use a thread local, but using a thread local is slowing
single threaded applications.  This commit only enables the instruction
counter in YJIT stats builds until we can figure out a way to gather the
information with lower overhead.

Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
2025-02-14 14:39:35 -05:00
Alan Wu
41251fdd30 YJIT: Fix linker warnings on macOS for Cargo (development) builds 2025-02-13 17:27:28 -05:00
Peter Zhu
16f41eca53 Remove dead iv_index_tbl field in RObject 2025-02-12 14:03:07 -05:00
dependabot[bot]
afb47a1f10 Bump capstone from 0.12.0 to 0.13.0 in /yjit
Bumps [capstone](https://github.com/capstone-rust/capstone-rs) from 0.12.0 to 0.13.0.
- [Release notes](https://github.com/capstone-rust/capstone-rs/releases)
- [Changelog](https://github.com/capstone-rust/capstone-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/capstone-rust/capstone-rs/compare/capstone-v0.12.0...capstone-v0.13.0)

---
updated-dependencies:
- dependency-name: capstone
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-05 11:37:34 +09:00
Alan Wu
9497820bcf YJIT: Remove comments that refer to the removed "stats" feature
The Cargo feature was removed in 2de8b5b805
and it's available in all build configs now.

[ci skip]
2025-01-30 18:00:53 -05:00
Alan Wu
95bf359087 YJIT: Turn on dead code lint for the stats module 2025-01-30 18:00:53 -05:00
Alan Wu
7e733ca551 YJIT: Explicitly specify C ABI to fix a nightly Rust warning 2025-01-30 18:00:53 -05:00
Alan Wu
5a7089fc03 YJIT: A64: Remove assert that trips when OOM at page boundary
With a well-timed OOM around a page switch in the backend, it can return
RetryOnNextPage twice and crash due to the assert. (More places can
signal OOM now since VirtualMem tracks Rust malloc heap size for
--yjit-mem-size.)

Return error in these cases instead of crashing.

Fixes: https://github.com/Shopify/ruby/issues/566
2025-01-29 19:09:39 -05:00
Alan Wu
58ccce60cf YJIT: Initialize locals in ISeqs defined with ... (#12660)
* YJIT: Fix indentation [ci skip]

Fixes: cdf33ed5f3

* YJIT: Initialize locals in ISeqs defined with `...`

Previously, callers of forwardable ISeqs moved the stack pointer up
without writing to the stack. If there happens to be a stale value in
the area skipped over, it could crash due to "try to mark T_NONE". Also,
the uninitialized local variables were observable through `binding`.

Initialize the locals to nil.

[Bug #21021]
2025-01-28 23:54:38 -05:00
Alan Wu
4d8eaa9e45 YJIT: Rename send_iseq_forwarding->send_forwarding
It's in gen_send_general(), so nothing specifically to do with iseqs.
2025-01-10 18:03:31 -05:00
Aaron Patterson
50c2c4bdde Make rb_vm_insns_count a thread local variable
`rb_vm_insns_count` is a global variable used for reporting YJIT
statistics. It is a counter that tallies the number of interpreter
instructions that have been executed, so that we can approximate how
much time we're spending in YJIT compared to the interpreter.

Unfortunately keeping this statistic means that every instruction
executed in the interpreter loop must increment the counter. Normally
this isn't a problem, but in multi-threaded situations (when Ractors are
used), incrementing this counter can become quite costly due to page
caching issues.

Additionally, since there is no locking when incrementing this global,
the count can't really make sense in a multi-threaded environment.

This commit changes `rb_vm_insns_count` to a thread local. That way each
Ractor has its own copy of the counter and incrementing the counter
becomes quite cheap. Of course this means that in multi-threaded
situations, the value doesn't really make sense (but it didn't make
sense before because of the lack of locking).
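
The trade-off can be illustrated with Ruby threads. This is an analogy only:
the real change is to a C thread-local variable, and `count_per_thread` is a
hypothetical helper. Each thread updates only its own counter, so no lock is
needed, but no single global total exists unless someone adds the counters up.

```ruby
# Each thread increments its own thread-local counter and returns it.
def count_per_thread(thread_count, iterations)
  threads = Array.new(thread_count) do
    Thread.new do
      me = Thread.current
      me.thread_variable_set(:insns_count, 0)
      iterations.times do
        me.thread_variable_set(:insns_count, me.thread_variable_get(:insns_count) + 1)
      end
      me.thread_variable_get(:insns_count)
    end
  end
  threads.map(&:value)
end

counts = count_per_thread(4, 1_000)
puts counts.inspect # one independent count per thread
puts counts.sum     # a total exists only after explicit aggregation
```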

The counter is used for YJIT statistics, and since YJIT is basically
disabled when Ractors are in use, I don't think we care about
inaccuracies (for the time being). We can revisit this counter when we
give YJIT multi-threading support, but for the time being this commit
restores multi-threaded performance.

To test this, I used the benchmark in [Bug #20489].

Here is the performance on Ruby 3.2:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in    2.53 secs    fish           external
   usr time   19.86 secs  370.00 micros   19.86 secs
   sys time    0.02 secs  320.00 micros    0.02 secs
```

We can see the regression in performance on the master branch:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T16:22:26Z master 4a2702dafb) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in   24.87 secs    fish           external
   usr time  195.55 secs    0.00 micros  195.55 secs
   sys time    0.00 secs  716.00 micros    0.00 secs
```

Here are the stats after this commit:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T20:37:06Z tl 3ef0432779) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in    2.46 secs    fish           external
   usr time   19.34 secs  381.00 micros   19.34 secs
   sys time    0.01 secs  321.00 micros    0.01 secs
```

[Bug #20489]
2025-01-10 13:39:21 -08:00
Alan Wu
dd80d9b089 YJIT: Filter & calls from specialized C method codegen
As is evident from the crash reported in [Bug #20997], the C replacement
codegen functions aren't authored to handle block arguments (nor
should they because the extra code from the complexity defeats
optimization). Filter sites with VM_CALL_ARGS_BLOCKARG.
2025-01-08 19:47:39 -05:00
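
For dd80d9b089, a small sketch showing which call sites carry the block
argument flag. It relies on the disassembly text containing `ARGS_BLOCKARG`
for calls that pass `&blk`; `block_arg_call_site?` is a hypothetical helper
and returns false on implementations without `RubyVM::InstructionSequence`.

```ruby
# Returns true when the compiled source contains a call site flagged with
# ARGS_BLOCKARG, the flag this commit filters on.
def block_arg_call_site?(source)
  return false unless defined?(RubyVM::InstructionSequence)

  RubyVM::InstructionSequence.compile(source).disasm.include?("ARGS_BLOCKARG")
end

puts block_arg_call_site?("blk = nil; 1.to_s(&blk)") # call with a block argument
puts block_arg_call_site?("1.to_s")                  # plain call
```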
Alan Wu
c71addc522 YJIT: Fix crash when yielding keyword arguments
Previously, the code for dropping surplus arguments when yielding
into blocks erroneously attempted to drop keyword arguments when there
are in fact no surplus arguments. Fix the condition and test that
supplying the exact number of keyword arguments as required compiles
without fallback.
2025-01-04 12:53:20 -05:00
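
For c71addc522, the kind of code affected looks roughly like the following.
The commit does not show its test case, so this is an illustrative guess at
the shape: a method yields exactly the keyword arguments the block requires.
`each_dimension` is a hypothetical name.

```ruby
# Yields exactly the keywords the block declares, with no surplus arguments.
def each_dimension
  yield width: 3, height: 4
end

area = each_dimension { |width:, height:| width * height }
puts area
```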
Takashi Kokubun
527cc73282 YJIT: Return None if entry block compilation fails (#12445) 2024-12-23 22:12:08 +00:00
Takashi Kokubun
6bf7a1765f YJIT: Load registers on JIT entry to reuse blocks (#12355) 2024-12-17 12:32:42 -05:00
Alan Wu
f3a117605c YJIT: Speculate block arg for c_func_method(&nil) calls (#12326)
A good amount of call sites always pass nil as block argument, but the
nil doesn't show up in the context. Put a runtime guard for those
cases to handle it. Particularly relevant for the `ruby-lsp` benchmark in
`yjit-bench`. Up to a 2% speedup across headline benchmarks.

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: Kevin Menard <kevin@nirvdrum.com>
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
2024-12-13 10:41:04 -05:00
Alan Wu
e7ee7d43f3 YJIT: Allow then-unknown static_mut_refs on older Rusts [ci skip] 2024-12-12 18:52:51 -05:00
Alan Wu
d53e4545f4 YJIT: Fix unread field lint in release builds
```
warning: fields `blue_begin` and `blue_end` are never read
```
2024-12-11 17:44:43 -05:00
Alan Wu
9fe06cc035 YJIT: Disable static_mut_refs for now 2024-12-11 17:44:43 -05:00
Alan Wu
6cb75564f9 YJIT: Use the correct size constant 2024-12-11 17:44:43 -05:00
Takashi Kokubun
14e0a40cd0 YJIT: Add a comment about a lazy frame call
jit_prepare_lazy_frame_call is a complicated trick and comes with memory
overhead. Every use of the function should come with justification.
2024-12-09 10:09:40 -08:00
Takashi Kokubun
cff031253f YJIT: Spill/load argument registers to reuse blocks (#12287)
* YJIT: Spill/load argument registers to reuse blocks

* Mention the immediate function name

* Explain the context behind spill/load operations
2024-12-09 10:02:40 -08:00
Max Bernstein
8010d79bb4 YJIT: Only enable disassembly colors for tty (#12283)
* YJIT: Use fully-qualified name for OPTIONS in get_options!

* YJIT: Only enable disassembly colors for tty
2024-12-09 10:36:17 -05:00
Maximillian Polhill
1c4dbb133e YJIT: Generate specialized code for Symbol for objtostring (#12247)
* YJIT: Generate specialized code for Symbol for objtostring

Co-authored-by: John Hawthorn <john@hawthorn.email>

* Update yjit/src/codegen.rs

---------

Co-authored-by: John Hawthorn <john@hawthorn.email>
Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2024-12-04 21:34:16 +00:00
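
For 1c4dbb133e, a sketch of Ruby code that reaches `objtostring` with a
Symbol: string interpolation of a non-string value. `label_for` is a
hypothetical helper, and the disassembly check is informational only, since
instruction names can vary between Ruby versions.

```ruby
# Interpolating a Symbol goes through the interpolation conversion path.
def label_for(name)
  "id: #{name}"
end

# true/false on CRuby, nil elsewhere; not guaranteed across versions.
uses_objtostring =
  if defined?(RubyVM::InstructionSequence)
    RubyVM::InstructionSequence.of(method(:label_for)).disasm.include?("objtostring")
  end

puts label_for(:user)
puts uses_objtostring.inspect
```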
Maxime Chevalier-Boisvert
4b4d52ef50 YJIT: track time since initialization (#12263) 2024-12-04 21:24:36 +00:00
Alan Wu
2fc357c16d YJIT: Avoid std::ffi::CString with rb_intern2() during boot
Fewer allocations on boot, too.

Suggested-by: https://github.com/ruby/ruby/pull/12217
2024-11-29 16:45:22 -05:00
John Hawthorn
a5119a3f27 YJIT: Add missing prepare before calling str_dup 2024-11-28 15:04:12 -05:00
Randy Stauner
8f9b9aecd0 YJIT: Implement opt_reverse insn (#12175) 2024-11-26 16:49:24 -05:00
Randy Stauner
1dd40ec18a Optimize instructions when creating an array just to call include? (#12123)
* Add opt_duparray_send insn to skip the allocation on `#include?`

If the method isn't going to modify the array we don't need to copy it.
This avoids the allocation / array copy for things like `[:a, :b].include?(x)`.

This adds a BOP for include? and tracks redefinition for it on Array.

Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>

* YJIT: Implement opt_duparray_send include_p

Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>

* Update opt_newarray_send to support simple forms of include?(arg)

Similar to opt_duparray_send but for non-static arrays.

* YJIT: Implement opt_newarray_send include_p

---------

Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
2024-11-26 14:31:08 -05:00
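
For 1dd40ec18a, a sketch of the pattern being optimized: a literal array that
exists only to be searched with `include?`. `safe_verb?` and
`uses_opt_duparray_send?` are hypothetical helpers; the second one merely
reports whether the running Ruby compiled the method with an instruction
named `opt_duparray_send`, and returns nil where disassembly is unavailable.

```ruby
# Membership test against a literal array of static values.
def safe_verb?(verb)
  [:get, :head, :options].include?(verb)
end

# Check the method's disassembly for the instruction added by this commit.
def uses_opt_duparray_send?
  return nil unless defined?(RubyVM::InstructionSequence)

  RubyVM::InstructionSequence.of(method(:safe_verb?)).disasm.include?("opt_duparray_send")
end

puts safe_verb?(:get)
puts safe_verb?(:post)
puts uses_opt_duparray_send?.inspect
```

The result of `safe_verb?` is the same with or without the optimization; only
the array copy is skipped.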
Maxime Chevalier-Boisvert
081bdc5125 YJIT: fix small typo in command line options help (#12167) 2024-11-25 19:32:19 +00:00
Alan Wu
bf718cef59 YJIT: Make compilation_failure a default stat (#12128)
It's good to monitor compilation failures.
2024-11-20 17:13:31 -05:00
Alan Wu
350b544468 YJIT: Refactor to forward jump_to_next_insn() return value
It's more concise this way and since `return Some(EndBlock)` is the only
correct answer, no point repeating it everywhere.
2024-11-20 10:06:14 -05:00
Alan Wu
199877d258 YJIT: Abandon block when gen_outlined_exit() fails
When CodeBlock::set_page fails (part of next_page(), see their docs for
exact conditions), it can cause gen_outlined_exit() to fail while there
is still plenty of memory available. Previously, this could leave YJIT
running incomplete code due to taking the early return in
end_block_with_jump(), which manifested as crashes with SIGILL.

Add and use a wrapper with error handling.
2024-11-20 10:06:14 -05:00