Commit graph

131 commits

Author SHA1 Message Date
Alan Wu
92b218fbc3 YJIT: ZJIT: Allow both JITs in the same build
This commit allows building YJIT and ZJIT simultaneously, a "combo
build". Previously, `./configure --enable-yjit --enable-zjit` failed. At
runtime, though, only one of the two can be enabled at a time.

Add a root Cargo workspace that contains both the yjit and zjit crate.
The common Rust build integration mechanisms are factored out into
defs/jit.mk.

Combo YJIT+ZJIT dev builds are supported; if either JIT uses
`--enable-*=dev`, both of them are built in dev mode.

The combo build requires Cargo, but building one JIT at a time with only
rustc in release build remains supported.
2025-05-15 00:39:03 +09:00
Jean Boussier
ea77250847 Rename RB_OBJ_SHAPE -> rb_obj_shape
As well as `RB_OBJ_SHAPE_ID` -> `rb_obj_shape_id`
and `RSHAPE` is now a simple alias for `rb_shape_lookup`.

I tried to turn all these into `static inline` but I'm having
trouble with `RUBY_EXTERN rb_shape_tree_t *rb_shape_tree_ptr;`
not being exposed as I'd expect.
2025-05-09 10:22:51 +02:00
Jean Boussier
5782561fc1 Rename rb_shape_get_shape_id -> RB_OBJ_SHAPE_ID
And `rb_shape_get_shape` -> `RB_OBJ_SHAPE`.
2025-05-09 10:22:51 +02:00
Jean Boussier
c9b08882b7 Refactor rb_shape_get_next to return an ID
Also rename it, and change parameters to be consistent with
other transition functions.
2025-05-09 10:22:51 +02:00
Jean Boussier
3f7c0af051 Rename rb_shape_obj_too_complex -> rb_shape_obj_too_complex_p 2025-05-09 10:22:51 +02:00
Jean Boussier
334ebba221 Rename rb_shape_get_shape_by_id -> RSHAPE 2025-05-09 10:22:51 +02:00
Jean Boussier
6c9b3ac232 Refactor OBJ_TOO_COMPLEX_SHAPE_ID to not be referenced outside shape.h
Also refactor checks for `->type == SHAPE_OBJ_TOO_COMPLEX`.
2025-05-08 07:58:05 +02:00
Alan Wu
33909a1c69 YJIT: ZJIT: Share identical glue functions
Working towards having YJIT and ZJIT in the same build, we need to
deduplicate some glue code that would otherwise cause name collision.
Add jit.c for this and build it for YJIT and ZJIT builds. Update bindgen
to look at jit.c; some shuffling of functions in the output, but the set
of functions shouldn't have changed.
2025-05-02 23:47:57 +09:00
Aaron Patterson
8ac8225c50 Inline Class#new.
This commit inlines instructions for Class#new.  To make this work, we
added a new YARV instructions, `opt_new`.  `opt_new` checks whether or
not the `new` method is the default allocator method.  If it is, it
allocates the object, and pushes the instance on the stack.  If not, the
instruction jumps to the "slow path" method call instructions.

Old instructions:

```
> ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```

New instructions:

```
> ./miniruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 putnil
0003 swap
0004 opt_new                                <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block                 <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump                                   14
0011 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```

This commit speeds up basic object allocation (`Foo.new`) by 60%, but
classes that take keyword parameters see an even bigger benefit because
no hash is allocated when instantiating the object (3x to 6x faster).

Here is an example that uses `Hash.new(capacity: 0)`:

```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):      1.082 s ±  0.004 s    [User: 1.074 s, System: 0.008 s]
  Range (min … max):    1.076 s …  1.088 s    10 runs

Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):     627.9 ms ±   3.5 ms    [User: 622.7 ms, System: 4.8 ms]
  Range (min … max):   622.7 ms … 633.2 ms    10 runs

Summary
  ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
    1.72 ± 0.01 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```

This commit changes the backtrace for `initialize`:

```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize
    puts caller
  end
end

def hello
  Foo.new
end

hello
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8:in 'Class#new'
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
aaron@tc ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-03-28T23:59:40Z inline-new c4157884e4) +PRISM [arm64-darwin24]
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
```

It also increases memory usage for calls to `new` by 122 bytes:

```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def initialize
    puts caller
  end
end

def hello
  Foo.new
end

puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:hello)))
aaron@tc ~/g/ruby (inline-new)> make runruby
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems  ./test.rb
656
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
```

Thanks to @ko1 for coming up with this idea!

Co-Authored-By: John Hawthorn <john@hawthorn.email>
2025-04-25 13:46:05 -07:00
Takashi Kokubun
ae3d6a321b Fix yjit-bindgen 2025-04-18 21:53:01 +09:00
Takashi Kokubun
ae17323a65 Move a couple of bindgen targets to ZJIT bindgen
We filed https://github.com/Shopify/zjit/pull/65 and
https://github.com/Shopify/zjit/pull/64 concurrently.
2025-04-18 21:53:00 +09:00
Alan Wu
19e8e45f69 Rust tests: Load builtins (core library written in ruby)
Key here is calling rb_call_builtin_inits(), which sticking to public
API for robustness is done by calling ruby_options().

Fixes: https://github.com/Shopify/zjit/issues/61
2025-04-18 21:53:00 +09:00
Max Bernstein
97f022b5e7 Print Ruby exception in test utils 2025-04-18 21:53:00 +09:00
Max Bernstein
ec41dffd05 Add compact Type lattice
This will be used for local type inference and potentially SCCP.
2025-04-18 21:52:59 +09:00
Takashi Kokubun
0a543daf15 Add zjit_* instructions to profile the interpreter (https://github.com/Shopify/zjit/pull/16)
* Add zjit_* instructions to profile the interpreter

* Rename FixnumPlus to FixnumAdd

* Update a comment about Invalidate

* Rename Guard to GuardType

* Rename Invalidate to PatchPoint

* Drop unneeded debug!()

* Plan on profiling the types

* Use the output of GuardType as type refined outputs
2025-04-18 21:52:59 +09:00
Alan Wu
e24be0b8d5 Upgrade bindgen, so it generates unsafe extern as 2024 expects 2025-04-18 21:52:59 +09:00
Alan Wu
4326b0cece boot_vm boots and runs 2025-04-18 21:52:57 +09:00
Alan Wu
14a4edaea6 bindgen works in --enable-zjit=dev mode. 2025-04-18 21:52:56 +09:00
Alan Wu
106b328117 make zjit-bindgen runs, but doesn't graft the right things yet 2025-04-18 21:52:56 +09:00
Aaron Patterson
6b3a97d74b Remove undefined function from bindgen
`rb_get_iseq_body_total_calls` was removed in cd8d20cd1f, but it's still in the YJIT bindgen file.  This commit just removes it from bindgen
2025-02-16 16:37:36 -05:00
Aaron Patterson
8cafa5b8ce Only count VM instructions in YJIT stats builds
The instruction counter is slowing multi-Ractor applications.  I had
changed it to use a thread local, but using a thread local is slowing
single threaded applications.  This commit only enables the instruction
counter in YJIT stats builds until we can figure out a way to gather the
information with lower overhead.

Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
2025-02-14 14:39:35 -05:00
Aaron Patterson
50c2c4bdde Make rb_vm_insns_count a thread local variable
`rb_vm_insns_count` is a global variable used for reporting YJIT
statistics. It is a counter that tallies the number of interpreter
instructions that have been executed, this way we can approximate how
much time we're spending in YJIT compared to the interpreter.

Unfortunately keeping this statistic means that every instruction
executed in the interpreter loop must increment the counter. Normally
this isn't a problem, but in multi-threaded situations (when Ractors are
used), incrementing this counter can become quite costly due to page
caching issues.

Additionally, since there is no locking when incrementing this global,
the count can't really make sense in a multi-threaded environment.

This commit changes `rb_vm_insns_count` to a thread local. That way each
Ractor has it's own copy of the counter and incrementing the counter
becomes quite cheap. Of course this means that in multi-threaded
situations, the value doesn't really make sense (but it didn't make
sense before because of the lack of locking).

The counter is used for YJIT statistics, and since YJIT is basically
disabled when Ractors are in use, I don't think we care about
inaccuracies (for the time being). We can revisit this counter when we
give YJIT multi-threading support, but for the time being this commit
restores multi-threaded performance.

To test this, I used the benchmark in [Bug #20489].

Here is the performance on Ruby 3.2:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in    2.53 secs    fish           external
   usr time   19.86 secs  370.00 micros   19.86 secs
   sys time    0.02 secs  320.00 micros    0.02 secs
```

We can see the regression in performance on the master branch:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T16:22:26Z master 4a2702dafb) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in   24.87 secs    fish           external
   usr time  195.55 secs    0.00 micros  195.55 secs
   sys time    0.00 secs  716.00 micros    0.00 secs
```

Here are the stats after this commit:

```
$ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
ruby 3.5.0dev (2025-01-10T20:37:06Z tl 3ef0432779) +PRISM [x86_64-linux]
[0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

________________________________________________________
Executed in    2.46 secs    fish           external
   usr time   19.34 secs  381.00 micros   19.34 secs
   sys time    0.01 secs  321.00 micros    0.01 secs
```

[Bug #20489]
2025-01-10 13:39:21 -08:00
Randy Stauner
beafae9750
YJIT: Specialize String#[] (String#slice) with fixnum arguments (#12069)
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments

String#[] is in the top few C calls of several YJIT benchmarks:
liquid-compile rubocop mail sudoku

This speeds up these benchmarks by 1-2%.

* YJIT: Try harder to get type info for `String#[]`

In the large generated code of the mail gem the context doesn't have
the type info.  In that case if we peek at the stack and add a guard
we can still apply the specialization
and it speeds up the mail benchmark by 5%.

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>

---------

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
2024-11-13 12:25:09 -05:00
Alan Wu
b41c65b577 YJIT: Implement specialization for no-op {Kernel,Numeric}#dup
Type information in the context for no additional work!

This is the `if (special_object_p(obj)) return obj;` path in
rb_obj_dup() and for Numeric#dup, it's always the identity function.
2024-10-22 11:30:35 -04:00
John Hawthorn
7be9a333ca
YJIT: Allow shareable consts in multi-ractor mode (#11917)
* Update yjit-bindgen deps

* YJIT: Allow shareable consts in multi-ractor mode

* Update yjit/src/codegen.rs

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

---------

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-10-18 15:01:45 -04:00
Alan Wu
ded078c2c4
YJIT: Fastpath for Module#name (#11819)
Module#name shows up as a top C method callee in lobsters so probably
common enough. It's also easy to substitute thanks to rb_mod_name()
already having no GC yield points.

    klass = BasicObject
    50_000_000.times { klass.name }

    Benchmark 1: /.rubies/post/bin/ruby --yjit mod_name.rb
      Time (mean ± σ):      1.433 s ±  0.010 s    [User: 1.410 s, System: 0.010 s]
      Range (min … max):    1.421 s …  1.449 s    10 runs

    Benchmark 2: /.rubies/mstr/bin/ruby --yjit mod_name.rb
      Time (mean ± σ):      1.491 s ±  0.012 s    [User: 1.468 s, System: 0.010 s]
      Range (min … max):    1.470 s …  1.511 s    10 runs

    Summary
      /.rubies/post/bin/ruby --yjit mod_name.rb ran
        1.04 ± 0.01 times faster than /.rubies/mstr/bin/ruby --yjit mod_name.rb
2024-10-08 11:44:59 -04:00
Randy Stauner
942317ebf8
YJIT: Encode doubles to VALUE objects and move stat generation to rust (#11388)
* YJIT: Encode doubles to VALUE objects and move stat generation to rust

Stats that can now be generated from rust have been moved there.

* Move object_shape_count call for runtime_stats to rust

This reduces the ruby method to a single primitive.

* Change hash_aset_usize from macro to function
2024-08-27 22:24:17 -04:00
Kevin Menard
04a6165ac0
YJIT: Enhance the String#<< method substitution to handle integer codepoint values. (#11032)
* Document why we need to explicitly spill registers.

* Simplify passing a byte value to `str_buf_cat`.

* YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values.

* YJIT: Move runtime type check into YJIT.

Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.
2024-08-02 15:45:22 -04:00
Randy Stauner
acbb8d4fb5 Expand opt_newarray_send to support Array#pack with buffer keyword arg
Use an enum for the method arg instead of needing to add an id
that doesn't map to an actual method name.

$ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)'

before:

```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 newarray                               1
0009 putchilledstring                       "E*"
0011 getlocal_WC_0                          b@0
0013 opt_send_without_block                 <calldata!mid:pack, argc:2, kw:[#<Symbol:0x000000000023110c>], KWARG>
0015 leave
```

after:

```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring                       "x"                       (   1)[Li]
0002 setlocal_WC_0                          b@0
0004 putself
0005 opt_send_without_block                 <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 putchilledstring                       "E*"
0009 getlocal                               b@0, 0
0012 opt_newarray_send                      3, 5
0015 leave
```
2024-07-29 16:26:58 -04:00
Aaron Patterson
cdf33ed5f3 Optimized forwarding callers and callees
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm fo `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -> 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -> 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-18 09:28:25 -07:00
Jean Boussier
f7b53a75b6 Do not emit shape transition warnings when YJIT is compiling
[Bug #20522]

If `Warning.warn` is redefined in Ruby, emitting a warning would invoke
Ruby code, which can't safely be done when YJIT is compiling.
2024-06-04 19:21:01 +02:00
Étienne Barrié
1376881e9a Stop marking chilled strings as frozen
They were initially made frozen to avoid false positives for cases such
as:

    str = str.dup if str.frozen?

But this may cause bugs and is generally confusing for users.

[Feature #20205]

Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-05-28 07:32:33 +02:00
Randy Stauner
845f2db136
YJIT: Add specialized codegen function for TrueClass#=== (#10640)
* YJIT: Add specialized codegen function for `TrueClass#===`

TrueClass#=== is currently number 10 in the most frequent C calls list of the lobsters benchmark.

```
require "benchmark/ips"

def wrap
  true === true
  true === false
  true === :x
end

Benchmark.ips do |x|
  x.report(:wrap) do
    wrap
  end
end
```

```
before
Warming up --------------------------------------
                wrap     1.791M i/100ms
Calculating -------------------------------------
                wrap     17.806M (± 1.0%) i/s -     89.544M in   5.029363s

after
Warming up --------------------------------------
                wrap     4.024M i/100ms
Calculating -------------------------------------
                wrap     40.149M (± 1.1%) i/s -    201.223M in   5.012527s
```

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Kevin Menard <kevin.menard@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Fix the new test for RJIT

---------

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Kevin Menard <kevin.menard@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-04-29 14:32:07 -04:00
Takashi Kokubun
7ab1a608e7
YJIT: Optimize local variables when EP == BP (take 2) (#10607)
* Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)"

This reverts commit c878344195.

* YJIT: Take care of GC references in ISEQ invariants

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

---------

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2024-04-25 10:04:53 -04:00
Kevin Menard
afc7799c32
YJIT: Add a specialized codegen function for Class#superclass. (#10613)
Add a specialized codegen function for `Class#superclass`.

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Randy Stauner <randy.stauner@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-04-24 10:31:35 -04:00
Alan Wu
c878344195
Revert "YJIT: Optimize local variables when EP == BP" (#10584)
This reverts commit 4cc58ea0b8.

Since the change landed call-threshold=1 CI runs have been timing out.
There has also been `verify-ctx` violations. Revert for now while we debug.
2024-04-19 16:47:25 +00:00
Takashi Kokubun
4cc58ea0b8
YJIT: Optimize local variables when EP == BP (#10487) 2024-04-17 15:00:03 -04:00
Alan Wu
11f121364a
YJIT: Support splat with C methods with -1 arity
Usually we deal with splats by speculating that they're of a specific
size. In this case, the C method takes a pointer and a length, so
we can support changing sizes just fine.
2024-02-27 17:50:38 +00:00
Alan Wu
da7b9478d3
YJIT: Pass nil to anonymous kwrest when empty (#9972)
This is the same optimization as e4272fd29 ("Avoid allocation when
passing no keywords to anonymous kwrest methods") but for YJIT. For
anonymous kwrest parameters, nil is just as good as an empty hash.

On the usage side, update `splatkw` to handle `nil` with a leaner path.
2024-02-15 11:59:37 -05:00
Peter Zhu
1d3b306753 Move rb_class_allocate_instance from gc.c to object.c 2024-02-14 13:43:02 -05:00
Aaron Patterson
c35fea8509
Specialize String#byteslice(a, b) (#9939)
* Specialize String#byteslice(a, b)

This adds a specialization for String#byteslice when there are two
parameters.

This makes our protobuf parser go from 5.84x slower to 5.33x slower

```
Comparison:
decode upstream (53738 bytes):     7228.5 i/s
decode protobuff (53738 bytes):     1236.8 i/s - 5.84x  slower

Comparison:
decode upstream (53738 bytes):     7024.8 i/s
decode protobuff (53738 bytes):     1318.5 i/s - 5.33x  slower
```

* Update yjit/src/codegen.rs

---------

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2024-02-13 16:20:27 +00:00
Alan Wu
2131d04f43 YJIT: Add support for **kwrest parameters
Now that `...` uses `**kwrest` instead of regular splat and
ruby2keywords, we need to support these type of methods to
support `...` well.
2024-02-12 13:57:24 -05:00
Takashi Kokubun
e7b0a01002
YJIT: Add top ISEQ call counts to --yjit-stats (#9906) 2024-02-09 22:12:24 +00:00
Takashi Kokubun
09427f51a2
YJIT: Add codegen for Float arithmetics (#9774)
* YJIT: Add codegen for Float arithmetics

* Add Flonum and Fixnum tests
2024-01-31 17:58:47 +00:00
Takashi Kokubun
7567e4e1e1
YJIT: Fix exits on splatkw instruction (#9711) 2024-01-26 00:22:27 +00:00
Takashi Kokubun
2034e6ad5a
YJIT: Support concattoarray and pushtoarray (#9708) 2024-01-25 21:45:58 +00:00
Alan Wu
2cc7a56ec7
YJIT: Avoid leaks by skipping objects with a singleton class
For receiver with a singleton class, there are multiple vectors YJIT can
end up retaining the object. There is a path in jit_guard_known_klass()
that bakes the receiver into the code, and the object could also be kept
alive indirectly through a path starting at the CME object baked into
the code.

To avoid these leaks, avoid compiling calls on objects with a singleton
class.

See: https://github.com/Shopify/ruby/issues/552

[Bug #20209]
2024-01-24 18:06:58 -05:00
Alan Wu
ac1e9e443a YJIT: Fix ruby2_keywords splat+rest and drop bogus checks
YJIT didn't guard for ruby2_keywords hash in case of splat calls that
land in methods with a rest parameter, creating incorrect results.

The compile-time checks didn't correspond to any actual effects of
ruby2_keywords, so it was masking this bug and YJIT was needlessly
refusing to compile some code. About 16% of fallback reasons in
`lobsters` was due to the ISeq check.

We already handle the tagging part with
exit_if_supplying_kw_and_has_no_kw() and should now have a dynamic guard
for all splat cases.

Note for backporting: You also need 7f51959ff1.

[Bug #20195]
2024-01-23 19:22:57 -05:00
dependabot[bot]
bd1895990c
Bump shlex from 1.1.0 to 1.3.0 in /yjit/bindgen (#9652)
Bumps [shlex](https://github.com/comex/rust-shlex) from 1.1.0 to 1.3.0.
- [Changelog](https://github.com/comex/rust-shlex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/comex/rust-shlex/commits)

---
updated-dependencies:
- dependency-name: shlex
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-22 14:11:07 -08:00
Takashi Kokubun
3c9290173a
YJIT: Optimize defined?(yield) (#9599)
* YJIT: Optimize defined?(yield)

* Remove an irrelevant comment

* s/get/gen/
2024-01-19 11:00:46 -05:00