Commit graph

60 commits

Author SHA1 Message Date
Takashi Kokubun
9fd94d6a0c
YJIT: Support entry for multiple PCs per ISEQ (GH-7535) 2023-03-17 11:53:17 -07:00
Alan Wu
de174681f7 YJIT: Assert that we have the VM lock while marking
Somewhat important because having the lock is a key part of the
soundness reasoning for the `unsafe` usage here.
2023-03-15 15:45:20 -04:00
Takashi Kokubun
70ba310212
YJIT: Introduce no_gc attribute (#7511) 2023-03-14 15:38:58 -07:00
Jimmy Miller
45127c84d9
YJIT: Handle rest+splat where non-splat < required (#7499) 2023-03-13 11:12:23 -04:00
Ole Friis Østergaard
4667a3a665 Add defined_ivar as YJIT instruction as well
This works much like the existing `defined` implementation,
but calls out to rb_ivar_defined instead of the more general
rb_vm_defined.

Other difference to the existing `defined` implementation is
that this new instruction has to take the same operands as
the CRuby `defined_ivar` instruction.
2023-03-08 09:34:31 -08:00
Jimmy Miller
56df6d5f9d
YJIT: Handle splat+rest for args pass greater than required (#7468)
For example:

```ruby
def my_func(x, y, *rest)
    p [x, y, rest]
end

my_func(1, 2, 3, *[4, 5])
```
2023-03-07 17:03:43 -05:00
Jimmy Miller
719a7726d1
YJIT: Handle special case of splat and rest lining up (#7422)
If you have a method that takes rest arguments and a splat call that
happens to line up perfectly with that rest, you can just dupe the
array rather than move anything around. We still have to dupe, because
people could have a custom to_a method or something like that which
means it is hard to guarantee we have exclusive access to that array.

Example:

```ruby
def foo(a, b, *rest)
end

foo(1, 2, *[3, 4])
```
2023-03-07 12:29:59 -05:00
Matt Valentine-House
ae5e62ee90 Merge internal/intern/gc.h into internal/gc.h 2023-02-27 10:11:56 -08:00
Jean Boussier
1a4b4cd7f8 Move attached_object into rb_classext_struct
Given that signleton classes don't have an allocator,
we can re-use these bytes to store the attached object
in `rb_classext_struct` without making it larger.
2023-02-16 08:14:44 +01:00
Jimmy Miller
8943b0d411
YJIT: Kernel#{is_a?,instance_of?} fast paths (GH-7297)
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2023-02-15 14:05:42 -05:00
Takashi Kokubun
15ef2b2d7c
YJIT: Optimize != for Integers and Strings (#7301) 2023-02-14 16:31:33 -05:00
Maxime Chevalier-Boisvert
73674cac2b
YJIT: log the names of methods we call to in disasm (#7231)
* YJIT: log the names of methods we call to in disasm

* Assert that pointer is not null

* Handle case where UTF8 conversion not possible
2023-02-02 16:54:16 -05:00
Alan Wu
3b83b265f1 YJIT: Crash with rb_bug() when panicking
Helps with getting good bug reports in the wild. Intended to be
backported to the 3.2.x series.
2023-02-02 15:16:09 -05:00
Jimmy Miller
1148fab7ae
YJIT: Handle splat with opt more fully (#7209)
* YJIT: Handle splat with opt more fully

* Update yjit/src/codegen.rs

---------

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2023-01-31 16:18:56 -05:00
Jimmy Miller
762a3d80f7
Implement splat for cfuncs. Split exit exit cases to better capture where we are exiting (#6929)
YJIT: Implement splat for cfuncs. Split exit cases

This also implements a new check for ruby2keywords as the last
argument of a splat. This does mean that we generate more code, but in
actual benchmarks where we gained speed from this (binarytrees) I
don't see any significant slow down. I did have to struggle here with
the register allocator to find code that didn't allocate too many
registers. It's a bit hard when everything is implicit. But I think I
got to the minimal amount of copying and stuff given our current
allocation strategy.
2023-01-19 13:42:49 -05:00
Maxime Chevalier-Boisvert
6bb576fe75
YJIT: implement codegen for String#empty? (#7148)
YJIT: implement codegen for String#empty?
2023-01-18 15:41:28 -05:00
Takashi Kokubun
00d58afb5d
YJIT: Make iseq_get_location consistent with iseq.c (#7074)
* YJIT: Make iseq_get_location consistent with iseq.c

* YJIT: Call it "YJIT entry point"

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2023-01-06 11:49:59 -08:00
Jemma Issroff
c1ab6ddc9a Transition complex objects to "too complex" shape
When an object becomes "too complex" (in other words it has too many
variations in the shape tree), we transition it to use a "too complex"
shape and use a hash for storing instance variables.

Without this patch, there were rare cases where shape tree growth could
"explode" and cause performance degradation on what would otherwise have
been cached fast paths.

This patch puts a limit on shape tree growth, and gracefully degrades in
the rare case where there could be a factorial growth in the shape tree.

For example:

```ruby
class NG; end

HUGE_NUMBER.times do
  NG.new.instance_variable_set(:"@unique_ivar_#{_1}", 1)
end
```

We consider objects to be "too complex" when the object's class has more
than SHAPE_MAX_VARIATIONS (currently 8) leaf nodes in the shape tree and
the object introduces a new variation (a new leaf node) associated with
that class.

For example, new variations on instances of the following class would be
considered "too complex" because those instances create more than 8
leaves in the shape tree:

```ruby
class Foo; end
9.times { Foo.new.instance_variable_set(":@uniq_#{_1}", 1) }
```

However, the following class is *not* too complex because it only has
one leaf in the shape tree:

```ruby
class Foo
  def initialize
    @a = @b = @c = @d = @e = @f = @g = @h = @i = nil
  end
end
9.times { Foo.new }
``

This case is rare, so we don't expect this change to impact performance
of most applications, but it needs to be handled.

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
2022-12-15 10:06:04 -08:00
Alan Wu
e714907d82 YJIT: Upgrade bindgen to stabilize and reduce output
The new version has an option to merge everything into a big
`extern "C"` block and it's nicer.

More importantly, this upgrade fixes an issue where Ubuntu with Clang 12
and macOS with Clang 14 gave a one line diff for `rb_shape_t`. It was
slightly annoying because we use macOS locally.
2022-12-08 17:35:18 -05:00
Jemma Issroff
e7642d8095
YJIT: Extract SHAPE_ID_NUM_BITS into a constant (#6863) 2022-12-05 13:20:11 -08:00
Jemma Issroff
41bacd9b0d Remove unused rb_shape_flag_shift and rb_shape_flag_mask 2022-12-02 12:53:51 -08:00
Jemma Issroff
4c5e89791b Extracted rb_shape_id_offset 2022-12-02 12:53:51 -08:00
Aaron Patterson
744b0527ea bail on compilation if the comptime receiver is frozen 2022-12-02 12:53:51 -08:00
Aaron Patterson
17f9bcd7d7 implement IV writes 2022-12-02 12:53:51 -08:00
Alan Wu
eb2b717a8b
YJIT: Make case-when optimization respect === redefinition (#6846)
* YJIT: Make case-when optimization respect === redefinition

Even when a fixnum key is in the dispatch hash, if there is a case such
that its basic operations for === is redefined, we need to fall back to
checking each case like the interpreter. Semantically we're always
checking each case by calling === in order, it's just that this is not
observable when basic operations are intact.

When all the keys are fixnums, though, we can do the optimization we're
doing right now. Check for this condition.

* Update yjit/src/cruby_bindings.inc.rs

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2022-12-02 11:40:16 -05:00
Takashi Kokubun
2c939458ca
YJIT: Reorder branches for Fixnum opt_case_dispatch (#6841)
* YJIT: Reorder branches for Fixnum opt_case_dispatch

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

* YJIT: Don't support too large values

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-12-01 10:59:56 -05:00
Aaron Patterson
9e067df76b 32 bit comparison on shape id
This commit changes the shape id comparisons to use a 32 bit comparison
rather than 64 bit.  That means we don't need to load the shape id to a
register on x86 machines.

Given the following program:

```ruby
class Foo
  def initialize
    @foo = 1
    @bar = 1
  end

  def read
    [@foo, @bar]
  end
end

foo = Foo.new
foo.read
foo.read
foo.read
foo.read
foo.read

puts RubyVM::YJIT.disasm(Foo.instance_method(:read))
```

The machine code we generated _before_ this change is like this:

```
== BLOCK 1/4, ISEQ RANGE [0,3), 65 bytes ======================
  # getinstancevariable
  0x559a18623023: mov rax, qword ptr [r13 + 0x18]
  # guard object is heap
  0x559a18623027: test al, 7
  0x559a1862302a: jne 0x559a1862502d
  0x559a18623030: cmp rax, 4
  0x559a18623034: jbe 0x559a1862502d
  # guard shape, embedded, and T_OBJECT
  0x559a1862303a: mov rcx, qword ptr [rax]
  0x559a1862303d: movabs r11, 0xffff00000000201f
  0x559a18623047: and rcx, r11
  0x559a1862304a: movabs r11, 0xb000000002001
  0x559a18623054: cmp rcx, r11
  0x559a18623057: jne 0x559a18625046
  0x559a1862305d: mov rax, qword ptr [rax + 0x18]
  0x559a18623061: mov qword ptr [rbx], rax

== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 47 bytes ======================
  # gen_direct_jmp: fallthrough
  # getinstancevariable
  # regenerate_branch
  # getinstancevariable
  # regenerate_branch
  0x559a18623064: mov rax, qword ptr [r13 + 0x18]
  # guard shape, embedded, and T_OBJECT
  0x559a18623068: mov rcx, qword ptr [rax]
  0x559a1862306b: movabs r11, 0xffff00000000201f
  0x559a18623075: and rcx, r11
  0x559a18623078: movabs r11, 0xb000000002001
  0x559a18623082: cmp rcx, r11
  0x559a18623085: jne 0x559a18625099
  0x559a1862308b: mov rax, qword ptr [rax + 0x20]
  0x559a1862308f: mov qword ptr [rbx + 8], rax
```

After this change, it's like this:

```
== BLOCK 1/4, ISEQ RANGE [0,3), 41 bytes ======================
  # getinstancevariable
  0x5560c986d023: mov rax, qword ptr [r13 + 0x18]
  # guard object is heap
  0x5560c986d027: test al, 7
  0x5560c986d02a: jne 0x5560c986f02d
  0x5560c986d030: cmp rax, 4
  0x5560c986d034: jbe 0x5560c986f02d
  # guard shape
  0x5560c986d03a: cmp word ptr [rax + 6], 0x19
  0x5560c986d03f: jne 0x5560c986f046
  0x5560c986d045: mov rax, qword ptr [rax + 0x10]
  0x5560c986d049: mov qword ptr [rbx], rax

== BLOCK 2/4, ISEQ RANGE [3,6), 0 bytes =======================
== BLOCK 3/4, ISEQ RANGE [3,6), 23 bytes ======================
  # gen_direct_jmp: fallthrough
  # getinstancevariable
  # regenerate_branch
  # getinstancevariable
  # regenerate_branch
  0x5560c986d04c: mov rax, qword ptr [r13 + 0x18]
  # guard shape
  0x5560c986d050: cmp word ptr [rax + 6], 0x19
  0x5560c986d055: jne 0x5560c986f099
  0x5560c986d05b: mov rax, qword ptr [rax + 0x18]
  0x5560c986d05f: mov qword ptr [rbx + 8], rax
```

The first ivar read is a bit more complex, but the second ivar read is
much simpler.  I think eventually we could teach the context about the
shape, then emit only one shape guard.
2022-11-18 12:04:10 -08:00
Aaron Patterson
b7d591643a Remove USE_RVARGC code
We don't need this constant to be exposed anymore, so remove it
2022-11-14 15:42:11 -08:00
Jimmy Miller
1a65ab20cb
Implement optimize call (#6691)
This dispatches to a c func for doing the dynamic lookup. I experimented with chain on the proc but wasn't able to detect which call sites would be monomorphic vs polymorphic. There is definitely room for optimization here, but it does reduce exits.
2022-11-08 15:28:28 -05:00
Takashi Kokubun
81e84e0a4d
YJIT: Support invokeblock (#6640)
* YJIT: Support invokeblock

* Update yjit/src/backend/arm64/mod.rs

* Update yjit/src/codegen.rs

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-11-02 12:30:48 -04:00
Matthew Draper
c746f380f2
YJIT: Support nil and blockparamproxy as blockarg in send (#6492)
Co-authored-by: John Hawthorn <john@hawthorn.email>

Co-authored-by: John Hawthorn <john@hawthorn.email>
2022-10-26 15:27:59 -04:00
Takashi Kokubun
b7644a2311
YJIT: GC and recompile all code pages (#6406)
when it fails to allocate a new page.

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2022-10-25 09:07:10 -07:00
Nobuyoshi Nakada
245ad2b38a YJIT: incorporate ruby_special_consts 2022-10-20 15:43:34 -04:00
Aaron Patterson
1acc1a5c6d YJIT doesn't need rb_obj_ensure_iv_index_mapping
We should make this function static and remove it from YJIT bindings.
2022-10-14 17:14:41 -07:00
Jimmy Miller
467992ee35
Implement optimize send in yjit (#6488)
* Implement optimize send in yjit

This successfully makes all our benchmarks exit way less for optimize send reasons.
It makes some benchmarks faster, but not by as much as I'd like. I think this implementation
works, but there are definitely more optimial arrangements. For example, what if we compiled
send to a jump table? That seems like perhaps the most optimal we could do, but not obvious (to me)
how to implement give our current setup.

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Attempt at fixing the issues raised by @XrXr

* fix allowlist

* returns 0 instead of nil when not found

* remove comment about encoding exception

* Fix up c changes

* Update assert

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* get rid of unneeded code and fix the flags

* Apply suggestions from code review

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* rename and fix typo

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2022-10-11 16:37:05 -04:00
Jemma Issroff
ad63b668e2
Revert "Revert "This commit implements the Object Shapes technique in CRuby.""
This reverts commit 9a6803c90b.
2022-10-11 08:40:56 -07:00
Alan Wu
7293bfe1bf
YJIT: add support for calling bmethods (#6489)
* YJIT: fix a parameter name

* YJIT: add support for calling bmethods

This commit adds support for the VM_METHOD_TYPE_BMETHOD method type in
YJIT. You can get these type of methods from facilities like
Kernel#define_singleton_method and Module#define_method.

Even though the body of these methods are blocks, the parameter setup
for them is exactly the same as VM_METHOD_TYPE_ISEQ, so we can reuse
the same logic in gen_send_iseq(). You can see this from how
vm_call_bmethod() eventually calls setup_parameters_complex() with
arg_setup_method.

Bmethods do need their frame environment to be setup differently. We
handle this by allowing callers of gen_send_iseq() to control the iseq,
the frame flag, and the prev_ep. The `prev_ep` goes into the same
location as the block handler would go into in an iseq method frame.

Co-authored-by: John Hawthorn <john@hawthorn.email>

Co-authored-by: John Hawthorn <john@hawthorn.email>
2022-10-04 22:48:05 -04:00
Aaron Patterson
9a6803c90b
Revert "This commit implements the Object Shapes technique in CRuby."
This reverts commit 68bc9e2e97d12f80df0d113e284864e225f771c2.
2022-09-30 16:01:50 -07:00
Jemma Issroff
d594a5a8bd
This commit implements the Object Shapes technique in CRuby.
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects.  Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness").  Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree.  Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.

For example:

```ruby
class Foo
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

class Bar
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```

Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.

This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.

This commit also adds some methods for debugging shapes on objects.  See
`RubyVM::Shape` for more details.

For more context on Object Shapes, see [Feature: #18776]

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2022-09-28 08:26:21 -07:00
Aaron Patterson
06abfa5be6
Revert this until we can figure out WB issues or remove shapes from GC
Revert "* expand tabs. [ci skip]"

This reverts commit 830b5b5c35.

Revert "This commit implements the Object Shapes technique in CRuby."

This reverts commit 9ddfd2ca00.
2022-09-26 16:10:11 -07:00
Jemma Issroff
9ddfd2ca00 This commit implements the Object Shapes technique in CRuby.
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects.  Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness").  Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree.  Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.

For example:

```ruby
class Foo
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

class Bar
  def initialize
    # Starts with shape id 0
    @a = 1 # transitions to shape id 1
    @b = 1 # transitions to shape id 2
  end
end

foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```

Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.

This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.

This commit also adds some methods for debugging shapes on objects.  See
`RubyVM::Shape` for more details.

For more context on Object Shapes, see [Feature: #18776]

Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2022-09-26 09:21:30 -07:00
Takashi Kokubun
0ca037b35c
Update bindgen crate (#6397)
to get rid of deprecated indirect dependency, ansi_term
2022-09-18 20:42:57 +09:00
John Hawthorn
f98d6d3f38
YJIT: Implement specialized respond_to? (#6363)
* Add rb_callable_method_entry_or_negative

* YJIT: Implement specialized respond_to?

This implements a specialized respond_to? in YJIT.

* Update yjit/src/codegen.rs

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-14 16:15:55 -04:00
Jimmy Miller
758a1d7302
Initial support for VM_CALL_ARGS_SPLAT (#6341)
* Initial support for VM_CALL_ARGS_SPLAT

This implements support for calls with splat (*) for some methods. In
benchmarks this made very little difference for most benchmarks, but a
large difference for binarytrees. Looking at side exits, many
benchmarks now don't exit for splat, but exit for some other
reason. Binarytrees however had a number of calls that used splat args
that are now much faster. In my non-scientific benchmarking this made
splat args performance on par with not using splat args at all.

* Fix wording and whitespace

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>

* Get rid of side_effect reassignment

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
2022-09-14 10:32:22 -04:00
John Hawthorn
1cc97412cd Remove rb_iseq_each 2022-09-01 15:20:49 -07:00
Alan Wu
46007b88af A64: Only clear icache when writing out new code (https://github.com/Shopify/ruby/pull/442)
Previously we cleared the cache for all the code in the system when we
flip memory protection, which was prohibitively expensive since the
operation is not constant time. Instead, only clear the cache for the
memory region of newly written code when we write out new code.

This brings the runtime for the 30k_if_else test down to about 6 seconds
from the previous 45 seconds on my laptop.
2022-08-29 09:09:41 -07:00
Alan Wu
2f9df46654
Use bindgen for old manual extern declarations (https://github.com/Shopify/ruby/pull/404)
We have a large extern block in cruby.rs leftover from the port. We can
use bindgen for it now and reserve the manual declaration for just a
handful of vm_insnhelper.c functions.

Fixup a few minor discrepencies bindgen found between the C declaration
and the manual declaration. Mostly missing `const` on the C side.
2022-08-29 08:47:11 -07:00
Noah Gibbs
b4be3c00c5 add --yjit-dump-iseqs param (https://github.com/Shopify/ruby/pull/332) 2022-08-24 10:42:45 -07:00
Peter Zhu
7424ea184f Implement Objects on VWA
This commit implements Objects on Variable Width Allocation. This allows
Objects with more ivars to be embedded (i.e. contents directly follow the
object header) which improves performance through better cache locality.
2022-07-15 09:21:07 -04:00
Noah Gibbs (and/or Benchmark CI)
a2e0815e27 Switch YJIT to using rb_str_buf_append rather than rb_str_append when encodings don't match, as discussed with byroot 2022-07-06 17:25:58 +02:00