Commit graph

292 commits

Author SHA1 Message Date
Aaron Patterson
cdf33ed5f3 Optimized forwarding callers and callees
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm fo `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -> 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -> 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-18 09:28:25 -07:00
Nobuyoshi Nakada
e9a7801a93
Drop support for old ERB 2024-03-03 00:55:45 +09:00
Nobuyoshi Nakada
5ec1fc52c1
Escape non-ascii characters in prelude C comments
Non-ASCII code are often warned by localized compilers.
2023-08-24 21:12:51 +09:00
Takashi Kokubun
e210b899dc
Move the PC regardless of the leaf flag (#8232)
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2023-08-16 20:28:33 -07:00
Takashi Kokubun
38be9a9b72
Clean up OPT_STACK_CACHING (#8132) 2023-07-27 17:27:05 -07:00
John Hawthorn
617c9b4656 Rename opes to operands on RubyVM::BaseInstruction 2023-03-16 14:16:56 -07:00
John Hawthorn
8dbddd5bf8 Rename opes to operands
Co-authored-by: Aaron Patterson <aaron.patterson@gmail.com>
2023-03-16 14:16:56 -07:00
John Hawthorn
d454a590cc Re-add RJIT::Instruction#opes 2023-03-16 14:16:56 -07:00
Takashi Kokubun
6440d159b3 RJIT: Simplify RubyVM::RJIT::Instruction 2023-03-10 13:15:48 -08:00
Takashi Kokubun
2e875549a9 s/MJIT/RJIT/ 2023-03-06 23:44:01 -08:00
Takashi Kokubun
eaccdc1941 Rename MJIT filenames to RJIT 2023-03-06 23:44:01 -08:00
Takashi Kokubun
011c08b643 Remove obsoleted mjit_sp_inc.inc.erb 2023-03-06 23:05:50 -08:00
Takashi Kokubun
14acf9b0a3 Decode trace insns properly 2023-03-05 22:41:35 -08:00
Takashi Kokubun
d9c2eb6f42 Move modules around 2023-03-05 22:11:20 -08:00
Peter Zhu
3e09822407 Fix incorrect line numbers in GC hook
If the previous instruction is not a leaf instruction, then the PC was
incremented before the instruction was ran (meaning the currently
executing instruction is actually the previous instruction), so we
should not increment the PC otherwise we will calculate the source
line for the next instruction.

This bug can be reproduced in the following script:

```
require "objspace"

ObjectSpace.trace_object_allocations_start
a =

  1.0 / 0.0
p [ObjectSpace.allocation_sourceline(a), ObjectSpace.allocation_sourcefile(a)]
```

Which outputs: [4, "test.rb"]

This is incorrect because the object was allocated on line 10 and not
line 4. The behaviour is correct when we use a leaf instruction (e.g.
if we replaced `1.0 / 0.0` with `"hello"`), then the output is:
[10, "test.rb"].

[Bug #19456]
2023-02-24 14:10:09 -05:00
Peter Zhu
d2631c427e Fix RubyVM::CExpr#inspect
@__LINE__ can be nil which causes the inspect method to fail.
2023-02-24 14:10:09 -05:00
Takashi Kokubun
0b2aea861c Polish the public docs for MJIT [ci skip]
Now every private interface is cleaned up, and the public interface is
documented.
2022-12-22 14:30:09 -08:00
Takashi Kokubun
bb4cbd0803
Put RubyVM::MJIT::Compiler under ruby_vm directory (#6989)
[Misc #19250]
2022-12-21 22:46:15 -08:00
Takashi Kokubun
a1d70f5b12 MJIT: Rename mjit_compile_attr to mjit_sp_inc
There's no mjit_compile.inc, so no need to use this prefix anymore.
2022-11-29 21:45:34 -08:00
Nobuyoshi Nakada
c50623f093
Revert "FreeBSD make uses the target under srcdir [ci skip]"
This reverts commit 751ffb276f, which
caused build failures on other platforms.
2022-10-13 12:24:59 +09:00
Nobuyoshi Nakada
751ffb276f
FreeBSD make uses the target under srcdir [ci skip] 2022-10-13 12:10:10 +09:00
Takashi Kokubun
4e0db2f753 mjit_c.rb doesn't need to be an erb 2022-09-23 06:44:28 +09:00
Takashi Kokubun
334b8bd459 Mix manual and auto-generated C APIs 2022-09-23 06:44:28 +09:00
Takashi Kokubun
00c441ce7a Bindgen macro with builtin 2022-09-23 06:44:28 +09:00
Takashi Kokubun
f2bea691cd Builtin RubyVM::MJIT::C 2022-09-23 06:44:28 +09:00
Takashi Kokubun
69130e1614
Expand paths used for dumper.rb
This seems to be needed on Samuel's environment
2022-09-22 21:07:22 +09:00
Takashi Kokubun
a988fe0b3e
Introduce --basedir to insns2vm.rb
and leverage that to preserve the directory structure under tool/ruby_vm/views
2022-09-18 14:39:53 +09:00
Takashi Kokubun
12023c833f
Revert "Preserve the directory structure under tool/ruby_vm/views"
This reverts commit 62ec621f8c.

will revisit this once fixing non-MJIT targets
2022-09-18 14:21:40 +09:00
Takashi Kokubun
62ec621f8c
Preserve the directory structure under tool/ruby_vm/views
for nested target directories
2022-09-18 14:19:22 +09:00
Takashi Kokubun
0e816e6d30
Demote mjit_instruction.rb from builtin to stdlib 2022-09-18 14:04:20 +09:00
Takashi Kokubun
c2986f7d28
Fix warnings from private_constant
`private_constant *constants` seems to be warned for some reason
2022-09-05 00:27:49 -07:00
Takashi Kokubun
3767c6a90d
Ruby MJIT (#6028) 2022-09-04 21:53:46 -07:00
John Hawthorn
679ef34586 New constant caching insn: opt_getconstant_path
Previously YARV bytecode implemented constant caching by having a pair
of instructions, opt_getinlinecache and opt_setinlinecache, wrapping a
series of getconstant calls (with putobject providing supporting
arguments).

This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.

This is implemented by storing the full constant path as a
null-terminated array of IDs inside of the IC structure. idNULL is used
to signal an absolute constant reference.

    $ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
    == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
    0000 opt_getconstant_path                   <ic:0 ::Foo::Bar::Baz>      (   1)[Li]
    0002 leave

The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching,
or otherwise store metadata.

This disassembly was done:
* In opt_setinlinecache, to register the IC against the constant names
  it is using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT to find the position of a opt_getinlinecache instruction to
  invalidate it when the cache is populated
* In YJIT to register the constant names being used for invalidation.

With this change we no longe need disassemly for these (in fact
rb_iseq_each is now unused), as the list of constant names being
referenced is held in the IC. This should also make it possible to make
more optimizations in the future.

This may also reduce the size of iseqs, as previously each segment
required 32 bytes (on 64-bit platforms) for each constant segment. This
implementation only stores one ID per-segment.

There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache
line). Now opt_getconstant_path is a non-leaf (it may
raise/autoload/call const_missing) but it does not jump. These seem to
even out.
2022-09-01 15:20:49 -07:00
Takashi Kokubun
a60507f616
Rename mjit_compile.c to mjit_compiler.c
I'm planning to introduce mjit_compiler.rb, and I want to make this
consistent with it. Consistency with compile.c doesn't seem important
for MJIT anyway.
2022-08-21 11:33:06 -07:00
Takashi Kokubun
485019c2bd
Rename mjit_exec to jit_exec (#6262)
* Rename mjit_exec to jit_exec

* Rename mjit_exec_slowpath to mjit_check_iseq

* Remove mjit_exec references from comments
2022-08-19 23:57:17 -07:00
Takashi Kokubun
5b21e94beb Expand tabs [ci skip]
[Misc #18891]
2022-07-21 09:42:04 -07:00
Jemma Issroff
85ea46730d Separate TS_IVC and TS_ICVARC in is_entries buffers
This allows us to treat cvar caches differently than ivar caches.
2022-07-18 14:06:30 -07:00
Peter Zhu
7424ea184f Implement Objects on VWA
This commit implements Objects on Variable Width Allocation. This allows
Objects with more ivars to be embedded (i.e. contents directly follow the
object header) which improves performance through better cache locality.
2022-07-15 09:21:07 -04:00
Nobuyoshi Nakada
841521b7c1
Adjust indent [ci skip] 2022-06-30 10:50:31 +09:00
Aaron Patterson
8d157bc806 Move function to static inline so we don't have leaked globals
This function shouldn't leak and is only needed during instruction
assembly
2022-06-29 16:21:48 -07:00
Kevin Newton
6068da8937 Finer-grained constant cache invalidation (take 2)
This commit reintroduces finer-grained constant cache invalidation.
After 8008fb7 got merged, it was causing issues on token-threaded
builds (such as on Windows).

The issue was that when you're iterating through instruction sequences
and using the translator functions to get back the instruction structs,
you're either using `rb_vm_insn_null_translator` or
`rb_vm_insn_addr2insn2` depending if it's a direct-threading build.
`rb_vm_insn_addr2insn2` does some normalization to always return to
you the non-trace version of whatever instruction you're looking at.
`rb_vm_insn_null_translator` does not do that normalization.

This means that when you're looping through the instructions if you're
trying to do an opcode comparison, it can change depending on the type
of threading that you're using. This can be very confusing. So, this
commit creates a new translator function
`rb_vm_insn_normalizing_translator` to always return the non-trace
version so that opcode comparisons don't have to worry about different
configurations.

[Feature #18589]
2022-04-01 14:48:22 -04:00
Nobuyoshi Nakada
69967ee64e
Revert "Finer-grained inline constant cache invalidation"
This reverts commits for [Feature #18589]:
* 8008fb7352
  "Update formatting per feedback"
* 8f6eaca2e1
  "Delete ID from constant cache table if it becomes empty on ISEQ free"
* 629908586b
  "Finer-grained inline constant cache invalidation"

MSWin builds on AppVeyor have been crashing since the merger.
2022-03-25 20:29:09 +09:00
Kevin Newton
629908586b Finer-grained inline constant cache invalidation
Current behavior - caches depend on a global counter. All constant mutations cause caches to be invalidated.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on global counter
end

foo # populate inline cache
foo # hit inline cache

C = 1 # global counter increments, all caches are invalidated

foo # misses inline cache due to `C = 1`
```

Proposed behavior - caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends constants named "A" and "B"
end

foo # populate inline cache
foo # hit inline cache

C = 1 # caches that depend on the name "C" are invalidated

foo # hits inline cache because IC only depends on "A" and "B"
```

Examples of breaking the new cache:

```ruby
module C
  # Breaks `foo` cache because "A" constant is set and the cache in foo depends
  # on "A" and "B"
  class A; end
end

B = 1
```

We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.
2022-03-24 09:14:38 -07:00
Peter Zhu
5f10bd634f Add ISEQ_BODY macro
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.
2022-03-24 10:03:51 -04:00
Jemma Issroff
2913a2f5cf Treat TS_ICVARC cache as separate from TS_IVC cache 2022-02-02 09:20:34 -08:00
Alan Wu
a785e6c356 Make leaf const in VM generator
Assigning to `leaf` in insns.def would give undesirable results.
2021-12-05 11:06:05 -05:00
Yusuke Endoh
c1228f833c vm_core.h: Avoid unaligned access to ic_serial on 32-bit machine
This caused Bus error on 32 bit Solaris
2021-10-29 10:57:46 +09:00
Alan Wu
7c08538aa3 Cleanup diff against upstream. Add comments
I did a `git diff --stat` against upstream and looked at all the files
that are outside of YJIT to come up with these minor changes.
2021-10-20 18:19:42 -04:00
Aaron Patterson
41f405c486 Remove the scraper
Now that we're using the jit function entry point, we don't need the
scraper.  Thank you for your service, scraper. ❤️
2021-10-20 18:19:38 -04:00
Aaron Patterson
30f20d7c38 Remove some MicroJIT vestiges
Just happened to run across this, so lets fix them
2021-10-20 18:19:36 -04:00