Commit graph

1254 commits

Author SHA1 Message Date
Jean Boussier
2083fa89fc Implement gen_fields_tbl cache
There is a high likelihood that `rb_obj_fields` is called
consecutively for the same object.

If we keep a cache of the last IMEMO/fields we interacted with,
we can save having to lookup the `gen_fields_tbl`, synchronize
the VM lock, etc.
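The cache described above can be sketched in Ruby as a single-entry memo in front of a per-object side table. This is a hypothetical model, not CRuby's C implementation. `GenFieldsTable`, `#set` and `#fields_for` are made-up names, a `Mutex` stands in for the VM lock, and GC and object freeing are not modeled. The last entry is kept as one frozen `[obj, fields]` pair so that a reader picks it up with a single reference read.

```ruby
# Hypothetical sketch: GenFieldsTable, #set and #fields_for are illustrative
# names, not CRuby APIs. GC interaction and object freeing are not modeled.
class GenFieldsTable
  attr_reader :hits, :misses

  def initialize
    @table = {}.compare_by_identity # object => fields, keyed by identity
    @lock = Mutex.new               # stands in for the VM lock
    @last = nil                     # frozen [obj, fields] of the last lookup
    @hits = 0                       # counters are illustrative, not thread-safe
    @misses = 0
  end

  def set(obj, fields)
    @lock.synchronize do
      @table[obj] = fields
      @last = [obj, fields].freeze  # keep the cache coherent with the table
    end
  end

  def fields_for(obj)
    last = @last                    # read the cache entry once
    if last && last[0].equal?(obj)
      @hits += 1
      return last[1]                # fast path: no table lookup, no lock
    end

    @misses += 1
    @lock.synchronize do
      fields = @table[obj]
      @last = [obj, fields].freeze  # remember this object for the next call
      fields
    end
  end
end
```

On a hit, the lookup skips both the hash lookup and the lock. On a miss, it pays for the extra comparison and the cache update, which is consistent with the small slowdown reported for the `miss` case.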

On yjit-bench, I instrumented this cache and measured its hit rate:

  - `shipit`: 38%, with 111k hits.
  - `lobsters`: 59%, with 367k hits.
  - `rubocop`: 100% with only 300 hits.

I also ran a micro-benchmark which shows that ivar access is:

  - 1.25x faster when the cache is hit in single ractor mode.
  - 2x faster when the cache is hit in multi ractor mode.
  - 1.06x slower when the cache misses in single ractor mode.
  - 1.01x slower when the cache misses in multi ractor mode.

```yml
prelude: |
  class GenIvar < Array
    def initialize(...)
      super
      @iv = 1
    end

    attr_reader :iv
  end

  a = GenIvar.new
  b = GenIvar.new
benchmark:
  hit: a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv; a.iv;
  miss: a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv; a.iv; b.iv;
```

Single ractor:
```
compare-ruby: ruby 3.5.0dev (2025-08-12T02:14:57Z master 428937a536) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-12T09:25:35Z gen-fields-cache 9456c35893) +YJIT +PRISM [arm64-darwin24]
warming up..

|      |compare-ruby|built-ruby|
|:-----|-----------:|---------:|
|hit   |      4.090M|    5.121M|
|      |           -|     1.25x|
|miss  |      3.756M|    3.534M|
|      |       1.06x|         -|
```

Multi-ractor:
```
compare-ruby: ruby 3.5.0dev (2025-08-12T02:14:57Z master 428937a536) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-12T09:25:35Z gen-fields-cache 9456c35893) +YJIT +PRISM [arm64-darwin24]
warming up..

|      |compare-ruby|built-ruby|
|:-----|-----------:|---------:|
|hit   |      2.205M|    4.460M|
|      |           -|     2.02x|
|miss  |      2.117M|    2.094M|
|      |       1.01x|         -|
```
2025-08-13 19:54:56 +02:00
Peter Zhu
95320f1ddf Fix RUBY_FREE_AT_EXIT for static symbols
Since static symbols allocate memory, we should deallocate them at shutdown
to prevent memory leaks from being reported with RUBY_FREE_AT_EXIT.
2025-08-05 12:04:27 -04:00
Jean Boussier
fc5e1541e4 Use rb_gc_mark_weak for cc->klass.
One of the biggest remaining contention points is `RClass.cc_table`.
The logical solution would be to turn it into a managed object, so
we can use an RCU strategy, given it's read heavy.

However, that's not currently possible because the table can't
be freed before the owning class, given the class free function
MUST go over all the CC entries to invalidate them.

However, if the `CC->klass` reference is weakly marked, then the
GC will take care of setting the reference to `Qundef`.
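To illustrate what weak marking buys here, the following is a toy mark phase in Ruby. All names are hypothetical and this is not CRuby's GC. Strong references are traced. A weak slot is only recorded during marking, and afterwards it is overwritten with an `UNDEF` placeholder (standing in for `Qundef`) if its target did not survive. The call cache itself is assumed to be alive.

```ruby
# Toy model, not CRuby's GC. UNDEF stands in for Qundef.
UNDEF = :undef

class ToyObj
  attr_reader :name, :strong_refs

  def initialize(name)
    @name = name
    @strong_refs = [] # outgoing references that keep their targets alive
  end
end

# klass is the weakly referenced field, like cc->klass in the commit.
CallCache = Struct.new(:klass)

class ToyGC
  def initialize
    @weak_slots = [] # [holder, member] pairs registered during marking
  end

  # Plays the role of a weak mark: remember where the reference lives,
  # but do not treat its target as reachable.
  def mark_weak(holder, member)
    @weak_slots << [holder, member]
  end

  def collect(roots)
    marked = {}.compare_by_identity
    stack = roots.dup
    until stack.empty?
      obj = stack.pop
      next if marked[obj]
      marked[obj] = true
      stack.concat(obj.strong_refs)
    end

    # After marking, any weak reference whose target was not marked is dead.
    @weak_slots.each do |holder, member|
      target = holder[member]
      holder[member] = UNDEF unless target.equal?(UNDEF) || marked[target]
    end
    @weak_slots.clear
    marked
  end
end
```

Because the GC clears the reference itself, nothing else has to walk the call caches to invalidate them before the class goes away.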
2025-08-01 10:42:04 +02:00
Takashi Kokubun
2cd10de330 ZJIT: Prepare for sharing JIT hooks with ZJIT (#14044) 2025-07-30 10:11:10 -07:00
Takashi Kokubun
b22eb0e468 ZJIT: Add --zjit-stats (#14034) 2025-07-29 10:00:15 -07:00
Peter Zhu
2bcb155b49 Convert global symbol table to concurrent set 2025-07-21 10:58:30 -04:00
John Hawthorn
cfc006d410 Always use atomics to get the shape count
When sharing between threads we need both atomic reads and writes. We
probably didn't need to use this in some cases (where we weren't running
in multi-ractor mode) but I think it's best to be consistent.
2025-07-09 10:38:04 -07:00
John Hawthorn
2ed4862690 Remove unnecessary union 2025-06-24 20:02:30 -07:00
Luke Gruber
e3ec101cc2 thread_cleanup: set CFP to NULL before clearing ec's stack
We clear the CFP first so that if a sampling profiler interrupts the current thread during `rb_ec_set_vm_stack`,
`thread_profile_frames` returns early instead of trying to walk the stack that's no longer set on the ec.

The early return in `thread_profile_frames` was introduced at eab7f4623f.

Fixes [Bug #21441]
2025-06-17 15:03:39 -07:00
Satoshi Tagomori
50c6bd47ef Update vm->self location and mark it in vm.c for consistency 2025-06-17 10:07:53 +09:00
Jean Boussier
7c22330cd2 Allocate rb_shape_tree statically
There is no point allocating it during init, it adds
a useless indirection.
2025-06-12 17:08:22 +02:00
Jean Boussier
de4b910381 Get rid of GET_SHAPE_TREE()
It's a useless indirection.
2025-06-12 17:08:22 +02:00
alpaca-tc
c8ddc0a843 Optimize callcache invalidation for refinements
Fixes [Bug #21201]

This change addresses a performance regression where defining methods
inside `refine` blocks caused severe slowdowns. The issue was due to
`rb_clear_all_refinement_method_cache()` triggering a full object
space scan via `rb_objspace_each_objects` to find and invalidate
affected callcaches, which is very inefficient.

To fix this, I introduce `vm->cc_refinement_table` to track
callcaches related to refinements. This allows us to invalidate
only the necessary callcaches without scanning the entire heap,
resulting in significant performance improvement.
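A rough Ruby model of the difference between the two invalidation strategies follows. `ToyVM`, `CallCacheEntry` and the method names are hypothetical. The `heap` array stands in for the object space walked by `rb_objspace_each_objects`, and `cc_refinement_table` plays the role of `vm->cc_refinement_table`. Whether and when entries are removed from the real table is not modeled.

```ruby
require "set"

# Hypothetical model, not CRuby's data structures.
CallCacheEntry = Struct.new(:mid, :refinement, :valid)

class ToyVM
  attr_reader :heap, :cc_refinement_table

  def initialize
    @heap = [] # stands in for every object in the object space
    @cc_refinement_table = Set.new.compare_by_identity
  end

  def new_cc(mid, refinement: false)
    cc = CallCacheEntry.new(mid, refinement, true)
    @heap << cc
    @cc_refinement_table << cc if refinement # track refinement call caches
    cc
  end

  # Old approach: walk the whole heap looking for affected call caches.
  def clear_refinement_caches_by_scan
    visited = 0
    @heap.each do |obj|
      visited += 1
      obj.valid = false if obj.refinement
    end
    visited
  end

  # New approach: only touch the call caches registered in the side table.
  def clear_refinement_caches_by_table
    visited = 0
    @cc_refinement_table.each do |cc|
      visited += 1
      cc.valid = false
    end
    visited
  end
end
```

The scan visits every object, while the table-based version only touches the registered call caches, so its cost scales with the number of refinement call caches rather than with the heap size.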
2025-06-09 12:33:35 +09:00
John Hawthorn
e596cf6e93 Make FrozenCore a plain T_CLASS 2025-06-02 14:57:48 -04:00
Jean Boussier
e9fd44dd72 shape.c: Implement a lock-free version of get_next_shape_internal
Whenever we run into an inline cache miss when we try to set
an ivar, we may need to take the global lock just to be able to
look up inside `shape->edges`.

To solve that, when we're in multi-ractor mode, we can treat
the `shape->edges` as immutable. When we need to add a new
edge, we first copy the table, and then replace it with
CAS.

This increases memory allocations, however we expect that
creating new transitions becomes increasingly rare over time.
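The copy-then-CAS scheme above can be sketched in Ruby. Core Ruby has no compare-and-swap on instance variables, so `compare_and_set` emulates one with a `Mutex`. Readers never take that lock. `Shape`, `#edge` and `#get_next_shape` are hypothetical names, and unlike the commit, the sketch always uses the immutable-table path rather than only in multi-ractor mode.

```ruby
# Hypothetical sketch of a copy-on-write edge table published with CAS.
class Shape
  attr_reader :id

  def initialize(id)
    @id = id
    @edges = {}.freeze    # never mutated once published
    @cas_lock = Mutex.new # only used to emulate an atomic CAS
  end

  # Read path: load the current immutable table, no lock.
  def edge(ivar)
    @edges[ivar]
  end

  def get_next_shape(ivar, &build)
    loop do
      current = @edges
      found = current[ivar]
      return found if found

      child = build.call
      updated = current.merge(ivar => child).freeze # copy, then add the edge
      return child if compare_and_set(current, updated)
      # Another writer published first: retry against its table.
    end
  end

  private

  # Replace @edges with new_value only if it is still `expected`.
  def compare_and_set(expected, new_value)
    @cas_lock.synchronize do
      return false unless @edges.equal?(expected)
      @edges = new_value
      true
    end
  end
end
```

If two writers race on the same edge, both build a child, only one CAS succeeds, and the loser retries, finds the winner's edge and drops its own child. In this sketch, the copied tables and those discarded children are where the extra allocations come from.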

```ruby
class A
  def initialize(bool)
    @a = 1
    if bool
      @b = 2
    else
      @c = 3
    end
  end

  def test
    @d = 4
  end
end

def bench(iterations)
  i = iterations
  while i > 0
    A.new(true).test
    A.new(false).test
    i -= 1
  end
end

if ARGV.first == "ractor"
  ractors = 8.times.map do
    Ractor.new do
      bench(20_000_000 / 8)
    end
  end
  ractors.each(&:take)
else
  bench(20_000_000)
end
```

The above benchmark takes 27 seconds in Ractor mode on Ruby 3.4,
and only 1.7s with this branch.

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2025-06-02 17:49:53 +02:00
Koichi Sasada
ef2bb61018 Ractor::Port
* Added `Ractor::Port`
  * `Ractor::Port#receive` (support multi-threads)
  * `Ractor::Port#close`
  * `Ractor::Port#closed?`
* Added some methods
  * `Ractor#join`
  * `Ractor#value`
  * `Ractor#monitor`
  * `Ractor#unmonitor`
* Removed some methods
  * `Ractor#take`
  * `Ractor.yield`
* Change the spec
  * `Ractor.select`

You can wait for multiple sequences of messages with `Ractor::Port`.

```ruby
ports = 3.times.map { Ractor::Port.new }
ports.each_with_index do |port, ri|
  Ractor.new port, ri do |port, ri|
    3.times { |i| port << "r#{ri}-#{i}" }
  end
end

ports.each { |port| pp 3.times.map { port.receive } }
```

In this example, we use 3 ports, and 3 Ractors send messages to them respectively.
We can receive a series of messages from each port.

You can use `Ractor#value` to get the last value of a Ractor's block:

```ruby
result = Ractor.new do
  heavy_task()
end.value
```

You can wait for the termination of a Ractor with `Ractor#join` like this:

```ruby
Ractor.new do
  some_task()
end.join
```

`#value` and `#join` are similar to `Thread#value` and `Thread#join`.

To implement `#join`, `Ractor#monitor` (and `Ractor#unmonitor`) is introduced.

This commit changes the `Ractor.select()` method.
It now only accepts ports or Ractors, and returns when a port receives a message or a Ractor terminates.

We remove `Ractor.yield` and `Ractor#take` because:
* `Ractor::Port` supports most of similar use cases in a simpler manner.
* Removing them significantly simplifies the code.

We also change the internal thread scheduler code (thread_pthread.c):
* During barrier synchronization, we keep the `ractor_sched` lock to avoid deadlocks.
  This lock is released by `rb_ractor_sched_barrier_end()`
  which is called at the end of operations that require the barrier.
* Fix potential deadlock issues by checking interrupts just before setting UBF.

https://bugs.ruby-lang.org/issues/21262
2025-05-31 04:01:33 +09:00
Peter Zhu
b5f5672034 Set iclass_is_origin flag for FrozenCore
We don't free the method table for FrozenCore since it is converted to
an iclass and doesn't have the iclass_is_origin flag set. This causes a
memory leak to be reported during RUBY_FREE_AT_EXIT:

    14  dyld                                  0x19f13ab98 start + 6076
    13  miniruby                              0x100644928 main + 96  main.c:62
    12  miniruby                              0x10064498c rb_main + 48  main.c:42
    11  miniruby                              0x10073be0c ruby_init + 16  eval.c:98
    10  miniruby                              0x10073bc6c ruby_setup + 232  eval.c:87
    9   miniruby                              0x100786b98 rb_call_inits + 168  inits.c:63
    8   miniruby                              0x1009b5010 Init_VM + 212  vm.c:4017
    7   miniruby                              0x10067aae8 rb_class_new + 44  class.c:834
    6   miniruby                              0x10067a04c rb_class_boot + 48  class.c:748
    5   miniruby                              0x10067a250 class_initialize_method_table + 32  class.c:721
    4   miniruby                              0x1009412a8 rb_id_table_create + 24  id_table.c:98
    3   miniruby                              0x100759fac ruby_xmalloc + 24  gc.c:5201
    2   miniruby                              0x10075fc14 ruby_xmalloc_body + 52  gc.c:5211
    1   miniruby                              0x1007726b4 rb_gc_impl_malloc + 92  default.c:8141
    0   libsystem_malloc.dylib                0x19f30d12c _malloc_zone_malloc_instrumented_or_legacy + 152
2025-05-28 13:25:37 -04:00
Nobuyoshi Nakada
aad9fa2853 Use RB_VM_LOCKING 2025-05-25 15:22:43 +09:00
Jean Boussier
83d636f2d0 Free shapes last
[Bug #21352]

`rb_objspace_free_objects` may need to check objects' shapes
to know how to free them.
2025-05-19 15:06:08 +02:00
Alan Wu
92b218fbc3 YJIT: ZJIT: Allow both JITs in the same build
This commit allows building YJIT and ZJIT simultaneously, a "combo
build". Previously, `./configure --enable-yjit --enable-zjit` failed. At
runtime, though, only one of the two can be enabled at a time.

Add a root Cargo workspace that contains both the yjit and zjit crate.
The common Rust build integration mechanisms are factored out into
defs/jit.mk.

Combo YJIT+ZJIT dev builds are supported; if either JIT uses
`--enable-*=dev`, both of them are built in dev mode.

The combo build requires Cargo, but building one JIT at a time with only
rustc in release build remains supported.
2025-05-15 00:39:03 +09:00
Luke Gruber
1d4822a175 Get ractor message passing working with > 1 thread sending/receiving values in same ractor
Rework ractors so that any ractor action (Ractor.receive, Ractor#send, Ractor.yield, Ractor#take,
Ractor.select) operates on the thread that called the action. If the action blocks, that thread is put
to sleep, and the awakening action (Ractor.yield, Ractor#send) wakes up the blocked thread.

Before this change, every blocking ractor action was associated with the ractor struct and its fields.
If a ractor called Ractor.receive, its wait status was wait_receiving, and when another ractor called
r.send on it, it would look for that status in the ractor struct fields and wake it up. The problem arises
when 2 threads call blocking ractor actions in the same ractor. Imagine that 1 thread has called Ractor.receive
and another has called r.take. When a different ractor then calls r.send on it, it doesn't know which Ruby thread
is associated with which ractor action, so which Ruby thread should it schedule? This change moves some fields onto
the Ruby thread itself so that Ruby threads are the ones that have ractor blocking statuses, and threads are then
specifically scheduled when unblocked.
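A minimal Ruby sketch of keeping the blocking state per thread follows. `Mailbox`, `#post` and `#receive` are hypothetical names, and this is not the ractor implementation. Each blocked receiver registers its own wakeup `Thread::Queue`, so a sender hands the message to one specific waiting thread instead of consulting a single wait status stored on the shared owner.

```ruby
# Hypothetical sketch: blocking state lives with each waiting thread.
class Mailbox
  def initialize
    @lock = Mutex.new
    @messages = [] # messages that arrived while nobody was waiting
    @waiters = []  # one wakeup Queue per blocked receiving thread
  end

  def post(msg)
    waiter = @lock.synchronize do
      if @waiters.empty?
        @messages << msg
        nil
      else
        @waiters.shift # pick one specific blocked thread (oldest first)
      end
    end
    waiter&.push(msg) # wake exactly that thread, outside the lock
  end

  def receive
    waiter = @lock.synchronize do
      return @messages.shift unless @messages.empty?
      q = Thread::Queue.new
      @waiters << q # register this thread's own blocking record
      q
    end
    waiter.pop # only this thread sleeps until a sender targets it
  end
end
```

Two threads blocked in `receive` on the same mailbox each get their own wakeup record, so a sender never has to guess which of them to schedule.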

Fixes [#17624]
Fixes [#21037]
2025-05-13 13:23:57 -07:00
Samuel Williams
425fa0aeb5 Make waiting_fd behaviour per-IO. (#13127)
- `rb_thread_fd_close` is deprecated and now a no-op.
- IO operations (including close) no longer take a vm-wide lock.
2025-05-13 19:02:03 +09:00
Satoshi Tagomori
382645d440 namespace on read 2025-05-11 23:32:50 +09:00
Jean Boussier
7116b0a7f1 Extract rb_shape_free_all 2025-05-09 10:22:51 +02:00
Jean Boussier
0ea210d1ea Rename ivptr -> fields, next_iv_index -> next_field_index
Ivars will no longer be the only thing stored inline
via shapes, so keeping the `iv_index` and `ivptr` names
would be confusing.

`field` encompasses anything that can be stored in a VALUE array.

Similarly, `gen_ivtbl` becomes `gen_fields_tbl`.
2025-05-08 07:58:05 +02:00
Jean Boussier
3ec7bfff2e Use a set_table for rb_vm_struct.unused_block_warning_table
Now that we have a hash-set implementation we can use that
instead of a hash-table with a static value.
2025-04-27 11:59:28 +02:00
刘皓
45e814d116 Fix jump buffer leak in WASI builds 2025-04-27 15:47:30 +09:00
Takashi Kokubun
8b72e07359 Disable ZJIT profiling at call-threshold (https://github.com/Shopify/zjit/pull/99)
* Disable ZJIT profiling at call-threshold

* Stop referencing ZJIT instructions in codegen
2025-04-18 21:53:01 +09:00
Takashi Kokubun
2915806820 Add --zjit-num-profiles option (https://github.com/Shopify/zjit/pull/98)
* Add --zjit-profile-interval option

* Fix min to max

* Avoid rewriting instructions for --zjit-call-threshold=1

* Rename the option to --zjit-num-profiles
2025-04-18 21:53:01 +09:00
Takashi Kokubun
bb46bb781c Stub Init_builtin_zjit for --disable-zjit 2025-04-18 21:53:00 +09:00
Takashi Kokubun
14253e7d12 Implement Insn::Param using the SP register (https://github.com/Shopify/zjit/pull/39) 2025-04-18 21:52:59 +09:00
Takashi Kokubun
22c73f1ccb Implement FixnumAdd and stub PatchPoint/GuardType (https://github.com/Shopify/zjit/pull/30)
* Implement FixnumAdd and stub PatchPoint/GuardType

Co-authored-by: Max Bernstein <max.bernstein@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>

* Clone Target for arm64

* Use $create instead of use create

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Fix misindentation from suggested changes

* Drop an unneeded variable for mut

* Load operand into a register only if necessary

---------

Co-authored-by: Max Bernstein <max.bernstein@shopify.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2025-04-18 21:52:59 +09:00
Takashi Kokubun
0a543daf15 Add zjit_* instructions to profile the interpreter (https://github.com/Shopify/zjit/pull/16)
* Add zjit_* instructions to profile the interpreter

* Rename FixnumPlus to FixnumAdd

* Update a comment about Invalidate

* Rename Guard to GuardType

* Rename Invalidate to PatchPoint

* Drop unneeded debug!()

* Plan on profiling the types

* Use the output of GuardType as type refined outputs
2025-04-18 21:52:59 +09:00
Takashi Kokubun
53bee25068 Implement --zjit-call-threshold
As a preparation for introducing a profiling layer, we need to be able
to raise the threshold to run a few cycles for profiling.
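A generic sketch of why a call threshold above 1 leaves room for profiling follows. This is not ZJIT's code, and `TieredMethod` and its methods are hypothetical. Calls are "interpreted" and their argument classes recorded until the threshold is reached. At that point the method is "compiled" against the collected profile.

```ruby
# Hypothetical sketch of threshold-based tiering; not ZJIT's implementation.
class TieredMethod
  attr_reader :calls, :profile, :compiled_for

  def initialize(call_threshold, &body)
    @call_threshold = call_threshold
    @body = body
    @calls = 0
    @profile = Hash.new(0) # argument class => number of times observed
    @compiled = nil
    @compiled_for = nil    # classes the "compiled" code was specialized on
  end

  def call(*args)
    return @compiled.call(*args) if @compiled # compiled: no more profiling

    @calls += 1
    args.each { |arg| @profile[arg.class] += 1 } # profile while "interpreting"
    result = @body.call(*args)
    compile if @calls >= @call_threshold
    result
  end

  private

  # Stand-in for JIT compilation: record what the profile saw and switch
  # to the "compiled" entry point (here simply the same block).
  def compile
    @compiled_for = @profile.keys.freeze
    @compiled = @body
  end
end
```

With a threshold of 1, the profile would only ever contain a single call's worth of observations.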
2025-04-18 21:52:58 +09:00
Takashi Kokubun
06d875b979 Backport the latest jit_compile() 2025-04-18 21:52:58 +09:00
Alan Wu
1d95139bf6 miniruby --zjit -e nil runs through iseq_to_ssa 2025-04-18 21:52:56 +09:00
Takashi Kokubun
0bb709718b Hook ZJIT compilation 2025-04-18 21:52:56 +09:00
John Hawthorn
57b6a7503f Lock-free hash set for fstrings [Feature #21268]
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series
of RWLocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock-free
hash table for Java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:

 * We only need a hash set rather than a hash table (we only need keys,
   not values), and so the full entry can be written as a single VALUE
 * As a set we only need lookup/insert/delete, no update
 * Delete is only run inside GC so does not need to be atomic (It could
   be made concurrent)
 * I use rb_vm_barrier for the (rare) table rebuilds (it could be made
   concurrent). We VM lock (but don't require other threads to stop) for
   table rebuilds, as those are rare
 * The conservative garbage collector makes deferred reclamation easy,
   using a T_DATA object

Another benefit of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic
probing, similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.

Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a maximum load factor of 0.5 (it is
enlarged once it is half full).
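A single-threaded Ruby sketch of the table layout described above follows. It uses open addressing with quadratic (triangular-number) probing and stores keys directly in the slots. Lookup and insert compare by value, while delete compares by identity. Tombstones are only dropped on resize, and the table grows once it would become more than half full. `OpenAddressingSet` is a hypothetical name, and the lock-free insert and concurrent reads of the real fstring table are not modeled.

```ruby
# Hypothetical single-threaded sketch; not CRuby's fstring table.
class OpenAddressingSet
  TOMBSTONE = Object.new.freeze # marks a deleted slot

  attr_reader :capacity

  def initialize(capacity = 8)   # capacity must be a power of two
    @capacity = capacity
    @slots = Array.new(capacity) # nil means "never used"
    @used = 0                    # live keys plus tombstones
  end

  # Lookup compares by value (eql?), as fstring de-duplication does.
  def find(key)
    each_probe(key.hash) do |i|
      slot = @slots[i]
      return nil if slot.nil?
      return slot if !slot.equal?(TOMBSTONE) && slot.eql?(key)
    end
    nil
  end

  # Returns the already-stored equal key, or stores and returns `key`.
  def insert(key)
    existing = find(key)
    return existing if existing

    resize if (@used + 1) * 2 > @capacity # keep the table at most half full
    place(key)
  end

  # Delete compares by identity: only the exact object is removed.
  def delete(key)
    each_probe(key.hash) do |i|
      slot = @slots[i]
      return false if slot.nil?
      if slot.equal?(key)
        @slots[i] = TOMBSTONE # tombstones are only cleared by resize
        return true
      end
    end
    false
  end

  def size
    @slots.count { |s| !s.nil? && !s.equal?(TOMBSTONE) }
  end

  private

  # Probe offsets 0, 1, 3, 6, 10, ... (triangular numbers); with a
  # power-of-two capacity this visits every slot exactly once.
  def each_probe(hash)
    mask = @capacity - 1
    i = hash & mask
    @capacity.times do |step|
      yield i
      i = (i + step + 1) & mask
    end
  end

  # Store `key` in the first never-used slot of its probe sequence.
  def place(key)
    each_probe(key.hash) do |i|
      next unless @slots[i].nil?
      @slots[i] = key
      @used += 1
      return key
    end
    raise "no free slot" # unreachable while the table is at most half full
  end

  # Rebuild into a table twice as large, dropping tombstones.
  def resize
    live = @slots.reject { |s| s.nil? || s.equal?(TOMBSTONE) }
    @capacity *= 2
    @slots = Array.new(@capacity)
    @used = 0
    live.each { |k| place(k) }
  end
end
```

Deleting by identity means an equal but distinct string cannot remove the entry for the string actually being freed, which is the property the commit message relies on.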
2025-04-18 13:03:54 +09:00
Aaron Patterson
3628e9e30d Remove unused field on Thread struct
It looks like stat_insn_usage was introduced with YARV, but as far as I
can tell the field has never been used.  I think we should remove the
field since we don't use it.
2025-04-11 10:28:26 -07:00
lukeg
d80f3a287c Ractor.make_shareable(proc_obj) makes inner structure shareable
Proc objects are now traversed like other objects when making them
shareable.

Fixes [Bug #19372]
Fixes [Bug #19374]
2025-03-26 16:05:02 -07:00
Alan Wu
08b3a45bc9 Push a real iseq in rb_vm_push_frame_fname()
Previously, vm_make_env_each() (used during proc
creation and for the debug inspector C API) picked up the
non-GC-allocated iseq that rb_vm_push_frame_fname() creates,
which led to a SEGV when the GC tried to mark the non-GC object.

Put a real iseq imemo instead. Speed should be about the same since
the old code also did an imemo allocation and a malloc allocation.

Real iseq allows ironing out the special-casing of dummy frames in
rb_execution_context_mark() and rb_execution_context_update(). A check
is added to RubyVM::ISeq#eval, though, to stop attempts to run dummy
iseqs.

[Bug #21180]

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2025-03-12 15:00:26 -04:00
Yusuke Endoh
993fd96ce6 reject numbered parameters from Binding#local_variables
Also, Binding#local_variable_get and #local_variable_set now reject
access to numbered parameters.

[Bug #20965] [Bug #21049]
2025-02-18 16:23:24 +09:00
Takashi Kokubun
c1ce3d719d Streamline YJIT checks on jit_compile() 2025-02-14 10:40:10 -08:00
Nobuyoshi Nakada
4a67ef09cc [Feature #21116] Extract RJIT as a third-party gem 2025-02-13 18:01:03 +09:00
Aaron Patterson
d680a13ad0 Always return jit_entry even if NULL
We can just always return the jit_entry since it will be initialized to
NULL.  There is no reason to specifically return NULL if yjit / rjit are
disabled.
2025-02-10 15:50:23 -05:00
Peter Zhu
5032791330 Fix conversion of RubyVM::FrozenCore to T_ICLASS
We shouldn't directly set the flags of an object because there could be
other flags set that would be erased. Instead, we can unset T_MASK and
set T_ICLASS.
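The following compares overwriting the flags with replacing only the type bits, using Ruby integer arithmetic. The `T_*` values are my understanding of `RUBY_T_CLASS`, `RUBY_T_ICLASS` and `RUBY_T_MASK` from Ruby's C headers. `FL_EXAMPLE` is a hypothetical extra flag bit, and the method names are illustrative.

```ruby
# Values believed to mirror Ruby's C headers; FL_EXAMPLE is hypothetical.
T_CLASS    = 0x02
T_ICLASS   = 0x1c
T_MASK     = 0x1f
FL_EXAMPLE = 1 << 12 # some unrelated flag bit that must be preserved

# Wrong: assigning the type directly wipes every other flag bit.
def convert_to_iclass_overwrite(_flags)
  T_ICLASS
end

# Right: clear only the type bits, then set the new type.
def convert_to_iclass(flags)
  (flags & ~T_MASK) | T_ICLASS
end
```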
2025-01-30 10:10:48 -05:00
Peter Zhu
98b36f6f36 Use rb_gc_vm_weak_table_foreach for reference updating
We can use rb_gc_vm_weak_table_foreach for reference updating of weak tables
in the default GC.
2025-01-27 10:28:36 -05:00
Nobuyoshi Nakada
f7059af50a Use no-inline version rb_current_ec on Arm64
The TLS across .so issue seems related to Arm64, but not Darwin.
2025-01-17 22:48:10 +09:00
Peter Zhu
707c6420b1 Don't reference update frames with VM_FRAME_MAGIC_DUMMY
Frames with VM_FRAME_MAGIC_DUMMY pushed by rb_vm_push_frame_fname have
a non-GC-allocated iseq, so we should not reference-update it.
2024-12-17 11:03:38 -05:00
Peter Zhu
92dd9734a9 Fix use-after-free in ep in Proc#dup for ifunc procs
[Bug #20950]

ifunc proc has the ep allocated in the cfunc_proc_t which is the data of
the TypedData object. If an ifunc proc is duplicated, the ep points to
the ep of the source object. If the source object is freed, then the ep
of the duplicated object now points to a freed memory region. If we try
to use the ep we could crash.

For example, the following script crashes:

    p = { a: 1 }.to_proc
    100.times do
      p = p.dup
      GC.start
      p.call
    rescue ArgumentError
    end

This commit changes ifunc proc to also duplicate the ep when it is duplicated.
2024-12-13 10:10:03 -05:00