It's not rare for structs to have additional ivars, hence are one
of the most common, if not the most common type in the `gen_fields_tbl`.
This can cause Ractor contention, but even in single ractor mode
means having to do a hash lookup to access the ivars, and increase
GC work.
Instead, unless the struct is perfectly right sized, we can store
a reference to the associated IMEMO/fields object right after the
last struct member.
```
compare-ruby: ruby 3.5.0dev (2025-08-06T12:50:36Z struct-ivar-fields-2 9a30d141a1) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-06T12:57:59Z struct-ivar-fields-2 2ff3ec237f) +PRISM [arm64-darwin24]
warming up.....
| |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|member_reader | 590.317k| 579.246k|
| | 1.02x| -|
|member_writer | 543.963k| 527.104k|
| | 1.03x| -|
|member_reader_method | 213.540k| 213.004k|
| | 1.00x| -|
|member_writer_method | 192.657k| 191.491k|
| | 1.01x| -|
|ivar_reader | 403.993k| 569.915k|
| | -| 1.41x|
```
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
If `get_next_shape_internal` fail to return a shape, we must
transitiont to a complex shape. `shape_transition_object_id`
mistakenly didn't.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
For now this doesn't change anything, but now that the table
is managed by GC, it opens the door to use RCU when in multi-ractor
mode, hence allow unsynchornized reads.
One of the biggest remaining contention point is `RClass.cc_table`.
The logical solution would be to turn it into a managed object, so
we can use an RCU strategy, given it's read heavy.
However, that's not currently possible because the table can't
be freed before the owning class, given the class free function
MUST go over all the CC entries to invalidate them.
However if the `CC->klass` reference is weak marked, then the
GC will take care of setting the reference to `Qundef`.
During Ruby's shutdown, we no longer need to check the fstr of the symbol
because we don't use the fstr anymore for freeing the symbol. This can also
fix the following ASAN error:
==2721247==ERROR: AddressSanitizer: use-after-poison on address 0x75fa90a627b8 at pc 0x64a7b06fb4bc bp 0x7ffdf95ba9b0 sp 0x7ffdf95ba9a8
READ of size 8 at 0x75fa90a627b8 thread T0
#0 0x64a7b06fb4bb in RB_BUILTIN_TYPE include/ruby/internal/value_type.h:191:30
#1 0x64a7b06fb4bb in rb_gc_shutdown_call_finalizer_p gc.c:357:18
#2 0x64a7b06fb4bb in rb_gc_impl_shutdown_call_finalizer gc/default/default.c:3045:21
#3 0x64a7b06fb4bb in rb_objspace_call_finalizer gc.c:1739:5
#4 0x64a7b06ca1b2 in rb_ec_finalize eval.c:165:5
#5 0x64a7b06ca1b2 in rb_ec_cleanup eval.c:256:5
#6 0x64a7b06c98a3 in ruby_cleanup eval.c:179:12
We don't need to delay the freeing of the fstr for the symbol if we store
the hash of the fstr in the dynamic symbol and we use compare-by-identity
for removing the dynamic symbol from the sym_set.
`ObjectSpace.count_objects` could cause an unintended array allocation.
It returns a hash like `{ :T_ARRAY => 100, :T_STRING => 100, ... }`, so
it creates the key symbol (e.g., `:T_STRING`) for the first time. On
rare occations, this symbol creation internally allocates a new array
for symbol management.
This led to a problematic side effect where calling `count_objects`
twice in a row could produce inconsistent results: the first call would
trigger the hidden array allocation, and the second call would then
report an increased count for `:T_ARRAY`.
This behavior caused test failures in `test/ruby/test_allocation.rb`,
which performs a baseline measurement before an operation and then
asserts the exact number of new allocations.
20250716T053005Z.fail.html.gz
> 1) Failure:
> TestAllocation::ProcCall::WithBlock#test_ruby2_keywords [...]:
> Expected 1 array allocations for "r2k.(1, a: 2, &block)", but 2 arrays allocated.
This change resolves the issue by pre-interning all key symbols used by
`ObjectSpace.count_objects` before its counting. This eliminates the
side effect and ensures the stability of allocation-sensitive tests.
Co-authored-by: Koichi Sasada <ko1@atdot.net>
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC
threads work in parallel during a GC. Currently, when two GC threads
scan two iseq objects simultaneously when YJIT is enabled, both threads
will attempt to borrow `CodeBlock::mem_block`, which will result in
panic.
This commit makes one part of the change.
We now set the YJIT code memory to writable in bulk before the
reference-updating phase, and reset it to executable in bulk after the
reference-updating phase. Previously, YJIT lazily sets memory pages
writable while updating object references embedded in JIT-compiled
machine code, and sets the memory back to executable by calling
`mark_all_executable`. This approach is inherently unfriendly to
parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it
sets the whole `CodeBlock` as executable which races with other GC
threads that are updating other iseq objects. It also has performance
overhead due to the frequent invocation of system calls. We now set the
permission of all the code memory in bulk before and after the reference
updating phase. Multiple GC threads can now perform raw memory writes
in parallel. We should also see performance improvement during moving
GC because of the reduced number of `mprotect` system calls.
The ASAN poison functions was always defined in gc.c, even if ASAN was not
enabled. This made function calls to happen all the time even if ASAN is
not enabled. This commit defines these functions as empty macros when ASAN
is not enabled.
This commit extracts the Ractor safe table used for frozen strings into
ractor_safe_table.c, which will allow it to be used elsewhere, including
for the global symbol table.
Followup: https://github.com/ruby/ruby/pull/13589
This simplify a lot of things, as we no longer need to manually
manage the memory, we can use the Read-Copy-Update pattern and
avoid numerous race conditions.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
The FL_EXIVAR is a bit redundant with the shape_id.
Now that the `shape_id` is embedded in all objects on all archs,
we can cheaply check if an object has any fields with a simple
bitmask.
This behave almost exactly as a T_OBJECT, the layout is entirely
compatible.
This aims to solve two problems.
First, it solves the problem of namspaced classes having
a single `shape_id`. Now each namespaced classext
has an object that can hold the namespace specific
shape.
Second, it open the door to later make class instance variable
writes atomics, hence be able to read class variables
without locking the VM.
In the future, in multi-ractor mode, we can do the write
on a copy of the `fields_obj` and then atomically swap it.
Considerations:
- Right now the `RClass` shape_id is always synchronized,
but with namespace we should likely mark classes that have
multiple namespace with a specific shape flag.
This data is redundant because the shape already contains both the
length and capacity of the object's fields.
So it both waste space and create the possibility of a desync between
the two.
We also do not need to initialize everything to Qundef, this seem
to be a left-over from pre-shape instance variables.
Fixes [Bug #21201]
This change addresses a performance regression where defining methods
inside `refine` blocks caused severe slowdowns. The issue was due to
`rb_clear_all_refinement_method_cache()` triggering a full object
space scan via `rb_objspace_each_objects` to find and invalidate
affected callcaches, which is very inefficient.
To fix this, I introduce `vm->cc_refinement_table` to track
callcaches related to refinements. This allows us to invalidate
only the necessary callcaches without scanning the entire heap,
resulting in significant performance improvement.
Now that there no longer multiple shape roots, all we need to do
when moving an object from one slot to the other is to update the
`heap_index` part of the shape_id.
Since this never need to create a shape transition, it will always
work and never result in a complex shape.
A finalizer registerred in Ractor A can be invoked in B.
```ruby
require "tempfile"
r = Ractor.new{
10_000.times{|i|
Tempfile.new(["file_to_require_from_ractor#{i}", ".rb"])
}
}
sleep 0.1
```
For example, above script makes tempfiles which have finalizers
on Ractor r, but at the end of the process, main Ractor will invoke
finalizers and it violates belonging check. This patch just ignore
the belonging check to avoid CI failure.
Of course it violates Ractor's isolation and wrong workaround.
This issue will be solved with Ractor local GC.
Now `rp(obj)` doesn't work if the `obj` is out-of-heap because
of `asan_unpoisoning_object()`, so this patch solves it.
Also add pointer information and type information to show.
* Added `Ractor::Port`
* `Ractor::Port#receive` (support multi-threads)
* `Rcator::Port#close`
* `Ractor::Port#closed?`
* Added some methods
* `Ractor#join`
* `Ractor#value`
* `Ractor#monitor`
* `Ractor#unmonitor`
* Removed some methods
* `Ractor#take`
* `Ractor.yield`
* Change the spec
* `Racotr.select`
You can wait for multiple sequences of messages with `Ractor::Port`.
```ruby
ports = 3.times.map{ Ractor::Port.new }
ports.map.with_index do |port, ri|
Ractor.new port,ri do |port, ri|
3.times{|i| port << "r#{ri}-#{i}"}
end
end
p ports.each{|port| pp 3.times.map{port.receive}}
```
In this example, we use 3 ports, and 3 Ractors send messages to them respectively.
We can receive a series of messages from each port.
You can use `Ractor#value` to get the last value of a Ractor's block:
```ruby
result = Ractor.new do
heavy_task()
end.value
```
You can wait for the termination of a Ractor with `Ractor#join` like this:
```ruby
Ractor.new do
some_task()
end.join
```
`#value` and `#join` are similar to `Thread#value` and `Thread#join`.
To implement `#join`, `Ractor#monitor` (and `Ractor#unmonitor`) is introduced.
This commit changes `Ractor.select()` method.
It now only accepts ports or Ractors, and returns when a port receives a message or a Ractor terminates.
We removes `Ractor.yield` and `Ractor#take` because:
* `Ractor::Port` supports most of similar use cases in a simpler manner.
* Removing them significantly simplifies the code.
We also change the internal thread scheduler code (thread_pthread.c):
* During barrier synchronization, we keep the `ractor_sched` lock to avoid deadlocks.
This lock is released by `rb_ractor_sched_barrier_end()`
which is called at the end of operations that require the barrier.
* fix potential deadlock issues by checking interrupts just before setting UBF.
https://bugs.ruby-lang.org/issues/21262
MAX_IV_COUNT is a hint which determines the size of variable width
allocation we should use for a given class. We don't need to scope this
by namespace, if we end up with larger builtin objects on some
namespaces that isn't a user-visible problem, just extra memory use.
Similarly variation_count is used to track if a given object has had too
many branches in shapes it has used, and to use too_complex when that
happens. That's also just a hint, so we can use the same value across
namespaces without it being visible to users.
Previously variation_count was being incremented (written to) on the
RCLASS_EXT_READABLE ext, which seems incorrect if we wanted it to be
different across namespaces
The id2ref table could contain dead entries which should not be passed
into rb_gc_location. Also, we already update references in gc_update_references
using rb_gc_vm_weak_table_foreach so we do not need to update it again.
This makes `RBobject` `4B` larger on 32 bit systems
but simplifies the implementation a lot.
[Feature #21353]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
Superclasses can't be modified by user code, so do not need namespace
indirection. For example Object.superclass is always BasicObject, no
matter what modules are included onto it.
Classes from the default namespace are not writable, however they do not
transition to too_complex until they have been written to inside a user
namespace. So this assertion is invalid (as is the previous location it
was) but it doesn't seem to provide us much value.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
When classes are booted, they should all be writeable unless namespaces
are enabled. This commit adds an assertion to ensure that classes are
writable.
We don't need the key, so we can improve performance by only iterating
on the value.
This will also fix the MMTk build because looking up the key in
rb_id_table_foreach requires locking the VM, which is not supported in
the MMTk worker threads.
Building that table will likely malloc several time which
can trigger GC and cause race condition by freeing objects
that were just added to the table.
Disabling GC to prevent the race condition isn't elegant,
but iven this is a deprecated callpath that is executed at
most once per process, it seems acceptable.
If the object isn't shareable and already has a object_id
we can access it without a lock.
If we need to generate an ID, we may need to lock to find
the child shape.
We also generate the next `object_id` using atomics.
Given classes and modules have a different set of fields in every
namespace, we can't store the object_id in fields for them.
Given that some space was freed in `RClass` we can store it there
instead.