It is much more convenient than storing the klass, especially
when dealing with `object_id` as it allows to update the id2ref
table without having to dereference the owner, which may be
garbage at that point.
The `p->field = rb_gc_location(p->field)` isn't ideal because it means all
references are rewritten on compaction, regardless of whether the referenced
object has moved. This isn't good for caches nor for Copy-on-Write.
`rb_gc_mark_and_move` avoid needless writes, and most of the time allow to
have a single function for both marking and updating references.
If `get_next_shape_internal` fail to return a shape, we must
transitiont to a complex shape. `shape_transition_object_id`
mistakenly didn't.
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
Previously it was possible for two atomic increments of next_shape_id
running concurrently to overflow MAX_SHAPE_ID. For this reason we need
to do the test atomically with the allocation in shape_alloc returning
NULL.
This avoids overflowing next_shape_id by repeatedly attempting a CAS.
Alternatively we could have allowed incrementing past MAX_SHAPE_ID and
then clamping in rb_shapes_count, but this seems simpler.
When sharing between threads we need both atomic reads and writes. We
probably didn't need to use this in some cases (where we weren't running
in multi-ractor mode) but I think it's best to be consistent.
This was using < so subtract one from the last shape id would have us
miss the last inserted shape. I think this is unlikely to have caused
issues because I don't think the newest shape will ever have edges.
We do need to use `- 1` because otherwise RSHAPE wraps around and
returns the root shape.
The `FL_FREEZE` flag is redundant with `SHAPE_ID_FL_FROZEN`, so
ideally it should be eliminated in favor of the later.
Doing so would eliminate the risk of desync between the two, but
also solve the problem of the frozen status being global in namespace
context (See Bug #21330).
The FL_EXIVAR is a bit redundant with the shape_id.
Now that the `shape_id` is embedded in all objects on all archs,
we can cheaply check if an object has any fields with a simple
bitmask.
This behave almost exactly as a T_OBJECT, the layout is entirely
compatible.
This aims to solve two problems.
First, it solves the problem of namspaced classes having
a single `shape_id`. Now each namespaced classext
has an object that can hold the namespace specific
shape.
Second, it open the door to later make class instance variable
writes atomics, hence be able to read class variables
without locking the VM.
In the future, in multi-ractor mode, we can do the write
on a copy of the `fields_obj` and then atomically swap it.
Considerations:
- Right now the `RClass` shape_id is always synchronized,
but with namespace we should likely mark classes that have
multiple namespace with a specific shape flag.
The type isn't opaque because Ruby isn't often compiled with LTO,
so for optimization purpose it's better to allow as much inlining
as possible.
However ideally only `shape.c` and `shape.h` should deal with
the actual struct, and everything else should just deal with opaque
`shape_id_t`.
Now that `rb_shape_traverse_from_new_root` has been eliminated there's
no longer any reason to pin these objects, because we no longer
need to traverse shapes downward during compaction.
Now that there no longer multiple shape roots, all we need to do
when moving an object from one slot to the other is to update the
`heap_index` part of the shape_id.
Since this never need to create a shape transition, it will always
work and never result in a complex shape.
This used to be protected because all shape code was
under a lock, but now that the shape tree is lock-free
we still need to lock around the red-black cache.
Co-Authored-By: Luke Gruber <luke.gruber@shopify.com>
Instead `shape_id_t` higher bits contain flags, and the first one
tells whether the shape is frozen.
This has multiple benefits:
- Can check if a shape is frozen with a single bit check instead of
dereferencing a pointer.
- Guarantees it is always possible to transition to frozen.
- This allow reclaiming `FL_FREEZE` (not done yet).
The downside is you have to be careful to preserve these flags
when transitioning.
Whenever we run into an inline cache miss when we try to set
an ivar, we may need to take the global lock, just to be able to
lookup inside `shape->edges`.
To solve that, when we're in multi-ractor mode, we can treat
the `shape->edges` as immutable. When we need to add a new
edge, we first copy the table, and then replace it with
CAS.
This increases memory allocations, however we expect that
creating new transitions becomes increasingly rare over time.
```ruby
class A
def initialize(bool)
@a = 1
if bool
@b = 2
else
@c = 3
end
end
def test
@d = 4
end
end
def bench(iterations)
i = iterations
while i > 0
A.new(true).test
A.new(false).test
i -= 1
end
end
if ARGV.first == "ractor"
ractors = 8.times.map do
Ractor.new do
bench(20_000_000 / 8)
end
end
ractors.each(&:take)
else
bench(20_000_000)
end
```
The above benchmark takes 27 seconds in Ractor mode on Ruby 3.4,
and only 1.7s with this branch.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>