We don't need to build a linked list from the finalizer table and
instead we can just run the finalizers by iterating the ST table.
This also improves the performance at shutdown, for example:
1_000_000.times.map do
o = Object.new
ObjectSpace.define_finalizer(o, proc { })
o
end
Before:
Time (mean ± σ): 1.722 s ± 0.056 s [User: 1.597 s, System: 0.113 s]
Range (min … max): 1.676 s … 1.863 s 10 runs
After:
Time (mean ± σ): 1.538 s ± 0.025 s [User: 1.437 s, System: 0.093 s]
Range (min … max): 1.510 s … 1.586 s 10 runs
When assertions are enabled, the following code triggers an assertion
error:
GC.disable
GC.start(immediate_mark: false, immediate_sweep: false)
10_000_000.times { Object.new }
This is because the GC.start ignores that the GC is disabled and will
start incremental marking and lazy sweeping. But the assertions in
gc_marks_continue and gc_sweep_continue assert that GC is not disabled.
This commit changes it for the assertion to pass if the GC was triggered
from a method.
If the object being finalized does not have an object ID, then we don't
need to insert into the object ID table, we can simply just allocate a
new object ID by bumping the next_object_id counter. This speeds up
finalization for objects that don't have an object ID. For example, the
following script now runs faster:
1_000_000.times do
o = Object.new
ObjectSpace.define_finalizer(o) {}
end
Before:
Time (mean ± σ): 1.462 s ± 0.019 s [User: 1.360 s, System: 0.094 s]
Range (min … max): 1.441 s … 1.503 s 10 runs
After:
Time (mean ± σ): 1.199 s ± 0.015 s [User: 1.103 s, System: 0.086 s]
Range (min … max): 1.181 s … 1.229 s 10 runs
gc.c mistakenly defined GC_ASSERT as blank, which caused it to be a
no-op. This caused all assertions in gc.c and gc/default.c to not do
anything. This commit fixes it by moving the definition of GC_ASSERT
to gc/gc.h.
We need to remove from the finalizer_table after running all the
finalizers because GC could trigger during the finalizer which could
reclaim the finalizer table array.
The following code crashes:
1_000_000.times do
o = Object.new
ObjectSpace.define_finalizer(o, proc { })
end
We're seeing a crash during shutdown in rb_gc_impl_objspace_free because
it's running lazy sweeping during shutdown. It appears that it's due to
`finalizing` being set, which causes GC to not be aborted and not
disabled which causes it to be in lazy sweeping at shutdown.
The full stack trace is:
#6 rb_bug (fmt=fmt@entry=0x5643b8ebde78 "lazy sweeping underway when freeing object space") at error.c:1095
#7 0x00005643b8a3c697 in rb_gc_impl_objspace_free (objspace_ptr=<optimized out>) at gc/default.c:9507
#8 0x00005643b8c269eb in ruby_vm_destruct (vm=0x7e2fdc84d000) at vm.c:3141
#9 0x00005643b8a5147b in rb_ec_cleanup (ec=<optimized out>, ex=<optimized out>) at eval.c:263
#10 0x00005643b8a51c93 in ruby_run_node (n=<optimized out>) at eval.c:319
#11 0x00005643b8a4c7c7 in rb_main (argv=0x7fffef15e7f8, argc=18) at ./main.c:43
#12 main (argc=<optimized out>, argv=<optimized out>) at ./main.c:62
Previously, GCC 11 on x86-64 inlined the heavy weight logic for
potentially triggering GC into newobj_alloc(). This slowed down
the hotter code path where the ractor cache hits, causing a degradation
to allocation throughput.
Outline the logic into a separate function and have it never inlined.
This restores allocation throughput to the same level as
98eeadc ("Development of 3.4.0 started.").
To evaluate, instrument miniruby so it allocates a bunch of objects and
then exits:
diff --git a/eval.c b/eval.c
--- a/eval.c
+++ b/eval.c
@@ -92,6 +92,15 @@ ruby_setup(void)
}
EC_POP_TAG();
+rb_gc_disable();
+rb_execution_context_t *ec = GET_EC();
+long const n = 20000000;
+for (long i = 0; i < n; ++i) {
+ rb_wb_protected_newobj_of(ec, 0, T_OBJECT, 40);
+}
+printf("alloc %ld\n", n);
+exit(0);
+
return state;
}
With `3.3-equiv` being 98eeadc, and `pre` being f2728c3393
and `post` being this commit, I have:
$ hyperfine -L buildtag post,pre,3.3-equiv '/ruby/build-{buildtag}/miniruby'
Benchmark 1: /ruby/build-post/miniruby
Time (mean ± σ): 873.4 ms ± 2.8 ms [User: 377.6 ms, System: 490.2 ms]
Range (min … max): 868.3 ms … 877.8 ms 10 runs
Benchmark 2: /ruby/build-pre/miniruby
Time (mean ± σ): 960.1 ms ± 2.8 ms [User: 430.8 ms, System: 523.9 ms]
Range (min … max): 955.5 ms … 964.2 ms 10 runs
Benchmark 3: /ruby/build-3.3-equiv/miniruby
Time (mean ± σ): 886.9 ms ± 2.8 ms [User: 379.5 ms, System: 501.0 ms]
Range (min … max): 883.0 ms … 890.8 ms 10 runs
Summary
'/ruby/build-post/miniruby' ran
1.02 ± 0.00 times faster than '/ruby/build-3.3-equiv/miniruby'
1.10 ± 0.00 times faster than '/ruby/build-pre/miniruby'
These results are from a Skylake server with GCC 11.
We discovered that having gc.o and gc_impl.o in separate translation
units diminishes codegen quality with GCC 11 on x86-64. This commit
solves that problem by including default/gc.c into gc.c, letting the
optimizer have visibility into the body of functions again in builds
not using link-time optimization, which are common.
This effectively restores things to the way they were before
[Feature #20470] from the optimizer's perspective while maintaining the
ability to build gc/default.c as a DSO.
There were a few functions duplicated across gc.c and gc/default.c.
Extract them and put them into gc/gc.h.
I get a slight boost from these with GCC 11 on Intel Skylake.
Part of a larger story to fix an allocation throughput regression
compared to 98eeadc ("Development of 3.4.0 started.") as the baseline.
The following code crashes because the GC ran during finalizers will
cause T_ZOMBIE objects to be on the heap, which crashes when we call
rb_gc_obj_free on it:
raise_proc = proc do |id|
GC.start
end
1000.times do
ObjectSpace.define_finalizer(Object.new, raise_proc)
end
All the non-GC objects (i.e. immediates) have addresses such that
`obj % RUBY_IMMEDIATE_MASK != 0` (except for `Qfalse`, which is 0). We
can define `OBJ_ID_INCREMENT` as `RUBY_IMMEDIATE_MASK + 1` which should
guarantee that GC objects never have conflicting object IDs with
immediates.
This feature provides a new method `GC.config` that configures internal
GC configuration variables provided by an individual GC implementation.
Implemented in this PR is the option `full_mark`: a boolean value that
will determine whether the Ruby GC is allowed to run a major collection
while the process is running.
It has the following semantics
This feature configures Ruby's GC to only run minor GC's. It's designed
to give users relying on Out of Band GC complete control over when a
major GC is run. Configuring `full_mark: false` does two main things:
* Never runs a Major GC. When the heap runs out of space during a minor
and when a major would traditionally be run, instead we allocate more
heap pages, and mark objspace as needing a major GC.
* Don't increment object ages. We don't promote objects during GC, this
will cause every object to be scanned on every minor. This is an
intentional trade-off between minor GC's doing more work every time,
and potentially promoting objects that will then never be GC'd.
The intention behind not aging objects is that users of this feature
should use a preforking web server, or some other method of pre-warming
the oldgen (like Nakayoshi fork)before disabling Majors. That way most
objects that are going to be old will have already been promoted.
This will interleave major and minor GC collections in exactly the same
what that the Ruby GC runs in versions previously to this. This is the
default behaviour.
* This new method has the following extra semantics:
- `GC.config` with no arguments returns a hash of the keys of the
currently configured GC
- `GC.config` with a key pair (eg. `GC.config(full_mark: true)` sets
the matching config key to the corresponding value and returns the
entire known config hash, including the new values. If the key does
not exist, `nil` is returned
* When a minor GC is run, Ruby sets an internal status flag to determine
whether the next GC will be a major or a minor. When `full_mark:
false` this flag is ignored and every GC will be a minor.
This status flag can be accessed at
`GC.latest_gc_info(:needs_major_by)`. Any value other than `nil` means
that the next collection would have been a major.
Thus it's possible to use this feature to check at a predetermined
time, whether a major GC is necessary and run one if it is. eg. After
a request has finished processing.
```ruby
if GC.latest_gc_info(:needs_major_by)
GC.start(full_mark: true)
end
```
[Feature #20443]
This feature is no longer possible under current design; now that our GC
is pluggable, we cannot assume what was achieved by this compiler flag
is always possble by the dynamically-loaded GC implementation.