Fixes the following warning on WebAssembly:
gc/default.c:7306:1: warning: unused function 'desired_compaction_pages_i' [-Wunused-function]
desired_compaction_pages_i(struct heap_page *page, void *data)
If we are during heap traversal, we don't want to call rb_gc_impl_mark_weak.
This commit moves that check from rb_gc_impl_mark_weak to rb_gc_mark_weak.
The counter for total allocated objects may not be accurate when there are
multiple Ractors since it is not atomic so there could be race conditions
when it is incremented.
When we move an object in compaction, we do not decrement the total_freed_objects
of the original size pool or increment the total_allocated_objects of the
new size pool. This means that when this object dies, it will appear as
if the object was never freed from the original size pool and the new
size pool will have one more free than expected. This means that the new
size pool could appear to have a negative number of live objects.
Using gc_impl.h inside of gc/gc.h will cause gc/gc.h to use the functions
in gc/default.c when builing with shared GC support because gc/gc.h is
included into gc.c before the rb_gc_impl functions are overridden by the
preprocessor.
We don't need to build a linked list from the finalizer table and
instead we can just run the finalizers by iterating the ST table.
This also improves the performance at shutdown, for example:
1_000_000.times.map do
o = Object.new
ObjectSpace.define_finalizer(o, proc { })
o
end
Before:
Time (mean ± σ): 1.722 s ± 0.056 s [User: 1.597 s, System: 0.113 s]
Range (min … max): 1.676 s … 1.863 s 10 runs
After:
Time (mean ± σ): 1.538 s ± 0.025 s [User: 1.437 s, System: 0.093 s]
Range (min … max): 1.510 s … 1.586 s 10 runs
When assertions are enabled, the following code triggers an assertion
error:
GC.disable
GC.start(immediate_mark: false, immediate_sweep: false)
10_000_000.times { Object.new }
This is because the GC.start ignores that the GC is disabled and will
start incremental marking and lazy sweeping. But the assertions in
gc_marks_continue and gc_sweep_continue assert that GC is not disabled.
This commit changes it for the assertion to pass if the GC was triggered
from a method.
If the object being finalized does not have an object ID, then we don't
need to insert into the object ID table, we can simply just allocate a
new object ID by bumping the next_object_id counter. This speeds up
finalization for objects that don't have an object ID. For example, the
following script now runs faster:
1_000_000.times do
o = Object.new
ObjectSpace.define_finalizer(o) {}
end
Before:
Time (mean ± σ): 1.462 s ± 0.019 s [User: 1.360 s, System: 0.094 s]
Range (min … max): 1.441 s … 1.503 s 10 runs
After:
Time (mean ± σ): 1.199 s ± 0.015 s [User: 1.103 s, System: 0.086 s]
Range (min … max): 1.181 s … 1.229 s 10 runs
gc.c mistakenly defined GC_ASSERT as blank, which caused it to be a
no-op. This caused all assertions in gc.c and gc/default.c to not do
anything. This commit fixes it by moving the definition of GC_ASSERT
to gc/gc.h.
We need to remove from the finalizer_table after running all the
finalizers because GC could trigger during the finalizer which could
reclaim the finalizer table array.
The following code crashes:
1_000_000.times do
o = Object.new
ObjectSpace.define_finalizer(o, proc { })
end
We're seeing a crash during shutdown in rb_gc_impl_objspace_free because
it's running lazy sweeping during shutdown. It appears that it's due to
`finalizing` being set, which causes GC to not be aborted and not
disabled which causes it to be in lazy sweeping at shutdown.
The full stack trace is:
#6 rb_bug (fmt=fmt@entry=0x5643b8ebde78 "lazy sweeping underway when freeing object space") at error.c:1095
#7 0x00005643b8a3c697 in rb_gc_impl_objspace_free (objspace_ptr=<optimized out>) at gc/default.c:9507
#8 0x00005643b8c269eb in ruby_vm_destruct (vm=0x7e2fdc84d000) at vm.c:3141
#9 0x00005643b8a5147b in rb_ec_cleanup (ec=<optimized out>, ex=<optimized out>) at eval.c:263
#10 0x00005643b8a51c93 in ruby_run_node (n=<optimized out>) at eval.c:319
#11 0x00005643b8a4c7c7 in rb_main (argv=0x7fffef15e7f8, argc=18) at ./main.c:43
#12 main (argc=<optimized out>, argv=<optimized out>) at ./main.c:62
Previously, GCC 11 on x86-64 inlined the heavy weight logic for
potentially triggering GC into newobj_alloc(). This slowed down
the hotter code path where the ractor cache hits, causing a degradation
to allocation throughput.
Outline the logic into a separate function and have it never inlined.
This restores allocation throughput to the same level as
98eeadc ("Development of 3.4.0 started.").
To evaluate, instrument miniruby so it allocates a bunch of objects and
then exits:
diff --git a/eval.c b/eval.c
--- a/eval.c
+++ b/eval.c
@@ -92,6 +92,15 @@ ruby_setup(void)
}
EC_POP_TAG();
+rb_gc_disable();
+rb_execution_context_t *ec = GET_EC();
+long const n = 20000000;
+for (long i = 0; i < n; ++i) {
+ rb_wb_protected_newobj_of(ec, 0, T_OBJECT, 40);
+}
+printf("alloc %ld\n", n);
+exit(0);
+
return state;
}
With `3.3-equiv` being 98eeadc, and `pre` being f2728c3393
and `post` being this commit, I have:
$ hyperfine -L buildtag post,pre,3.3-equiv '/ruby/build-{buildtag}/miniruby'
Benchmark 1: /ruby/build-post/miniruby
Time (mean ± σ): 873.4 ms ± 2.8 ms [User: 377.6 ms, System: 490.2 ms]
Range (min … max): 868.3 ms … 877.8 ms 10 runs
Benchmark 2: /ruby/build-pre/miniruby
Time (mean ± σ): 960.1 ms ± 2.8 ms [User: 430.8 ms, System: 523.9 ms]
Range (min … max): 955.5 ms … 964.2 ms 10 runs
Benchmark 3: /ruby/build-3.3-equiv/miniruby
Time (mean ± σ): 886.9 ms ± 2.8 ms [User: 379.5 ms, System: 501.0 ms]
Range (min … max): 883.0 ms … 890.8 ms 10 runs
Summary
'/ruby/build-post/miniruby' ran
1.02 ± 0.00 times faster than '/ruby/build-3.3-equiv/miniruby'
1.10 ± 0.00 times faster than '/ruby/build-pre/miniruby'
These results are from a Skylake server with GCC 11.
We discovered that having gc.o and gc_impl.o in separate translation
units diminishes codegen quality with GCC 11 on x86-64. This commit
solves that problem by including default/gc.c into gc.c, letting the
optimizer have visibility into the body of functions again in builds
not using link-time optimization, which are common.
This effectively restores things to the way they were before
[Feature #20470] from the optimizer's perspective while maintaining the
ability to build gc/default.c as a DSO.
There were a few functions duplicated across gc.c and gc/default.c.
Extract them and put them into gc/gc.h.
I get a slight boost from these with GCC 11 on Intel Skylake.
Part of a larger story to fix an allocation throughput regression
compared to 98eeadc ("Development of 3.4.0 started.") as the baseline.
The following code crashes because the GC ran during finalizers will
cause T_ZOMBIE objects to be on the heap, which crashes when we call
rb_gc_obj_free on it:
raise_proc = proc do |id|
GC.start
end
1000.times do
ObjectSpace.define_finalizer(Object.new, raise_proc)
end