The term "shady object" was renamed to "uncollectible write barrier
unprotected object", so rename `has_uncollectible_shady_objects` to
`has_uncollectible_wb_unprotected_objects` for consistency.
We always sweep at least 2048 slots per sweep step, but only pool one
page. For large size pools, 2048 slots is many pages but one page is
very few slots. This commit changes it so that at least 1024 slots are
placed in the pooled pages per sweep step.
We move all pooled pages to free pages at the start of incremental
marking, so we shouldn't run incremental marking only when we have run
out of free pages. This causes incremental marking to always complete
in a single step.
If we are in a minor GC and the object to mark is old, then the old
object should already be marked and cannot be reclaimed in this GC cycle
so we don't need to add it to the weak refences list.
This is an internal only function not exposed to the C extension API.
It's only use so far is from rb_vm_mark, where it's used to mark the
values in the vm->trap_list.cmd array.
There shouldn't be any reason why these cannot move.
This commit allows them to move by updating their references during the
reference updating step of compaction.
To do this we've introduced another internal function
rb_gc_update_values as a partner to rb_gc_mark_values.
This allows us to refactor rb_gc_mark_values to not pin
The old algorithm could calculate an undercount for the initial pages
due to two issues:
1. It did not take into account that some heap pages will have one less
slot due to alignment. It assumed that every heap page would be able
to be fully filled with slots. Pages that are unaligned with the slot
size will lose one slot. The new algorithm assumes that every page
will be unaligned.
2. It performed integer division, which truncates down. This means that
the number of pages might not actually satisfy the number of slots.
This can cause the heap to grow in `gc_sweep_finish_size_pool` after
allocating all of the allocatable pages because the total number of
slots would be less than the initial configured number of slots.
This commit changes RUBY_GC_HEAP_INIT_SIZE_{40,80,160,320,640}_SLOTS to
RUBY_GC_HEAP_{0,1,2,3,4}_INIT_SLOTS. This is easier to use because the
user does not need to determine the slot sizes (which can vary between
32 and 64 bit systems). They now just use the heap names
(`GC.stat_heap.keys`).
If initial slots is set, then during a minor GC, if we have allocatable
pages but the heap is mostly full, then we will set `grow_heap` to true
since `total_slots` does not count allocatable pages so it will be less
than `init_slots`. This can cause `allocatable_pages` to grow to much
higher than desired since it will appear that the heap is mostly full.
This commit adds `free_empty_pages` which frees all empty heap pages and
moves the number of pages freed to the allocatable pages counter. This
is used in Process.warmup to improve performance because page
invalidation from copy-on-write is slower than allocating a new page.
[Feature #19783]
This commit adds stats about weak references to `GC.latest_gc_info`.
It adds the following two keys:
- `weak_references_count`: number of weak references registered during
the last GC.
- `retained_weak_references_count`: number of weak references that
survived the last GC.
[Feature #19783]
This commit adds support for weak references in the GC through the
function `rb_gc_mark_weak`. Unlike strong references, weak references
does not mark the object, but rather lets the GC know that an object
refers to another one. If the child object is freed, the pointer from
the parent object is overwritten with `Qundef`.
Co-Authored-By: Jean Boussier <byroot@ruby-lang.org>
This commit adds key force_incremental_marking_finish_count to
GC.stat_heap. This statistic returns the number of times the size pool
has forced incremental marking to finish due to running out of slots.
Functions rgengc_remembered, rgengc_remembered_sweep, and
rgengc_remembersetbits_get are just wrappers of RVALUE_REMEMBERED and
doesn't do much more. We can remove all those and use RVALUE_REMEMBERED
directly instead.
We don't need to check stack for moved objects after compaction because
the mutator cannot run between marking the stack and the end of
compaction. However, the stack may have moved objects leftover from
marking and sweeping phases. This means that their pages will be
invalidated and all objects moved back. We don't need to move these
objects back.
This also fixes the issue on Windows where some compaction tests
sometimes fail due to the page of the object being invalidated.
This commit stores the initial slots per size pool, configured with
the environment variables `RUBY_GC_HEAP_INIT_SIZE_%d_SLOTS`. This
ensures that the configured initial slots remains a low bound for the
number of slots in the heap, which can prevent heaps from thrashing in
size.
From Ruby 3.0, refined method invocations are slow because
resolved methods are not cached by inline cache because of
conservertive strategy. However, `using` clears all caches
so that it seems safe to cache resolved method entries.
This patch caches resolved method entries in inline cache
and clear all of inline method caches when `using` is called.
fix [Bug #18572]
```ruby
# without refinements
class C
def foo = :C
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
_END__
user system total real
master 0.362859 0.002544 0.365403 ( 0.365424)
modified 0.357251 0.000000 0.357251 ( 0.357258)
```
```ruby
# with refinment but without using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
__END__
user system total real
master 0.957182 0.000000 0.957182 ( 0.957212)
modified 0.359228 0.000000 0.359228 ( 0.359238)
```
```ruby
# with using
class C
def foo = :C
end
module R
refine C do
def foo = :R
end
end
N = 1_000_000
using R
obj = C.new
require 'benchmark'
Benchmark.bm{|x|
x.report{N.times{
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
obj.foo; obj.foo; obj.foo; obj.foo; obj.foo;
}}
}
`cc->klass` and `cc->cme_` can be free'ed while last marking
so that it should be checked bofore updating the pointers.
Note that `T_MOVED` is living, but `is_live_object()` returns false.
cc is callcache.
cc->klass (klass) should not be marked because if the klass is
free'ed, the cc->klass will be cleared by `vm_cc_invalidate()`.
cc->cme (cme) should not be marked because if cc is invalidated
when cme is free'ed.
- klass marks cme if klass uses cme.
- caller classe's ccs->cme marks cc->cme.
- if cc is invalidated (klass doesn't refer the cc),
cc is invalidated by `vm_cc_invalidate()` and cc->cme is
not be accessed.
- On the multi-Ractors, cme will be collected with global GC
so that it is safe if GC is not interleaving while accessing
cc and cme.
fix [Bug #19436]
```ruby
10_000.times{|i|
# p i if (i%1_000) == 0
str = "x" * 1_000_000
def str.foo = nil
eval "def call#{i}(s) = s.foo"
send "call#{i}", str
}
```
Without this patch:
```
real 1m5.639s
user 0m6.637s
sys 0m58.292s
```
and with this patch:
```
real 0m2.045s
user 0m1.627s
sys 0m0.164s
```
This both save time for when it will be eventually needed,
and avoid mutating heap pages after a potential fork.
Instrumenting some large Rails app, I've witnessed up to
58% of String instances having their coderange still unknown.
[Feature #18885]
For now, the optimizations performed are:
- Run a major GC
- Compact the heap
- Promote all surviving objects to oldgen
Other optimizations may follow.
Closes [Feature #19729]
Previously 2 bits of the flags on each RVALUE are reserved to store the
number of GC cycles that each object has survived. This commit
introduces a new bit array on the heap page, called age_bits, to store
that information instead.
This patch still reserves one of the age bits in the flags (the old
FL_PROMOTED0 bit, now renamed FL_PROMOTED).
This is set to 0 for young objects and 1 for old objects, and is used as
a performance optimisation for the write barrier. Fetching the age_bits
from the heap page and doing the required math to calculate if the
object was old or not would slow down the write barrier. So we keep this
bit synced in the flags for fast access.
According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check that the pointer is a null pointer.
This reverts commit 10621f7cb9.
This was reverted because the gc integrity build started failing. We
have figured out a fix so I'm reopening the PR.
Original commit message:
Fix cvar caching when class is cloned
The class variable cache that was added in
ruby#4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
The class variable cache that was added in
https://github.com/ruby/ruby/pull/4544 changed the behavior of class
variables on cloned classes. As reported when a class is cloned AND a
class variable was set, and the class variable was read from the
original class, reading a class variable from the cloned class would
return the value from the original class.
This was happening because the IC (inline cache) is stored on the ISEQ
which is shared between the original and cloned class, therefore they
share the cache too.
To fix this we are now storing the `cref` in the cache so that we can
check if it's equal to the current `cref`. If it's different we don't
want to read from the cache. If it's the same we do. Cloned classes
don't share the same cref with their original class.
This will need to be backported to 3.1 in addition to 3.2 since the bug
exists in both versions.
We also added a marking function which was missing.
Fixes [Bug #19379]
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>