After upgrading GitHub to Ruby 3.4 we noticed that we stopped getting
useful C-level backtrace information in our crash reports. We traced it
back to 7dd2afbe3a.
Passing 0 instead of -1 made sense for the Mach-O version of
`fill_lines`, but there is a separate ELF version of `fill_lines` that
still has special handling for -1: 58e3aa0224/addr2line.c (L2178-L2209)
Without this special handling for the main executable, we don't have the
right `base_addr` when reading debug info, and so we fail to populate
the information for that line: 58e3aa0224/addr2line.c (L1948)
Then we get to 58e3aa0224/addr2line.c (L2649),
and potentially (depending on how things were run) get back `"ruby"` as
`info.dli_fname` instead of the absolute path for the executable. We set
that as the `binary_filename` and then try to open it inside the next
call to `fill_lines`, but that fails (unless you happen to be in the
directory where the ruby executable lives), and we break out of filling
lines entirely: 58e3aa0224/addr2line.c (L2673-L2674)
This commit treats offset 0 as the main executable, rather than having
a special meaning for -1 (which gets turned into 0 anyway).
[Bug #21289]
This sets the ivars _before_ calling initialize, which feels wrong. But
Data doesn't give us any mechanism for setting the members other than 1)
initialize, or 2) drop down into the C API. Since initialize freezes
the object, we need to set the ivars before that. I think this is a
reasonable compromise—if users need better handling, they can implement
their own `encode_with` and `init_with`. But it will lead to unhappy
surprises for some users.
Alternatively, we could use the C API, similarly to Marshal. Psych _is_
already using the C API for path2class and build_exception. This would
be the least surprising behavior for users, I think.
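The ordering described above can be sketched as follows. This is a minimal
illustration, not Psych's actual code: `revive_data`, the `point_class`
example and the `@label` ivar are hypothetical names, and it assumes
Ruby 3.2+ (where `Data` exists).

```ruby
# Hypothetical Data class used only for illustration.
point_class = Data.define(:x, :y)

# Hypothetical helper: rebuild a Data instance from parsed members and
# extra instance variables.
def revive_data(klass, members, ivars)
  obj = klass.allocate # skips initialize, so the object is not frozen yet
  # Instance variables must be set first, because initialize freezes the object.
  ivars.each { |name, value| obj.instance_variable_set(name, value) }
  obj.send(:initialize, **members) # fills the members and freezes
  obj
end

point = revive_data(point_class, { x: 1, y: 2 }, { :@label => "example" })
```

Reversing the two steps would raise a FrozenError on the first
`instance_variable_set`, which is why the ivars go in before `initialize`.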
This fixes an issue where regular expressions would come back slightly
different after going through a YAML load/dump cycle. Forward slashes
have to be escaped in regular expression literals (because the literal
is delimited by slashes), so `Regexp#inspect` emits them escaped. The
deserializer, however, takes the literal output from `Regexp#inspect`
and feeds it as a string into `Regexp.new`, which expects a plain
string, not a Regexp literal, so round-tripping did not work properly
before this commit.
I've also changed the code to be a bit more readable; I hope this
doesn't affect performance.
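A small sketch of the round-trip problem, outside of Psych. The delimiter
stripping and the `gsub` repair below are simplified assumptions made for
illustration; they are not the deserializer's real parsing code.

```ruby
original = Regexp.new("a/b")

# Regexp#inspect renders literal syntax, so the slash comes out escaped: /a\/b/
literal = original.inspect

# Strip the surrounding delimiters (simplified; ignores trailing option flags).
body = literal[1...literal.rindex("/")]

# Feeding the escaped text straight into Regexp.new, as a naive loader would.
naive = Regexp.new(body)

# Undoing the literal-only escaping before building the Regexp.
repaired = Regexp.new(body.gsub("\\/", "/"))

puts naive.source
puts repaired.source
```

Both regexps match the same strings; the difference shows up in their
source, which is what makes the reloaded object look slightly different.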
f4dd8dadad
GCC 13.3.0 (Ubuntu 24.04) emits the following warning:
../symbol.c: In function ‘rb_id_attrset’:
../symbol.c:175:9: warning: ‘nonstring’ attribute ignored on objects of type ‘const char[][8]’ [-Wattributes]
175 | RBIMPL_ATTR_NONSTRING() static const char id_types[][8] = {
| ^~~~~~~~~~~~~~~~~~~~~
Use RefCell to allow path compression in union-find
When I wrote the original version I didn't understand the interior
mutability pattern, but now I do! With this commit, we should have a
more optimal union-find implementation.
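For readers unfamiliar with path compression, here is a language-neutral
sketch written in Ruby. It is not the Rust code from this commit: Ruby has
no borrow checker, so it needs no RefCell, and the `UnionFind` class and
`parent_of` helper are hypothetical. It only shows why `find` wants to
mutate the parent table while it reads it.

```ruby
class UnionFind
  def initialize(size)
    @parent = Array.new(size) { |i| i } # every element starts as its own root
  end

  # Returns the representative of x and re-points every node visited on
  # the way directly at that root (path compression).
  def find(x)
    root = x
    root = @parent[root] until @parent[root] == root
    until @parent[x] == root
      following = @parent[x]
      @parent[x] = root
      x = following
    end
    root
  end

  def union(a, b)
    root_a = find(a)
    root_b = find(b)
    @parent[root_a] = root_b unless root_a == root_b
  end

  # Exposed only so the effect of compression can be inspected.
  def parent_of(x)
    @parent[x]
  end
end

uf = UnionFind.new(4)
uf.union(0, 1)
uf.union(1, 2)
uf.union(2, 3) # builds the chain 0 -> 1 -> 2 -> 3
uf.find(0)     # flattens the chain so 0 and 1 point straight at 3
```

In Rust, a lookup that logically only reads but also rewrites parents is
the kind of situation the interior mutability pattern addresses.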
When doing a coroutine transfer from one thread to another, there's a
risk that the compiler will reuse an address from TLS before the
transfer to the new thread.
These VM assertions are all in places we would not otherwise be reading
from TLS, but using the value of `ec` or `cr` passed in. Switching these
to test against rb_current_ec_noinline() instead ensures there isn't an
optimization applied to how we read ruby_current_ec.
So far we seem to have hit this only on LLVM 18, but I don't know of
any reason other versions wouldn't have the same issue.
If the shape has only one child, we check it lock-free without
compromising thread safety.
I haven't computed hard data as to how often that is the case,
but we can assume that it's not too rare for shapes to have
a single child that is often requested, typically when freezing
an object.
These filenames are passed into test classes, and the tests we're trying
to exclude exist in TestObjectSpace in the Ruby repo, not TestObjSpace.
195728dc8c
`c < 32 || c == 34` is equivalent to `(c ^ 2) < 33`.
Found in: https://lemire.me/blog/2025/04/13/detect-control-characters-quotes-and-backslashes-efficiently-using-swar/
The gains seem mostly present in micro-benchmarks, and even there they
aren't very consistent, but it's never slower.
```
== Encoding long string (124001 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
after 5.295k i/100ms
Calculating -------------------------------------
after 55.796k (± 3.4%) i/s (17.92 μs/i) - 280.635k in 5.035690s
Comparison:
before: 49840.7 i/s
after: 55795.8 i/s - 1.12x faster
```
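The equivalence claimed above is easy to check exhaustively for every byte
value. The sketch below is only a verification aid in Ruby; the method
names are hypothetical and it is not the C code changed by this commit.

```ruby
# Original predicate: control characters (below 32) or the double quote (34).
def needs_escape_plain?(c)
  c < 32 || c == 34
end

# XOR form: flipping bit 1 maps 34 to 32 and keeps 0..31 within 0..31,
# while 32, 33 and 35 land on 34, 35 and 33, so one comparison is enough.
def needs_escape_xor?(c)
  (c ^ 2) < 33
end

# Collect every byte value on which the two predicates disagree.
mismatches = (0..255).reject { |c| needs_escape_plain?(c) == needs_escape_xor?(c) }
puts mismatches.empty?
```

Note that in C the parentheses around `c ^ 2` are required, because `<`
binds tighter than `^` there.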
034c5debd8
Most of this code uses the `type * name` style, while the
overwhelming majority of the rest of Ruby uses the `type *name`
style.
This is a cosmetic change, but helps with readability.
Tombstone removal may possibly require allocation, and we're not allowed
to allocate during GC. This commit also renames `set_compact` to
`set_update_references` to differentiate tombstone removal compaction from GC
object compaction.
Co-Authored-By: Max Bernstein <max.bernstein@shopify.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
* ZJIT: Disable ZJIT instructions when USE_ZJIT is 0
* Test the order of ZJIT instructions
* Add more jobs that disable JITs
* Show instruction names in the message