After upgrading GitHub to Ruby 3.4 we noticed that we stopped getting
useful C level backtrace information in our crash reports. We traced it
back to 7dd2afbe3a.
Passing 0 instead of -1 made sense for the Mach-O version of
`fill_lines`, but there is a separate ELF version of `fill_lines` that
still has special handling for -1: 58e3aa0224/addr2line.c (L2178-L2209)
Without this special handling for the main executable, we don't have the
right `base_addr` when reading debug info, and so we fail to populate
the information for that line: 58e3aa0224/addr2line.c (L1948)
Then we get to 58e3aa0224/addr2line.c (L2649),
and potentially (depending on how things were run) get back `"ruby"` as
`info.dli_fname` instead of the absolute path for the executable. We set
that as the `binary_filename` and then try to open it inside the next
call to `fill_lines`, but that fails (unless you happen to be in the
directory where the ruby executable lives) and we break out of filling
lines entirely: 58e3aa0224/addr2line.c (L2673-L2674)
This commit treats offset 0 as the main executable, rather than having
a special meaning for -1 (which gets turned into 0 anyway).
[Bug #21289]
fill_lines is passed -1 for offset, which causes it to read the -1 index
of traces. This is not valid memory as -1 is reading before the trace
global variable in rb_print_backtrace. This code comes from commit
99d1f5f88b, where there used to be special
handling for the -1 index.
We can see this error in ASAN:
    ==71037==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00010157abf8 at pc 0x00010116f3b8 bp 0x00016f92c3b0 sp 0x00016f92c3a8
    READ of size 8 at 0x00010157abf8 thread T0
        #0 0x10116f3b4 in debug_info_read addr2line.c:1945
        #1 0x10116cc90 in fill_lines addr2line.c:2497
        #2 0x101169dbc in rb_dump_backtrace_with_lines addr2line.c:2635
        #3 0x100e56788 in rb_print_backtrace vm_dump.c:825
        #4 0x100e56db4 in rb_vm_bugreport vm_dump.c:1155
        #5 0x100734dc4 in rb_bug_without_die error.c:1085
        #6 0x100734ae4 in rb_bug error.c:109
macOS clang 16 generates DWARF 5. Mach-O section names are limited to
16 characters, so sections with long names are truncated and do not
match the names checked above.
See: https://wiki.dwarfstd.org/Best_Practices.md#Mach-2d-O
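A minimal sketch of the truncation problem, not the actual addr2line.c
code. The helper name `macho_section_matches` is hypothetical. It
compares at most 16 bytes, so a long expected name such as
`__debug_str_offsets` still matches the truncated name stored in the
section header:

```c
#include <stdbool.h>
#include <string.h>

/* Mach-O stores a section name in a fixed 16-byte field; a name that
 * fills the field has no terminating NUL. */
#define MACHO_SECTNAME_LEN 16

/* Hypothetical helper: compare the (possibly truncated) on-disk name
 * against the full name we expect. */
static bool
macho_section_matches(const char *sectname, const char *expected)
{
    /* strncmp stops after 16 bytes, or earlier at a NUL in either
     * string, so short names still require an exact match. */
    return strncmp(sectname, expected, MACHO_SECTNAME_LEN) == 0;
}
```

Short names are unaffected: `__debug_line` does not match
`__debug_line_str`, because the comparison hits the NUL padding first.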
addr2line.c: fix DW_FORM_ref_addr parsing for DWARF 2
This fixes a crash when retrieving backtrace info with YJIT enabled on
macOS with Rust 1.71.0. Since Rust 1.71.0, the DWARF info generated by
the Rust compiler uses DW_FORM_ref_addr instead of DW_FORM_ref4 for
pointers to other DIEs.
DW_FORM_ref_addr representation in DWARF 2 is different from DWARF 3+,
so we need to handle it separately.
This patch fixes the parsing of DW_FORM_ref_addr for DWARF 2, which is
the default DWARF version Rustc uses on macOS.
See the DWARF 2.0.0 spec, section 7.5.4 Attribute Encodings
https://dwarfstd.org/doc/dwarf-2.0.0.pdf
https://bugs.ruby-lang.org/issues/19789
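The difference between the two representations can be sketched as a
size lookup. This is an illustration only; `ref_addr_size` and its
parameters are hypothetical names, not the ones used in addr2line.c.
In DWARF 2 the value is as wide as a target address, while in DWARF 3
and later it is as wide as a section offset:

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of a DW_FORM_ref_addr value.
 * DWARF 2:  the size of a target address (address_size from the CU header).
 * DWARF 3+: the offset size, 4 in 32-bit DWARF and 8 in 64-bit DWARF. */
static uint8_t
ref_addr_size(uint16_t dwarf_version, uint8_t address_size, bool is_dwarf64)
{
    if (dwarf_version <= 2) return address_size;
    return is_dwarf64 ? 8 : 4;
}
```

On a 64-bit target with 32-bit DWARF, a DWARF 2 reader must consume 8
bytes where a DWARF 3+ reader consumes 4, which is why the two cases
cannot share one code path.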
DW_FORM_GNU_ref_alt and DW_FORM_GNU_strp_alt refer to data stored in an
external ELF file specified by a .gnu_debugaltlink attribute. These
attributes are generated by dwz(1), which extracts DWARF data common
amongst several files and stores it in a single, new file. It leaves
behind these two forms in the original file to point at the new, common
data.
We don't support actually reading the .gnu_debugaltlink file in
addr2line.c (and maybe we don't really need to), but we do need to know
how to read the actual value of these forms so we can skip over the
right number of bytes and not lose track of where we are in the CU.
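A sketch of what skipping these forms could look like, assuming both
are encoded as a section offset (4 bytes in 32-bit DWARF, 8 bytes in
64-bit DWARF). The function name `skip_gnu_alt_form` is hypothetical;
the form codes are the GNU extension values:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DW_FORM_GNU_ref_alt  = 0x1f20, /* offset into the alt file's .debug_info */
    DW_FORM_GNU_strp_alt = 0x1f21  /* offset into the alt file's .debug_str */
};

/* Hypothetical helper: return the position just past the value, or
 * NULL if the form is not one of the two GNU alt forms. The value
 * itself is ignored because the alt file is never opened. */
static const unsigned char *
skip_gnu_alt_form(const unsigned char *p, uint64_t form, bool is_dwarf64)
{
    switch (form) {
      case DW_FORM_GNU_ref_alt:
      case DW_FORM_GNU_strp_alt:
        return p + (is_dwarf64 ? 8 : 4);
      default:
        return NULL;
    }
}
```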
While trying to fix YJIT's symbol hygiene issue over at GH-7115, I found
that addr2line.c's DWARF 5 parsing is half-disabled when building with
GCC. Rust's output contains some DW_AT_rnglists_base records, which the
disabled code reads. Without reading DW_AT_rnglists_base, it crashes
when generating a backtrace.
In common Ruby build configurations, GCC opts to only use
DW_FORM_sec_offset for the range lists, and so it doesn't generate
DW_AT_rnglists_base records, so consuming GCC's DWARF 5 while building
with GCC was not a problem.
However, even when building with GCC, we might need to parse DWARF 5
generated by other compilers at runtime. They could come from C
extensions built by Clang, or come from Rust extensions. This
can happen even when building without YJIT.
We need to manually strip pointer authentication bits on M1 Macs
because libunwind leaks them out.
Co-Authored-By: NARUSE, Yui <naruse@airemix.jp>
Co-Authored-By: Yuta Saito <kateinoigakukun@gmail.com>
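To illustrate the effect of stripping pointer authentication bits,
here is a portable sketch that masks off the high bits of a return
address. It assumes the signature lives above a 47-bit virtual address
space; that width and the name `strip_pac_bits` are assumptions for
illustration only. Real code would more likely use a dedicated
instruction or compiler intrinsic than a hard-coded mask:

```c
#include <stdint.h>

/* Assumption: only the low 47 bits of a user-space address are
 * significant; the bits above hold the authentication code. */
#define ASSUMED_VA_BITS 47

/* Hypothetical helper: clear the assumed signature bits so the address
 * can be looked up in symbol and debug information. */
static uint64_t
strip_pac_bits(uint64_t addr)
{
    return addr & ((UINT64_C(1) << ASSUMED_VA_BITS) - 1);
}
```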
Background: GCC 12 generates DWARF 5 with .debug_rnglists, while rustc
generates DWARF 4 with .debug_ranges.
The previous logic always used .debug_rnglists if that section was
present. However, we need to refer to .debug_ranges for DWARF 4.
This change keeps the DWARF version of the current compilation unit and
uses the proper section depending on the version.
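The version-dependent choice can be sketched as follows. The enum and
function names are hypothetical and only illustrate the rule: DWARF 5
compilation units use .debug_rnglists, earlier versions use
.debug_ranges, regardless of which sections exist in the file:

```c
#include <stdint.h>

enum ranges_section {
    SECTION_DEBUG_RANGES,   /* DWARF 2 through 4 */
    SECTION_DEBUG_RNGLISTS  /* DWARF 5 */
};

/* Pick the range section from the version recorded in the header of
 * the compilation unit currently being read. */
static enum ranges_section
ranges_section_for_cu(uint16_t cu_version)
{
    return cu_version >= 5 ? SECTION_DEBUG_RNGLISTS : SECTION_DEBUG_RANGES;
}
```

With this, a binary that mixes GCC 12 objects (DWARF 5) and rustc
objects (DWARF 4) resolves each compilation unit against the section
its producer actually wrote.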
Currently, addr2line.c supports only one path format of debuglink:
"/usr/lib/debug/usr/bin/ruby.debug".
However, recent Debian packages seem to use another format by build_id:
"/usr/lib/debug/.build-id/ab/cdef1234.debug".
5d1bb29841/dh_strip (L292)
5d1bb29841/dh_strip (L353)
This changeset makes ruby backtrace support the second format.
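A sketch of how the build_id path could be assembled, assuming the
first byte of the build ID becomes the directory name and the remaining
bytes become the file name, as in the example path above. The function
`build_id_debug_path` is hypothetical and not the addr2line.c
implementation:

```c
#include <stddef.h>
#include <stdio.h>

/* Write "/usr/lib/debug/.build-id/ab/cdef1234.debug" style paths.
 * Returns 0 on success, -1 if the ID is too short or `out` too small. */
static int
build_id_debug_path(char *out, size_t outlen,
                    const unsigned char *build_id, size_t idlen)
{
    static const char prefix[] = "/usr/lib/debug/.build-id/";
    size_t pos, i;

    if (idlen < 2) return -1;
    /* prefix + "ab" + "/" + remaining hex digits + ".debug" + NUL */
    if (outlen < sizeof(prefix) - 1 + 2 + 1 + 2 * (idlen - 1) + 6 + 1) return -1;

    pos = (size_t)snprintf(out, outlen, "%s%02x/", prefix, (unsigned)build_id[0]);
    for (i = 1; i < idlen; i++) {
        pos += (size_t)snprintf(out + pos, outlen - pos, "%02x", (unsigned)build_id[i]);
    }
    snprintf(out + pos, outlen - pos, ".debug");
    return 0;
}
```

Given the bytes `ab cd ef 12 34`, this produces the second path format
shown above. Real build IDs are longer; the short ID only mirrors the
example.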
When ruby is compiled by GCC 8 or later, some frames in the C level
backtrace lack location information.
```
$ ./miniruby -e '1.times { Process.kill(:SEGV, $$) }'
...
-- C level backtrace information -------------------------------------------
/home/mame/work/ruby-gcc-9/miniruby(rb_vm_bugreport+0x611) [0x558a5fdcbc21] ../ruby/vm_dump.c:758
[0x558a5fbc789a]
/home/mame/work/ruby-gcc-9/miniruby(sigsegv+0x4d) [0x558a5fd1eaed] ../ruby/signal.c:959
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7f687e6713c0]
/lib/x86_64-linux-gnu/libc.so.6(kill+0xb) [0x7f687e31355b] ../sysdeps/unix/syscall-template.S:78
/home/mame/work/ruby-gcc-9/miniruby(rb_f_kill+0x350) [0x558a5fd1fe60] ../ruby/signal.c:480
[0x558a5fda50d3]
[0x558a5fdb085c]
[0x558a5fdb0fe7]
[0x558a5fdbae1a]
[0x558a5fdaf484]
/home/mame/work/ruby-gcc-9/miniruby(rb_yield_1+0x29f) [0x558a5fdb2fbf] ../ruby/vm.c:1265
/home/mame/work/ruby-gcc-9/miniruby(int_dotimes+0x5c) [0x558a5fc72f2c] ../ruby/numeric.c:5198
[0x558a5fda50d3]
[0x558a5fdb085c]
[0x558a5fdb0fe7]
[0x558a5fdbaf21]
[0x558a5fdaf484]
/home/mame/work/ruby-gcc-9/miniruby(rb_ec_exec_node+0xed) [0x558a5fbcc4fd] ../ruby/eval.c:317
/home/mame/work/ruby-gcc-9/miniruby(ruby_run_node+0x4f) [0x558a5fbd110f] ../ruby/eval.c:375
/home/mame/work/ruby-gcc-9/miniruby(main+0x73) [0x558a5fb2c083] ../ruby/main.c:50
```
With this one-line change, all locations are shown.
```
$ ./miniruby -e '1.times { Process.kill(:SEGV, $$) }'
...
-- C level backtrace information -------------------------------------------
/home/mame/work/ruby-gcc-9/miniruby(rb_print_backtrace+0x11) [0x558247adec21] ../ruby/vm_dump.c:758
/home/mame/work/ruby-gcc-9/miniruby(rb_vm_bugreport) ../ruby/vm_dump.c:956
/home/mame/work/ruby-gcc-9/miniruby(rb_bug_for_fatal_signal+0x15a) [0x5582478da89a] ../ruby/error.c:773
/home/mame/work/ruby-gcc-9/miniruby(sigsegv+0x4d) [0x558247a31aed] ../ruby/signal.c:959
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7f82202f73c0]
/lib/x86_64-linux-gnu/libc.so.6(kill+0xb) [0x7f821ff9955b] ../sysdeps/unix/syscall-template.S:78
/home/mame/work/ruby-gcc-9/miniruby(rb_f_kill+0x350) [0x558247a32e60] ../ruby/signal.c:480
/home/mame/work/ruby-gcc-9/miniruby(vm_call_cfunc_with_frame+0x123) [0x558247ab80d3] ../ruby/vm_insnhelper.c:2821
/home/mame/work/ruby-gcc-9/miniruby(vm_call_method_each_type+0x7c) [0x558247ac385c] ../ruby/vm_insnhelper.c:3324
/home/mame/work/ruby-gcc-9/miniruby(vm_call_method+0xc7) [0x558247ac3fe7] ../ruby/vm_insnhelper.c:3428
/home/mame/work/ruby-gcc-9/miniruby(vm_sendish+0x14) [0x558247acde1a] ../ruby/vm_insnhelper.c:4412
/home/mame/work/ruby-gcc-9/miniruby(vm_exec_core) ../ruby/insns.def:789
/home/mame/work/ruby-gcc-9/miniruby(rb_vm_exec+0x1a4) [0x558247ac2484] ../ruby/vm.c:2165
/home/mame/work/ruby-gcc-9/miniruby(rb_yield_1+0x29f) [0x558247ac5fbf] ../ruby/vm.c:1265
/home/mame/work/ruby-gcc-9/miniruby(int_dotimes+0x5c) [0x558247985f2c] ../ruby/numeric.c:5198
/home/mame/work/ruby-gcc-9/miniruby(vm_call_cfunc_with_frame+0x123) [0x558247ab80d3] ../ruby/vm_insnhelper.c:2821
/home/mame/work/ruby-gcc-9/miniruby(vm_call_method_each_type+0x7c) [0x558247ac385c] ../ruby/vm_insnhelper.c:3324
/home/mame/work/ruby-gcc-9/miniruby(vm_call_method+0xc7) [0x558247ac3fe7] ../ruby/vm_insnhelper.c:3428
/home/mame/work/ruby-gcc-9/miniruby(vm_sendish+0x14) [0x558247acdf21] ../ruby/vm_insnhelper.c:4412
/home/mame/work/ruby-gcc-9/miniruby(vm_exec_core) ../ruby/insns.def:770
/home/mame/work/ruby-gcc-9/miniruby(rb_vm_exec+0x1a4) [0x558247ac2484] ../ruby/vm.c:2165
/home/mame/work/ruby-gcc-9/miniruby(rb_ec_exec_node+0xed) [0x5582478df4fd] ../ruby/eval.c:317
/home/mame/work/ruby-gcc-9/miniruby(ruby_run_node+0x4f) [0x5582478e410f] ../ruby/eval.c:375
/home/mame/work/ruby-gcc-9/miniruby(main+0x73) [0x55824783f083] ../ruby/main.c:50
```
Details:
In short, it is an uninitialized variable bug.
Until GCC 7, all function locations were represented by a pair of
DW_AT_low_pc and DW_AT_high_pc in DWARF information.
But since GCC 8, some functions are split into multiple chunks, which
are represented by DW_AT_ranges.
DW_AT_ranges are represented as offsets from a base address.
According to the DWARF specification, it is the base address of the
compilation unit, but GCC seems to use zero as the default.
The function "di_read_cu" in addr2line.c had a comment about this fact.
However, the base address wasn't initialized to zero.
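The role of the base address can be sketched with a simplified walk
over a DWARF 4 style range list. The struct and function names are
hypothetical, and entries are assumed to be already decoded as 8-byte
pairs. The important line is the initialization of `base`; left
uninitialized, every comparison below uses garbage:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded (begin, end) pair from .debug_ranges. */
struct range_entry { uint64_t begin, end; };

/* Return true if addr falls inside any range of the list. */
static bool
ranges_contain(const struct range_entry *list, size_t n, uint64_t addr)
{
    uint64_t base = 0; /* zero default, as GCC's output seems to expect */
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t b = list[i].begin, e = list[i].end;
        if (b == 0 && e == 0) break;                 /* end-of-list entry */
        if (b == UINT64_MAX) { base = e; continue; } /* base address selection */
        if (base + b <= addr && addr < base + e) return true;
    }
    return false;
}
```

A function split into two chunks is found through either chunk, and a
base address selection entry shifts the ranges that follow it.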