We saw the following failure:
```
TestThreadInstrumentation#test_thread_instrumentation [/tmp/ruby/v3/src/trunk-random3/test/-ext-/thread/test_instrumentation_api.rb:25]:
Expected 0..3 to include 4.
```
Which shouldn't happen unless somehow there was a leaked thread.
Recently `TestThreadInstrumentation#test_join_counters` often fails as
```
<[1, 1, 1]> expected but was
<[2, 2, 2]>.
```
Probably it seems that the thread is suspended more than once.
There may be no guarantee that the subject thread never be suspended
more than once.
[Bug #18900]
Thread#join and a few other codepaths are using native sleep as
a way to suspend the current thread. So we should call the relevant
hook when this happen, otherwise some thread may transition
directly from `RESUMED` to `READY`.
I suspect that sometimes on CI the last thread is prempted before eaching the exit hook
causing the test to flake. I can't find a good way to force it to run.
It previously failed with:
```
1) Failure:
TestThreadInstrumentation#test_thread_instrumentation_fork_safe [/home/runner/work/ruby/ruby/src/test/-ext-/thread/test_instrumentation_api.rb:50]:
<5> expected but was
<4>.
```
Suggesting one `EXIT` event wasn't fired or processed.
Adding an assetion on `Thead#status` may help figure out what is wrong.
[Feature #18339]
After experimenting with the initial version of the API I figured there is a need
for an exit event to cleanup instrumentation data. e.g. if you record data in a
{thread_id -> data} table, you need to free associated data when a thread goes away.
`test_thread_instrumentation_fork_safe` has been failing occasionaly
on Ubuntu and Arch. At this stage we're not sure why, all we know is
that the child exit with status 1.
I suspect that something entirely unrelated cause the forked children
to fail on exit, so by using `exit!(0)` and doing assertions in the
parent I hope to be resilient to that.
The tests fail randomly on some platforms.
20220605T213004Z.fail.html.gz
20220605T210003Z.fail.html.gz
```
[15737/21701] TestThreadInstrumentation#test_thread_instrumentation_fork_safe/home/chkbuild/chkbuild/tmp/build/20220605T213004Z/ruby/tool/lib/test/unit/assertions.rb:109:in `assert': Expected 0 to be nonzero?. (Test::Unit::AssertionFailedError)
```
The failures seem to depend on context switches. I suspect `sleep 0.05`
is too short, so this change tries to extend the wait time.
Ref: https://bugs.ruby-lang.org/issues/18339
Design:
- This tries to minimize the overhead when no hook is registered.
It should only incur an extra unsynchronized boolean check.
- The hook list is protected with a read-write lock as to cause
contention when some hooks are registered.
- The hooks MUST be thread safe, and MUST NOT call into Ruby as they
are executed outside the GVL.
- It's simply a noop on Windows.
API:
```
rb_internal_thread_event_hook_t * rb_internal_thread_add_event_hook(rb_internal_thread_event_callback callback, rb_event_flag_t internal_event, void *user_data);
bool rb_internal_thread_remove_event_hook(rb_internal_thread_event_hook_t * hook);
```
You can subscribe to 3 events:
- READY: called right before attempting to acquire the GVL
- RESUMED: called right after successfully acquiring the GVL
- SUSPENDED: called right after releasing the GVL.
The hooks MUST be threadsafe, as they are executed outside of the GVL, they also MUST NOT call any Ruby API.