Commit graph

1116 commits

Author SHA1 Message Date
KJ Tsanaktsidis
5b6009870d Ensure fiber scheduler is woken up when close interrupts read
If one thread is reading and another closes that socket, the close
blocks waiting for the read to abort cleanly. This ensures that Ruby is
totally done with the file descriptor _BEFORE_ we tell the OS to close
and potentially re-use it.

When the read is correctly terminated, the close should be unblocked.
That currently works if closing is happening on a thread, but if it's
happening on a fiber with a fiber scheduler, it does NOT work.

This patch ensures that if the close happened in a fiber scheduled
thread, that the scheduler is notified that the fiber is unblocked.

[Bug #20723]
2024-09-23 09:25:10 -07:00
Takashi Kokubun
b13cf49036 merge revision(s) 055613fd868a8c94e43893f8c58a00cdd2a81f6d,127d7a35df10ee2bc99f44b888972b2c5361d84f,e2a9b87126d59e4766479a7aa12cf7a648f46506: [Backport #20447]
Fix pointer incompatiblity

	Since the subsecond part is discarded, WIDEVAL to VALUE conversion is
	needed.

	Some functions are not used when `THREAD_MODEL=none`

	`rb_thread_sched_destroy` is not used now at all
2024-05-30 11:13:15 -07:00
Takashi Kokubun
6e46a363a8 merge revision(s) a7ff264477: [Backport #20393]
Don't clear pending interrupts in the parent process. (#10365)
2024-05-29 11:46:33 -07:00
Takashi Kokubun
22c1e5f126 Suppress -Wclobbered warnings
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-05-29 10:57:43 -07:00
Takashi Kokubun
a8b2317d16 merge revision(s) 78d9fe6947: [Backport #20286]
Ensure that exiting thread invokes end-of-life behaviour. (#10039)
2024-05-29 10:02:15 -07:00
KJ Tsanaktsidis
b6c07acedb
Backport bug #20493 to Ruby 3.3 (#10798)
Inline RB_VM_SAVE_MACHINE_CONTEXT into BLOCKING_REGION

There's an exhaustive explanation of this in the linked redmine bug, but
the short version is as follows:

blocking_region_begin can spill callee-saved registers to the stack for
its own use. That means they're not saved to ec->machine by the call to
setjmp, since by that point they're already on the stack and new,
different values are in the real registers. ec->machine's end-of-stack
pointer is also bumped to accomodate this, BUT, after
blocking_region_begin returns, that points past the end of the stack!

As far as C is concerned, that's fine; the callee-saved registers are
restored when blocking_region_begin returns. But, if another thread
triggers GC, it is relying on finding references to Ruby objects by
walking the stack region pointed to by ec->machine.

If the C code in exec; subsequently does things that use that stack
memory, then the value will be overwritten and the GC might prematurely
collect something it shouldn't.

[Bug #20493]
2024-05-28 13:17:24 -07:00
Koichi Sasada
541371e286 accept RB_WAITFD_IN | RB_WAITFD_OUT for waiting events
Assrsion was `events == RB_WAITFD_IN || events == RB_WAITFD_OUT`
but it should accept `RB_WAITFD_IN | RB_WAITFD_OUT`.
2023-12-24 14:53:46 +09:00
Koichi Sasada
fa5de8f68d MN: skip waiting on fiber schedulers
If the Fiber is nonblocking mode, fiber scheduler needs to handle
IO events.
2023-12-23 08:10:41 +09:00
Koichi Sasada
bbfc262c99 MN: fix "raise on close"
Introduce `thread_io_wait_events()` to make 1 function to call
`thread_sched_wait_events()`.

In ``thread_io_wait_events()`, manipulate `waiting_fd` to raise
an exception when closing the IO correctly.
2023-12-23 05:56:02 +09:00
JP Camara
256f34ab6b Hand thread into thread_sched_wait_events_timeval
* When we have the thread already, it saves a lookup

* `event_wait`, not `kq`

Clean up the `thread_sched_wait_events_timeval` calls

* By handling the PTHREAD check inside the function, all the other code can become much simpler and just call the function directly without additional checks
2023-12-20 16:23:38 +09:00
JP Camara
8782e02138 KQueue support for M:N threads
* Allows macOS users to use M:N threads (and technically FreeBSD, though it has not been verified on FreeBSD)

* Include sys/event.h header check for macros, and include sys/event.h when present

* Rename epoll_fd to more generic kq_fd (Kernel event Queue) for use by both epoll and kqueue

* MAP_STACK is not available on macOS so conditionall apply it to mmap flags

* Set fd to close on exec

* Log debug messages specific to kqueue and epoll on creation

* close_invalidate raises an error for the kqueue fd on child process fork. It's unclear rn if that's a bug, or if it's kqueue specific behavior

Use kq with rb_thread_wait_for_single_fd

* Only platforms with `USE_POLL` (linux) had changes applied to take advantage of kernel event queues. It needed to be applied to the `select` so that kqueue could be properly applied

* Clean up kqueue specific code and make sure only flags that were actually set are removed (or an error is raised)

* Also handle kevent specific errnos, since most don't apply from epoll to kqueue

* Use the more platform standard close-on-exec approach of `fcntl` and `FD_CLOEXEC`. The io-event gem uses `ioctl`, but fcntl seems to be the recommended choice. It is also what Go, Bun, and Libuv use

* We're making changes in this file anyways - may as well fix a couple spelling mistakes while here

Make sure FD_CLOEXEC carries over in dup

* Otherwise the kqueue descriptor should have FD_CLOEXEC, but doesn't and fails in assert_close_on_exec
2023-12-20 16:23:38 +09:00
Koichi Sasada
2fe5fc176b setup waiting_fd for thread_sched_wait_events()
`thread_sched_wait_events()` suspend the thread until the target
fd is ready. Howver, other threads can close the target fd and
suspended thread should be awake. To support it, setup `waiting_fd`
before `thread_sched_wait_events()`.

`rb_thread_io_wake_pending_closer()` should be called before
`RUBY_VM_CHECK_INTS_BLOCKING()` because it can return this function.

This patch introduces additional overhead (setup/cleanup `waiting_fd`)
and maybe we can reduce the overhead.
2023-12-20 07:00:41 +09:00
KJ Tsanaktsidis
f8effa209a Change the semantics of rb_postponed_job_register
Our current implementation of rb_postponed_job_register suffers from
some safety issues that can lead to interpreter crashes (see bug #1991).
Essentially, the issue is that jobs can be called with the wrong
arguments.

We made two attempts to fix this whilst keeping the promised semantics,
but:
  * The first one involved masking/unmasking when flushing jobs, which
    was believed to be too expensive
  * The second one involved a lock-free, multi-producer, single-consumer
    ringbuffer, which was too complex

The critical insight behind this third solution is that essentially the
only user of these APIs are a) internal, or b) profiling gems.

For a), none of the usages actually require variable data; they will
work just fine with the preregistration interface.

For b), generally profiling gems only call a single callback with a
single piece of data (which is actually usually just zero) for the life
of the program. The ringbuffer is complex because it needs to support
multi-word inserts of job & data (which can't be atomic); but nobody
actually even needs that functionality, really.

So, this comit:
  * Introduces a pre-registration API for jobs, with a GVL-requiring
    rb_postponed_job_prereigster, which returns a handle which can be
    used with an async-signal-safe rb_postponed_job_trigger.
  * Deprecates rb_postponed_job_register (and re-implements it on top of
    the preregister function for compatability)
  * Moves all the internal usages of postponed job register
    pre-registration
2023-12-10 15:00:37 +09:00
Koichi Sasada
352a885a0f Thread specific storage APIs
This patch introduces thread specific storage APIs
for tools which use `rb_internal_thread_event_hook` APIs.

* `rb_internal_thread_specific_key_create()` to create a tool specific
  thread local storage key and allocate the storage if not available.
* `rb_internal_thread_specific_set()` sets a data to thread and tool
  specific storage.
* `rb_internal_thread_specific_get()` gets a data in thread and tool
  specific storage.

Note that `rb_internal_thread_specific_get|set(thread_val, key)`
can be called without GVL and safe for async signal and safe for
multi-threading (native threads). So you can call it in any internal
thread event hooks. Further more you can call it from other native
threads. Of course `thread_val` should be living while accessing the
data from this function.

Note that you should not forget to clean up the set data.
2023-12-08 13:16:19 +09:00
Jean Boussier
23a7714343 Refactor and fix the GVL instrumentation API
This entirely changes how it is tested. Rather than to use counters
we now record the timeline of events with associated threads which
makes it much easier to assert that certains events are only preceded
by a specific event, and makes it much easier to debug unexpected
timelines.

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Co-Authored-By: JP Camara <jp@jpcamara.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
2023-11-27 17:37:57 +01:00
Jean Boussier
c8d99fa662 Embed ThreadGroup object
These are rare but embedding them is trivial and make them fit
in a 40B slot.
2023-11-22 18:42:04 +01:00
Jean Boussier
9ca41e9991 GVL Instrumentation: pass thread->self as part of event data
Context: https://github.com/ivoanjo/gvl-tracing/pull/4

Some hooks may want to collect data on a per thread basis.
Right now the only way to identify the concerned thread is to
use `rb_nativethread_self()` or similar, but even then because
of the thread cache or MaNy, two distinct Ruby threads may report
the same native thread id.

By passing `thread->self`, hooks can use it as a key to store
the metadata.

NB: Most hooks are executed outside the GVL, so such data collection
need to use a thread-safe data-structure, and shouldn't use the
reference in other ways from inside the hook.

They must also either pin that value or handle compaction.
2023-11-13 08:45:20 +01:00
Koichi Sasada
cdb36dfe7d fix native_thread_destroy() timing
With M:N thread scheduler, the native thread (NT) related resources
should be freed when the NT is no longer needed. So the calling
`native_thread_destroy()` at the end of `is will be freed when
`thread_cleanup_func()` (at the end of Ruby thread) is not correct
timing. Call it when the corresponding Ruby thread is collected.
2023-10-13 09:19:31 +09:00
Koichi Sasada
be1bbd5b7d M:N thread scheduler for Ractors
This patch introduce M:N thread scheduler for Ractor system.

In general, M:N thread scheduler employs N native threads (OS threads)
to manage M user-level threads (Ruby threads in this case).
On the Ruby interpreter, 1 native thread is provided for 1 Ractor
and all Ruby threads are managed by the native thread.

From Ruby 1.9, the interpreter uses 1:1 thread scheduler which means
1 Ruby thread has 1 native thread. M:N scheduler change this strategy.

Because of compatibility issue (and stableness issue of the implementation)
main Ractor doesn't use M:N scheduler on default. On the other words,
threads on the main Ractor will be managed with 1:1 thread scheduler.

There are additional settings by environment variables:

`RUBY_MN_THREADS=1` enables M:N thread scheduler on the main ractor.
Note that non-main ractors use the M:N scheduler without this
configuration. With this configuration, single ractor applications
run threads on M:1 thread scheduler (green threads, user-level threads).

`RUBY_MAX_CPU=n` specifies maximum number of native threads for
M:N scheduler (default: 8).

This patch will be reverted soon if non-easy issues are found.

[Bug #19842]
2023-10-12 14:47:01 +09:00
Matthew Draper
aed5215104 Optimize handle_interrupt(Exception => ..) as a common case
When interrupt behavior is configured for all possible exceptions using
'Exception', there's no need to iterate the pending exception's
ancestors for hash lookups.

More significantly, by storing the catch-all timing symbol directly in
the mask stack, we can skip allocating the hash we would otherwise need.
2023-09-07 13:51:15 -07:00
Matthew Draper
ed712e0e9d Skip allocation if handle_interrupt arg is already usable
If the supplied hash is already frozen and compare-by-identity, we can
use it directly (still checking its contents are valid symbols), without
making a new copy.
2023-09-07 13:51:15 -07:00
Peter Zhu
3223181284 Remove RARRAY_CONST_PTR_TRANSIENT
RARRAY_CONST_PTR now does the same things as RARRAY_CONST_PTR_TRANSIENT.
2023-07-13 14:48:14 -04:00
Nobuyoshi Nakada
c1432a4816
Compile disabled code for thread cache always 2023-06-30 23:59:05 +09:00
Peter Zhu
58386814a7 Don't check for null pointer in calls to free
According to the C99 specification section 7.20.3.2 paragraph 2:

> If ptr is a null pointer, no action occurs.

So we do not need to check that the pointer is a null pointer.
2023-06-30 09:13:31 -04:00
Samuel Williams
0402193723
Fix Thread#join(timeout) when running inside the fiber scheduler. (#7903) 2023-06-03 12:41:36 +09:00
KJ Tsanaktsidis
edee9b6a12
Use a real Ruby mutex in rb_io_close_wait_list (#7884)
Because a thread calling IO#close now blocks in a native condvar wait,
it's possible for there to be _no_ threads left to actually handle
incoming signals/ubf calls/etc.

This manifested as failing tests on Solaris 10 (SPARC), because:

* One thread called IO#close, which sent a SIGVTALRM to the other
  thread to interrupt it, and then waited on the condvar to be notified
  that the reading thread was done.
* One thread was calling IO#read, but it hadn't yet reached the actual
  call to select(2) when the SIGVTALRM arrived, so it never unblocked
  itself.

This results in a deadlock.

The fix is to use a real Ruby mutex for the close lock; that way, the
closing thread goes into sigwait-sleep and can keep trying to interrupt
the select(2) thread.

See the discussion in: https://github.com/ruby/ruby/pull/7865/
2023-06-01 17:37:18 +09:00
git
cc698c6cc2 * expand tabs. [ci skip]
Please consider using misc/expand_tabs.rb as a pre-commit hook.
2023-05-26 05:51:35 +00:00
KJ Tsanaktsidis
66871c5a06 Fix busy-loop when waiting for file descriptors to close
When one thread is closing a file descriptor whilst another thread is
concurrently reading it, we need to wait for the reading thread to be
done with it to prevent a potential EBADF (or, worse, file descriptor
reuse).

At the moment, that is done by keeping a list of threads still using the
file descriptor in io_close_fptr. It then continually calls
rb_thread_schedule() in fptr_finalize_flush until said list is empty.

That busy-looping seems to behave rather poorly on some OS's,
particulary FreeBSD. It can cause the TestIO#test_race_gets_and_close
test to fail (even with its very long 200 second timeout) because the
closing thread starves out the using thread.

To fix that, I introduce the concept of struct rb_io_close_wait_list; a
list of threads still using a file descriptor that we want to close. We
call `rb_notify_fd_close` to let the thread scheduler know we're closing
a FD, which fills the list with threads. Then, we call
rb_notify_fd_close_wait which will block the thread until all of the
still-using threads are done.

This is implemented with a condition variable sleep, so no busy-looping
is required.
2023-05-26 14:51:23 +09:00
KJ Tsanaktsidis
8e1abef469 Fix a potential busy-loop in the thread scheduler (esp. on FreeBSD)
This patch fixes a potential busy-loop in the thread scheduler. If there
are two threads, the main thread (where Ruby signal handlers must run)
and a sleeping thread, it is possible for the following sequence of
events to occur:

* The sleeping thread is in native_sleep -> sigwait_sleep A signal
* arives, kicking this thread out of rb_sigwait_sleep The sleeping
* thread calls THREAD_BLOCKING_END and eventually
  thread_sched_to_running_common
* the sleeping thread writes into the sigwait_fd pipe by calling
  rb_thread_wakeup_timer_thread
* the sleeping thread re-loops around in native_sleep() because
  the desired sleep time has not actually yet expired
* that calls rb_sigwait_sleep again the ppoll() in rb_sigwait_sleep
* immediately returns because
  of the byte written into the sigwait_fd by
rb_thread_wakeup_timer_thread
* that wakes the thread up again and kicks the whole cycle off again.

Such a loop can only be broken by the main thread waking up and handling
the signal, such that ubf_threads_empty() below becomes true again;
however this loop can actually keep things so busy (and cause so much
contention on the main thread's interrupt_lock) that the main thread
doesn't deal with the signal for many seconds. This seems particuarly
likely on FreeBSD 13.

(the cycle can also be broken by the sleeping thread finally elapsing
its desired sleep time).

The fix for _this_ loop is to only wakeup the timer thrad in
thread_sched_to_running_common if the current thread is not itself the
sigwait thread.

An almost identical loop also happens in the same circumstances because
the call to check_signals_nogvl (through sigwait_timeout) in
rb_sigwait_sleep returns true if there is any pending signal for the
main thread to handle. That then causes rb_sigwait_sleep to skip over
sleeping entirely.

This is unnescessary and counterproductive, I believe; if the main
thread needs to be woken up that is done inline in check_signals_nogvl
anyway.

See https://bugs.ruby-lang.org/issues/19680
2023-05-26 14:48:08 +09:00
Samuel Williams
2df5a697e2
Add Fiber#kill, similar to Thread#kill. (#7823) 2023-05-18 23:33:42 +09:00
Samuel Williams
ab7bb38aca
Remove explicit SIGCHLD handling. (#7816)
* Remove unused SIGCHLD handling.

* Remove unused `init_sigchld`.

* Remove unnecessary `#define RUBY_SIGCHLD (0)`.

* Remove unused `SIGCHLD_LOSSY`.
2023-05-15 23:14:51 +09:00
Koichi Sasada
b2e848193a fix deadlock on Thread#join
because of 9720f5ac89

20230403T130011Z.fail.html.gz

```
  1) Failure:
TestThread#test_signal_at_join [/export/home/chkbuild/chkbuild-sunc/tmp/build/20230403T130011Z/ruby/test/ruby/test_thread.rb:1488]:
Exception raised:
<#<fatal:"No live threads left. Deadlock?\n1 threads, 1 sleeps current:0x00891288 main thread:0x00891288\n* #<Thread:0xfef89a18 sleep_forever>\n   rb_thread_t:0x00891288 native:0x00000001 int:0\n   \n">>
Backtrace:
  -:30:in `join'
  -:30:in `block (3 levels) in <main>'
  -:21:in `times'
  -:21:in `block (2 levels) in <main>'.
```

The mechanism:

* Main thread (M) calls `Thread#join`
* M: calls `sleep_forever()`
* M: set `th->status = THREAD_STOPPED_FOREVER`
* M: do `checkints`
* M: handle a trap handler with `th->status = THREAD_RUNNABLE`
* M: thread switch at the end of the trap handler
* Another thread (T) will process `Thread#kill` by M.
* T: `rb_threadptr_join_list_wakeup()` at the end of T tris to wakeup M,
     but M's state is runnable because M is handling trap handler and
     just ignore the waking up and terminate T$a
* T: switch to M.
* M: after the trap handler, reset `th->status = THREAD_STOPPED_FOREVER`
     and check deadlock -> Deadlock because only M is living.

To avoid such situation, add new sleep flags `SLEEP_ALLOW_SPURIOUS`
and `SLEEP_NO_CHECKINTS` to skip any check ints.

BTW this is instentional to leave second `vm_check_ints_blocking()`
without checking `SLEEP_NO_CHECKINTS` because `SLEEP_ALLOW_SPURIOUS`
should be specified with `SLEEP_NO_CHECKINTS` and skipping this
checkints can skip any interrupts.
2023-04-04 07:57:51 +09:00
Koichi Sasada
9720f5ac89 use sleep_forever() on thread_join_sleep()
because it does same thing.
2023-04-01 09:48:37 +09:00
Koichi Sasada
1d19776c7f cosmetic change
reorder `sleep_forever()` and so on.
2023-03-31 19:26:47 +09:00
Koichi Sasada
f803bcfc87 pass th to thread_sched_to_waiting()
for future extension
2023-03-31 18:50:10 +09:00
Koichi Sasada
4c0f82eb5b remove "\n" for RUBY_DEBUG_LOG()
because `RUBY_DEBUG_LOG()` add "\n" at the end of message.
2023-03-31 18:15:04 +09:00
Koichi Sasada
30b43f4f1a rb_ractor_thread_list() only for current ractor
so that no need to lock the ractor.
2023-03-30 14:56:37 +09:00
Koichi Sasada
ba72849a3f cosmetic change 2023-03-30 14:56:10 +09:00
Matt Valentine-House
60b8c7d9fd Rename RB_GC_SAVE_MACHINE_CONTEXT -> RB_VM_SAVE_MACHINE_CONTEXT 2023-03-15 21:26:26 +00:00
Samuel Williams
7fd53eeb46
Remove SIGCHLD waidpid. (#7527)
* Remove `waitpid_lock` and related code.

* Remove un-necessary test.

* Remove `rb_thread_sleep_interruptible` dead code.
2023-03-15 19:48:27 +13:00
Samuel Williams
ac65ce16e9
Revert SIGCHLD changes to diagnose CI failures. (#7517)
* Revert "Remove special handling of `SIGCHLD`. (#7482)"

This reverts commit 44a0711eab.

* Revert "Remove prototypes for functions that are no longer used. (#7497)"

This reverts commit 4dce12bead.

* Revert "Remove SIGCHLD `waidpid`. (#7476)"

This reverts commit 1658e7d966.

* Fix change to rjit variable name.
2023-03-14 20:07:59 +13:00
Samuel Williams
1658e7d966
Remove SIGCHLD waidpid. (#7476)
* Remove `waitpid_lock` and related code.

* Remove un-necessary test.

* Remove `rb_thread_sleep_interruptible` dead code.
2023-03-09 16:05:47 +13:00
Takashi Kokubun
9ad19069f9 Remove obsoleted functions in rjit.c 2023-03-07 23:59:50 -08:00
Takashi Kokubun
b67f07fa2c Get rid of MJIT's special fork 2023-03-07 23:08:57 -08:00
Takashi Kokubun
23ec248e48 s/mjit/rjit/ 2023-03-06 23:44:01 -08:00
Takashi Kokubun
2e875549a9 s/MJIT/RJIT/ 2023-03-06 23:44:01 -08:00
Samuel Williams
2c4b2053ca
Correctly clean up keeping_mutexes before resuming any other threads. (#7460)
It's possible (but very rare) to have a race condition between setting
`mutex->fiber = NULL` and `thread_mutex_remove(th, mutex)` which results
in the following bug:

```
[BUG] invalid keeping_mutexes: Attempt to unlock a mutex which is not locked
```

Fixes <https://bugs.ruby-lang.org/issues/19480>.
2023-03-07 20:23:00 +13:00
Takashi Kokubun
233ddfac54 Stop exporting symbols for MJIT 2023-03-06 21:59:23 -08:00
Jean Boussier
704dd25812 TestThreadInstrumentation: emit the EXIT event sooner
```
  1) Failure:
TestThreadInstrumentation#test_thread_instrumentation [/tmp/ruby/src/trunk-repeat20-asserts/test/-ext-/thread/test_instrumentation_api.rb:33]:
Call counters[4]: [3, 4, 4, 4, 0].
Expected 0 to be > 0.
```

We fire the EXIT hook after the call to `thread_sched_to_dead` which
mean another thread might be running before the `EXIT` hook have been
executed.
2023-03-06 13:10:42 +01:00
Matt Valentine-House
72aba64fff Merge gc.h and internal/gc.h
[Feature #19425]
2023-02-09 10:32:29 -05:00