1971 commits

Author SHA1 Message Date
BurdetteLamar
c2c0c220a8 [DOC] Tweaks for String#casecmp 2025-07-07 15:10:17 -04:00
Jean Boussier
0bb44f291e Rename ractor_safe_set into concurrent_set
There's nothing ractor related in them, and the classic terminology
for this sort of data structure is `concurrent-*`, e.g.
concurrent hash.
2025-07-07 15:12:39 +02:00
BurdetteLamar
0604d0c9db [DOC] Tweaks for String#capitalize! 2025-07-07 09:03:02 -04:00
Burdette Lamar
987b5bf972 [DOC] Tweaks for String#capitalize 2025-07-07 09:02:15 -04:00
Burdette Lamar
350df4fbd9 [DOC] Tweaks for Case Mapping doc 2025-07-04 11:42:29 -04:00
Burdette Lamar
35feaee917 [DOC] Tweaks for String#bytesplice 2025-06-30 14:13:09 -04:00
BurdetteLamar
456f6f3f83 [DOC] Tweaks for String#byteslice 2025-06-30 13:59:25 -04:00
Peter Zhu
d9b2d89976 Extract Ractor safe table used for frozen strings
This commit extracts the Ractor safe table used for frozen strings into
ractor_safe_table.c, which will allow it to be used elsewhere, including
for the global symbol table.
2025-06-27 09:23:14 -04:00
Luke Gruber
328e3029d8 Get String#crypt working with multi-ractor in cases where !HAVE_CRYPT_R
In commit 12f7ba5ed4, ractor safety was added to String#crypt, however
in certain cases it can cause a deadlock. When we lock a native mutex,
we cannot allocate ruby objects because they might trigger GC which
starts a VM barrier. If the barrier is triggered and other native threads
are waiting on this mutex, they will not be able to be woken up in order to join
the barrier. To fix this, we don't allocate ruby objects when we hold the
lock.

The following could reproduce the problem:

```ruby
strings = []
10_000.times do |i|
  strings << "my string #{i}"
end

STRINGS = Ractor.make_shareable(strings)

rs = []
100.times do
  rs << Ractor.new do
    STRINGS.each do |s|
      s.dup.crypt(s.dup)
    end
  end
end
while rs.any?
  r, obj = Ractor.select(*rs)
  rs.delete(r)
end
```

I will not be adding tests because I have almost finished a PR to enable
running test-all test cases inside many ractors at once, which is how I
found the issue.

Co-authored-by: jhawthorn <john@hawthorn.email>
2025-06-25 14:11:08 -07:00
Peter Zhu
aed7a95f9d Move RUBY_ATOMIC_VALUE_LOAD to ruby_atomic.h
Deduplicates RUBY_ATOMIC_VALUE_LOAD by moving it to ruby_atomic.h.
2025-06-25 13:04:25 -04:00
Burdette Lamar
ec071c849f [DOC] Tweaks for String#byterindex (#13485) 2025-06-25 10:51:45 -04:00
Jean Boussier
45a2c95d0f Reduce exposure of FL_FREEZE
The `FL_FREEZE` flag is redundant with `SHAPE_ID_FL_FROZEN`, so
ideally it should be eliminated in favor of the latter.

Doing so would eliminate the risk of desync between the two, but
also solve the problem of the frozen status being global in namespace
context (See Bug #21330).
2025-06-24 11:29:39 +01:00
Benoit Daloze
83fb07fb2c [Bug #20998] Check if the string is frozen in rb_str_locktmp() & rb_str_unlocktmp() 2025-06-16 22:59:10 +02:00
Jean Boussier
15084fbc3c Get rid of FL_EXIVAR
Now that the shape_id gives us all the same information, it's no
longer needed.
2025-06-13 23:50:30 +02:00
Jean Boussier
6dbe24fe56 Use the shape_id rather than FL_EXIVAR
We still keep setting `FL_EXIVAR` so that `rb_shape_verify_consistency`
can detect discrepancies.
2025-06-13 23:50:30 +02:00
Jean Boussier
a99d941cac Add SHAPE_ID_HAS_IVAR_MASK for quick ivar check
This allows checking if an object has ivars with just a shape_id
mask.
2025-06-13 19:46:29 +02:00
Nobuyoshi Nakada
fa85d23ff4 [Bug #21380] Prohibit modification in String#split block
Reported at https://hackerone.com/reports/3163876
2025-05-29 11:10:58 +09:00
Jean Boussier
925dec8d70 Rename rb_shape_set_shape_id to rb_obj_set_shape_id 2025-05-27 15:34:02 +02:00
BurdetteLamar
909a0daab6 [DOC] More tweaks for String#byteindex 2025-05-26 13:42:35 -04:00
John Hawthorn
f483befd90 Add shape_id to RBasic under 32 bit
This makes `RBobject` `4B` larger on 32 bit systems
but simplifies the implementation a lot.

[Feature #21353]

Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2025-05-26 10:31:54 +02:00
Nobuyoshi Nakada
aad9fa2853 Use RB_VM_LOCKING 2025-05-25 15:22:43 +09:00
BurdetteLamar
3403055d13 [DOC] Tweaks for String#byteindex 2025-05-22 10:17:46 -04:00
Burdette Lamar
cc90adb68d [DOC] Tweaks for String#append_as_bytes 2025-05-16 12:50:55 -04:00
BurdetteLamar
a188249616 [DOC] Tweaks for String#b 2025-05-16 12:47:17 -04:00
BurdetteLamar
1f09c9fa14 [DOC] Tweaks for String#ascii_only? 2025-05-16 12:46:56 -04:00
Burdette Lamar
4fc5047af8 [DOC] Tweaks for String#=~ (#13325) 2025-05-15 11:18:49 -04:00
Burdette Lamar
7afee53fa0 [DOC] Tweaks for String#<< (#13306) 2025-05-14 15:24:30 -04:00
Burdette Lamar
10e8119cff [DOC] Tweaks for String#== (#13323) 2025-05-14 15:24:19 -04:00
Burdette Lamar
b00a339603 [DOC] Tweaks for String#[] (#13335) 2025-05-14 14:34:09 -04:00
BurdetteLamar
1f72512b03 [DOC] Tweaks for String#[]= 2025-05-14 14:33:40 -04:00
BurdetteLamar
96b823a211 [DOC] Tweaks for String#<=> 2025-05-13 13:14:25 -04:00
Nobuyoshi Nakada
64944cf422 [DOC] Remove a garbage 2025-05-13 00:07:56 +09:00
Burdette Lamar
bc6d48bd34 [DOC] Tweak for String#+@ (#13285) 2025-05-12 10:16:37 -04:00
BurdetteLamar
7a660d7c69 [DOC] Tweaks for What's Here 2025-05-08 16:34:33 -04:00
Burdette Lamar
46a8240884 [DOC] Tweaks for String#-@ 2025-05-08 10:31:47 -04:00
Jean Boussier
f48e45d1e9 Move object_id in object fields.
And get rid of the `obj_to_id_tbl`

It's no longer needed, the `object_id` is now stored inline
in the object alongside instance variables.

We still need the inverse table in case `_id2ref` is invoked, but
we lazily build it by walking the heap if that happens.

The `object_id` concern is also no longer a GC implementation
concern, but a generic implementation.

Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
2025-05-08 07:58:05 +02:00
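
The idea in commit f48e45d1e9 (the id lives with the object, and the id-to-object table is only built if a reverse lookup is ever requested) can be modelled in a few lines of Ruby. This is a toy sketch, not the C implementation: `SketchHeap`, `Obj`, and every method name below are hypothetical, and keeping the table up to date after it has been built is an assumption of the sketch rather than something the commit message states.

```ruby
# Toy model of "object_id stored inline, inverse table built lazily".
# All names here are hypothetical and exist only for illustration.
class SketchHeap
  Obj = Struct.new(:payload, :id)

  def initialize
    @objects = []      # stands in for "the heap"
    @next_id = 1
    @id_to_obj = nil   # inverse table, not built until id2ref is called
  end

  def allocate(payload)
    obj = Obj.new(payload, nil)
    @objects << obj
    obj
  end

  # Assign an id on first request and store it in the object itself.
  def object_id_of(obj)
    return obj.id if obj.id

    obj.id = @next_id
    @next_id += 1
    # Assumption: once the inverse table exists, keep it current.
    @id_to_obj[obj.id] = obj if @id_to_obj
    obj.id
  end

  # Reverse lookup: build the inverse table by walking every object,
  # but only the first time this is called.
  def id2ref(id)
    @id_to_obj ||= build_inverse_table
    @id_to_obj.fetch(id) { raise RangeError, "#{id} is not an id value" }
  end

  def inverse_table_built?
    !@id_to_obj.nil?
  end

  private

  def build_inverse_table
    @objects.each_with_object({}) do |obj, table|
      table[obj.id] = obj if obj.id
    end
  end
end

heap = SketchHeap.new
x = heap.allocate("x")
id = heap.object_id_of(x)
p heap.inverse_table_built?   # no reverse lookup has happened yet
p heap.id2ref(id).equal?(x)   # the first reverse lookup walks the heap
```

Programs that never perform a reverse lookup never pay for the inverse table, which is the trade-off the commit message describes.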
BurdetteLamar
35918df740 [DOC] Tweaks for String#+ 2025-05-04 17:14:44 -04:00
BurdetteLamar
d2de59798c [DOC] Tweaks for String#* 2025-05-04 17:14:17 -04:00
BurdetteLamar
d71e171464 [DOC] Tweaks for String#% 2025-05-04 17:13:50 -04:00
Burdette Lamar
79fe8aa010 [DOC] Tweaks for String.new 2025-05-01 10:51:22 -04:00
Nobuyoshi Nakada
b42afa1dbc Suppress gcc 15 unterminated-string-initialization warnings 2025-04-30 20:04:10 +09:00
Jean Boussier
1f090403e2 Fix comparison of signed and unsigned integers
```
../string.c:660:38: warning: comparison of integers of different signs: 'rb_atomic_t' (aka 'unsigned int') and 'int' [-Wsign-compare]
  660 |             RUBY_ASSERT(table->count < table->capacity / 2);
```
2025-04-23 18:35:00 +02:00
Nobuyoshi Nakada
c218862d3c Fix style [ci skip] 2025-04-19 22:02:10 +09:00
Jean Boussier
0f25886fac Implement dsize function for fstring_table_type
The fstring table size used to be reported as part of the VM
size, but since it was refactored to be lock-less it was no
longer reported.

Since it's now wrapped by a `T_DATA`, we can implement its
`dsize` function and get a valuable insight into the size
of the table.

```
{"address":"0x100ebff18", "type":"DATA", "shape_id":0, "slot_size":80,
"struct":"VM/fstring_table", "memsize":131176, ...
```
2025-04-19 12:42:14 +09:00
Jean Boussier
52487705d0 Fix style of recent fstring feature 2025-04-19 11:38:22 +09:00
John Hawthorn
57b6a7503f Lock-free hash set for fstrings [Feature #21268]
This implements a hash set which is wait-free for lookup and lock-free
for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of
fstrings (frozen interned strings) can significantly reduce the
parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series
of RWlocks (partitioning the hash N-ways to reduce lock contention), and
putting a cache in front of it. All of these improved the situation, but
were unsatisfying as all still required locks for writes (and granular
locks are awkward, since we run the risk of needing to reach a vm
barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock free
hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It
turns out this lock-free hash set is made easier to implement by a few
properties:

 * We only need a hash set rather than a hash table (we only need keys,
   not values), and so the full entry can be written as a single VALUE
 * As a set we only need lookup/insert/delete, no update
 * Delete is only run inside GC so does not need to be atomic (It could
   be made concurrent)
 * I use rb_vm_barrier for the (rare) table rebuilds (It could be made
   concurrent). We VM lock (but don't require other threads to stop) for
   table rebuilds, as those are rare
 * The conservative garbage collector makes deferred replication easy,
   using a T_DATA object

Another benefit of having a table specific to fstrings is that we
compare by value on lookup/insert, but by identity on delete, as we only
want to remove the exact string which is being freed. This is faster and
provides a second way to avoid the race condition in
https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic
probing. Similar to our existing st_table or id_table. Deletes (which
happen on GC) replace existing keys with a tombstone, which is the only
type of update which can occur. Tombstones are only cleared out on
resize.

Unlike st_table, the VALUEs are stored in the hash table itself
(st_table's bins) rather than as a compact index. This avoids an extra
pointer dereference and is possible because we don't need to preserve
insertion order. The table targets a load factor of 0.5 (it is enlarged
once it is half full).
2025-04-18 13:03:54 +09:00
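
The table layout described in commit 57b6a7503f (open addressing, quadratic probing, tombstones on delete, growth at half full, lookup by value but delete by identity) can be sketched in plain Ruby. This is a single-threaded illustration only: it has none of the atomic operations that make the real table lock-free, and `SketchStringSet` and all of its method names are hypothetical. It also always doubles the capacity on rebuild, which is a simplification assumed for the sketch.

```ruby
# Single-threaded sketch of an open-addressing hash set with quadratic
# probing and tombstones. NOT lock-free; all names are hypothetical.
class SketchStringSet
  EMPTY = nil
  TOMBSTONE = Object.new.freeze

  attr_reader :capacity

  def initialize(capacity = 8)
    @capacity = capacity                 # must be a power of two
    @slots = Array.new(capacity, EMPTY)  # keys are stored in the table itself
    @used = 0                            # live keys plus tombstones
  end

  # Lookup/insert compares by value: an equal string that is already
  # stored is returned instead of the argument.
  def find_or_insert(str)
    rebuild(@capacity * 2) if (@used + 1) * 2 > @capacity
    each_probe(str.hash) do |idx|
      slot = @slots[idx]
      if slot.equal?(EMPTY)
        @slots[idx] = str
        @used += 1
        return str
      elsif !slot.equal?(TOMBSTONE) && slot == str
        return slot
      end
    end
  end

  # Delete compares by identity: only the exact stored object is removed,
  # and its slot becomes a tombstone so later probes keep going.
  def delete(str)
    each_probe(str.hash) do |idx|
      slot = @slots[idx]
      return false if slot.equal?(EMPTY)
      if slot.equal?(str)
        @slots[idx] = TOMBSTONE
        return true
      end
    end
  end

  def size
    @slots.count { |s| !s.equal?(EMPTY) && !s.equal?(TOMBSTONE) }
  end

  private

  # Quadratic probing with triangular-number steps, which visits every
  # slot when the capacity is a power of two.
  def each_probe(hash)
    mask = @capacity - 1
    idx = hash & mask
    step = 0
    loop do
      yield idx
      step += 1
      idx = (idx + step) & mask
    end
  end

  # Tombstones are dropped only here, when the table is rebuilt.
  def rebuild(new_capacity)
    old = @slots
    @capacity = new_capacity
    @slots = Array.new(new_capacity, EMPTY)
    @used = 0
    old.each do |slot|
      next if slot.equal?(EMPTY) || slot.equal?(TOMBSTONE)
      find_or_insert(slot)
    end
  end
end

set = SketchStringSet.new
a = set.find_or_insert("ruby".dup.freeze)
b = set.find_or_insert("ruby".dup.freeze)
p a.equal?(b)             # the second call returns the stored object
p set.delete("ruby".dup)  # equal by value but a different object: not removed
p set.delete(a)           # the exact stored object: removed
```

Because the table never fills past half, every probe sequence is guaranteed to reach an empty slot, so both `find_or_insert` and `delete` terminate.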
John Hawthorn
89199a47db Extract rb_gc_free_fstring to string.c
This allows more flexibility in how we deal with the fstring table
2025-04-18 13:03:54 +09:00
Samuel Williams
c13ac4d615 Assert the GVL is held when performing various rb_ functions.
[Feature #20877]
2025-04-14 18:28:09 +09:00
Burdette Lamar
2a55cc3fb8 [DOC] Tweaks to String::try_convert 2025-04-02 12:03:17 -04:00
Étienne Barrié
6ecfe643b5 Freeze $/ and make it ractor safe
[Feature #21109]

By always freezing when setting the global rb_rs variable, we can ensure
it is not modified and can be accessed from a ractor.

We're also making sure it's an instance of String and does not have any
instance variables.

Of course, if $/ is changed at runtime, it may cause surprising behavior
but doing so is deprecated already anyway.

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
2025-03-27 17:54:56 +01:00