Commit graph

2084 commits

Author SHA1 Message Date
Nobuyoshi Nakada
b42afa1dbc
Suppress gcc 15 unterminated-string-initialization warnings 2025-04-30 20:04:10 +09:00
Jean Boussier
3ec7bfff2e Use a set_table for rb_vm_struct.unused_block_warning_table
Now that we have a hash-set implementation we can use that
instead of a hash-table with a static value.
2025-04-27 11:59:28 +02:00
Jean Boussier
c0417bd094 Use set_table to track const caches
Now that we have a `set_table` implementation, we can
use it to track const caches and save some memory.

We could even save some more memory if `numtable` didn't
store a copy of the `hash` and instead recomputed it every
time, but this is a quick win.
2025-04-26 12:10:32 +02:00
Jeremy Evans
e4f85bfc31 Implement Set as a core class
Set has been an autoloaded standard library since Ruby 3.2.
The standard library Set is less efficient than it could be, as it
uses Hash for storage, which stores unnecessary values for each key.

Implementation details:

* Core Set uses a modified version of `st_table`, named `set_table`.
  Other than `s/st_/set_/`, the main difference is that the stored records
  do not have values, making them 1/3 smaller. `st_table_entry` stores
  `hash`, `key`, and `record` (value), while `set_table_entry` only
  stores `hash` and `key`.  This results in large sets using ~33% less
  memory compared to stdlib Set.  For small sets, core Set uses 12% more
  memory (160 byte object slot and 64 malloc bytes, while stdlib set
  uses 40 for Set and 160 for Hash).  More memory is used because
  the set_table is embedded and 72 bytes in the object slot are
  currently wasted. Hopefully we can make this more efficient and have
  it stored in an 80 byte object slot in the future.

* All methods are implemented as cfuncs, except the pretty_print
  methods, which were moved to `lib/pp.rb` (which is where the
  pretty_print methods for other core classes are defined).  As is
  typical for core classes, internal calls call C functions and
  not Ruby methods.  For example, to check if something is a Set,
  `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the
  related object.

* Almost all methods use the same algorithm that the pure-Ruby
  implementation used.  The exception is when calling `Set#divide` with a
  block with 2-arity.  The pure-Ruby method used tsort to implement this.
  I developed an algorithm that only allocates a single intermediate
  hash and does not need tsort.

* The `flatten_merge` protected method is no longer necessary, so it
  is not implemented (it could be).

* Similar to Hash/Array, subclasses of Set are no longer reflected in
  `inspect` output.

* RDoc from stdlib Set was moved to core Set, with minor updates.
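
The memory figures in the first bullet can be checked with a little
arithmetic.  The sketch below only restates the numbers quoted above; the
8-byte word size is an assumption (a 64-bit build), and the constant names
are illustrative, not taken from the implementation.

```ruby
# Assumption: 64-bit build, so hash, key, and record are 8 bytes each.
WORD = 8

ST_ENTRY_BYTES  = 3 * WORD # st_table_entry: hash, key, record
SET_ENTRY_BYTES = 2 * WORD # set_table_entry: hash, key

# Per-entry saving for large sets, where entry storage dominates.
ENTRY_SAVING = 1 - SET_ENTRY_BYTES.fdiv(ST_ENTRY_BYTES)

# Small-set totals quoted in the bullet above.
CORE_SMALL_BYTES   = 160 + 64 # object slot + malloc bytes
STDLIB_SMALL_BYTES = 40 + 160 # Set object + Hash object

SMALL_OVERHEAD = CORE_SMALL_BYTES.fdiv(STDLIB_SMALL_BYTES) - 1

puts format("entry saving: %.0f%%", ENTRY_SAVING * 100)         # entry saving: 33%
puts format("small-set overhead: %.0f%%", SMALL_OVERHEAD * 100) # small-set overhead: 12%
```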

This includes a comprehensive benchmark suite for all public Set
methods.  As you would expect, the native version is faster in the
vast majority of cases, and multiple times faster in many cases.
There are a few cases where it is significantly slower:

* Set.new with no arguments (~1.6x)
* Set#compare_by_identity for small sets (~1.3x)
* Set#clone for small sets (~1.5x)
* Set#dup for small sets (~1.7x)

These are slower as Set does not currently use the AR table
optimization that Hash does, so a new set_table is initialized for
each call.  I'm not sure it's worth the complexity to have an AR
table-like optimization for small sets (for hashes it makes sense,
as small hashes are used everywhere in Ruby).

The rbs and repl_type_completor bundled gems will need updates to
support core Set.  The pull request marks them as allowed failures.

This passes all set tests with no changes.  The following specs
needed modification:

* Modifying frozen set error message (changed for the better)
* `Set#divide` when passed a 2-arity block no longer yields the same
  object as both the first and second argument (this seems like an issue
  with the previous implementation).
* Set-like objects that override `is_a?` such that `is_a?(Set)` returns
  `true` are no longer treated as Set instances.
* `Set.allocate.hash` is no longer the same as `nil.hash`
* `Set#join` no longer calls `Set#to_a` (it calls the underlying C
   function).
* `Set#flatten_merge` protected method is not implemented.
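
For readers unfamiliar with `Set#divide`, here is a minimal usage sketch of
the two block arities mentioned above.  It uses only the public API and
says nothing about how either implementation computes the result; the
sample data and method names are illustrative.

```ruby
require "set" # harmless where Set is already a core class

NUMBERS = Set[1, 3, 4, 6, 9, 10, 11]

# With a 2-arity block, elements end up in the same subset when they are
# linked through a chain of pairs for which the block returns true.
def adjacent_groups(set)
  set.divide { |a, b| (a - b).abs == 1 }
end

# With a 1-arity block, elements are grouped by the block's return value.
def parity_groups(set)
  set.divide { |n| n.even? }
end

p adjacent_groups(NUMBERS) == Set[Set[1], Set[3, 4], Set[6], Set[9, 10, 11]] # prints true
p parity_groups(NUMBERS) == Set[Set[1, 3, 9, 11], Set[4, 6, 10]]             # prints true
```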

Previously, `set.rb` added a `SortedSet` autoload, which loads
`set/sorted_set.rb`.  This replaces the `Set` autoload in `prelude.rb`
with a `SortedSet` autoload, but I recommend removing it and
`set/sorted_set.rb`.

This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`,
reflecting the switch to a core class.  This does not move the spec
files, as I'm not sure how they should be handled.

Internally, this uses the st_* types and functions as much as
possible, and only adds set_* types and functions as needed.
The underlying set_table implementation is stored in st.c, but
there is no public C-API for it, nor is there one planned, in
order to keep the ability to change the internals going forward.

For internal uses of st_table with Qtrue values, those can
probably be replaced with set_table.  To do that, include
internal/set_table.h.  To handle symbol visibility (rb_ prefix),
internal/set_table.h uses the same macro approach that
include/ruby/st.h uses.
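
As a Ruby-level analogy only (this is not the internal `set_table` API,
and the method names are hypothetical), the substitution resembles
replacing a Hash whose values are always `true` with a Set:

```ruby
require "set" # harmless where Set is already a core class

# Hash used as a set: every value is the same placeholder.
def unique_with_hash(items)
  seen = {}
  items.each { |item| seen[item] = true }
  seen.keys
end

# A Set expresses the same intent without storing a value per key.
def unique_with_set(items)
  seen = Set.new
  items.each { |item| seen.add(item) }
  seen.to_a
end

words = %w[st set st ar set]
p unique_with_hash(words) # prints ["st", "set", "ar"]
p unique_with_set(words)  # prints ["st", "set", "ar"]
```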

The Set class (rb_cSet) and all methods are defined in set.c.
There isn't currently a C-API for the Set class, though C-API
functions can be added as needed going forward.

Implements [Feature #21216]

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Oliver Nutter <mrnoname1000@riseup.net>
2025-04-26 10:31:11 +09:00
Nobuyoshi Nakada
349f36c527
Get rid of quadratic downloads of Unicode data files 2025-04-22 21:09:26 +09:00
Takashi Kokubun
33a052486b Assert everything is compiled in test_zjit (https://github.com/Shopify/zjit/pull/40)
* Assert everything is compiled in test_zjit

* Update a comment on rb_zjit_assert_compiles

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>

* Add a comment about assert_compiles

* Actually use pipe_fd

---------

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2025-04-18 21:52:59 +09:00
Takashi Kokubun
0a543daf15 Add zjit_* instructions to profile the interpreter (https://github.com/Shopify/zjit/pull/16)
* Add zjit_* instructions to profile the interpreter

* Rename FixnumPlus to FixnumAdd

* Update a comment about Invalidate

* Rename Guard to GuardType

* Rename Invalidate to PatchPoint

* Drop unneeded debug!()

* Plan on profiling the types

* Use the output of GuardType as type refined outputs
2025-04-18 21:52:59 +09:00
Takashi Kokubun
0bb709718b Hook ZJIT compilation 2025-04-18 21:52:56 +09:00
Takashi Kokubun
8ad08f1126 Fix template/Makefile.in 2025-04-18 21:52:55 +09:00
Takashi Kokubun
344ee211d6 Link zjit into the interpreter 2025-04-18 21:52:55 +09:00
Mari Imaizumi
63b07cdcbb [Feature #20724] Bump Unicode version to 16.0.0 2025-04-18 19:50:23 +09:00
Samuel Williams
c13ac4d615 Assert the GVL is held when performing various rb_ functions.
[Feature #20877]
2025-04-14 18:28:09 +09:00
Jean Boussier
0350290262 Ractor: Fix moving embedded objects
[Bug #20271]
[Bug #20267]
[Bug #20255]

`rb_obj_alloc(RBASIC_CLASS(obj))` will always allocate from the basic
40B pool, so if `obj` is larger than `40B`, we'll create a corrupted
object when we later copy the shape_id.

Instead we can use the same logic as ractor copy, which is
to use `rb_obj_clone`, and later ask the GC to free the original
object.

We then must turn it into a `T_OBJECT`, because otherwise
just changing its class to `RactorMoved` leaves a lot of
ways to keep using the object, e.g.:

```
a = [1, 2, 3]
Ractor.new{}.send(a, move: true)
[].concat(a) # Should raise, but didn't.
```

If it turns out that `rb_obj_clone` isn't performant enough
for some uses, we can always have carefully crafted specialized
paths for the types that would benefit from it.
2025-03-31 12:01:55 +02:00
Hiroshi SHIBATA
88f0c04174 Use release version of turbo_tests 2025-03-26 19:37:22 +09:00
Mari Imaizumi
e63c516046 [Feature #19908] Update Unicode headers to 15.1.0 2025-03-18 21:18:12 +09:00
Takashi Kokubun
e8f8565dc2 Remove obsoleted insn_may_depend_on_sp_or_pc()
which was for MJIT
2025-03-05 16:23:31 -08:00
John Hawthorn
443e2ec27d Replace tombstone when converting AR to ST hash
[Bug #21170]

st_table reserves -1 as a special hash value to indicate that an entry
has been deleted. So that -1 remains a valid return value for a hash
function, do_hash replaces -1 with 0 so that it is not mistaken for
the sentinel.

Previously, when upgrading an AR table to an ST table,
rb_st_add_direct_with_hash was used, which did not perform the same
conversion. This could lead to a hash in a broken state, where one of
its entries that was supposed to exist was marked as a tombstone.

The hash could then become further corrupted when the ST table required
resizing as the falsely tombstoned entry would be skipped but it would
be counted in num entries, leading to an uninitialized entry at index
15.

In most cases this will be really rare, unless using a very poorly
implemented custom hash function.

This also adds two debug assertions, one that st_add_direct_with_hash
does not receive the reserved hash value, and a second in
rebuild_table_with, which ensures that after we rebuild/compact a table
it contains the expected number of elements.
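
The sentinel problem can be illustrated with a toy model.  This is a
sketch under stated assumptions, not the code in st.c: the `Entry` struct
and `live_keys` helper are hypothetical, and `do_hash` here only mimics
the -1 to 0 replacement described above.

```ruby
# Toy model only: names echo the commit message, not the real C code.
RESERVED_HASH = -1 # sentinel marking a deleted entry (tombstone)

Entry = Struct.new(:hash_value, :key)

# Mimics do_hash: never let a user-supplied hash collide with the sentinel.
def do_hash(raw_hash)
  raw_hash == RESERVED_HASH ? 0 : raw_hash
end

# Mimics a table scan that skips tombstones.
def live_keys(entries)
  entries.reject { |e| e.hash_value == RESERVED_HASH }.map(&:key)
end

# Key :b stands for an object whose hash function returns -1.
RAW_HASHES = { a: 7, b: -1, c: 42 }.freeze

def build_entries(normalize:)
  RAW_HASHES.map { |key, h| Entry.new(normalize ? do_hash(h) : h, key) }
end

p live_keys(build_entries(normalize: false)) # prints [:a, :c]
p live_keys(build_entries(normalize: true))  # prints [:a, :b, :c]
```

Without the replacement, `:b` is indistinguishable from a deleted entry,
which is the broken state the message describes.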

Co-authored-by: Alan Wu <alanwu@ruby-lang.org>
2025-03-05 14:05:24 -08:00
Nobuyoshi Nakada
4a67ef09cc
[Feature #21116] Extract RJIT as a third-party gem 2025-02-13 18:01:03 +09:00
Jean Boussier
f32d5071b7 Elide string allocation when using String#gsub in MAP mode
If the provided Hash doesn't have a default proc, we know for
sure that we'll never call into user provided code, hence the
string we allocate to access the Hash can't possibly escape.

So we don't actually have to allocate it, we can use a fake_str,
AKA a stack allocated string.

```
compare-ruby: ruby 3.5.0dev (2025-02-10T13:47:44Z master 3fb455adab) +PRISM [arm64-darwin23]
built-ruby: ruby 3.5.0dev (2025-02-10T17:09:52Z opt-gsub-alloc ea5c28958f) +PRISM [arm64-darwin23]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.374k|    3.722k|
|                 |           -|     1.10x|
|escape_bin       |      5.469k|    6.587k|
|                 |           -|     1.20x|
|escape_utf8      |      3.465k|    3.734k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      5.752k|    7.283k|
|                 |           -|     1.27x|
```
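
For reference, "MAP mode" is the form of `String#gsub` that takes a Hash
as the replacement.  A minimal sketch follows; the `ESCAPES` table and
`escape` method are hypothetical and are not the benchmark code measured
above.

```ruby
# Hypothetical escape table: a plain Hash with no default proc.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def escape(text)
  # Each match is looked up as a key in ESCAPES and replaced by its value.
  text.gsub(/[&<>]/, ESCAPES)
end

p escape("a < b && c > d") # prints "a &lt; b &amp;&amp; c &gt; d"
```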
2025-02-12 10:23:50 +01:00
Hiroshi SHIBATA
801885c7f9
Move bundled_gems_spec-run task to exam because we repeatedly test bundled gems for debugging 2025-01-27 14:50:57 +09:00
Hiroshi SHIBATA
47723bb591 Added tracer for irb test 2025-01-24 15:46:46 +09:00
Peter Zhu
7ed08c4fd3 Fix memory leak in rb_gc_vm_weak_table_foreach
When deleting from the generic ivar table, we need to free the gen_ivtbl
otherwise we will have a memory leak.
2025-01-23 10:24:35 -05:00
Nobuyoshi Nakada
ba44e92573 ext/json no longer uses ragel 2025-01-20 21:37:20 +09:00
Nobuyoshi Nakada
d399e0c2b6
Move probes.h to all-incs
It is platform dependent and should not be generated by default.
2025-01-15 22:11:47 +09:00
Hiroshi SHIBATA
667e938f1d rdoc-srcdir can refer to srcdir by itself 2025-01-15 16:52:56 +09:00
Hiroshi SHIBATA
c6923278d8 Fixed missing kpeg issue with test-bundled-gems 2025-01-15 16:52:56 +09:00
Hiroshi SHIBATA
86575e243e Use rdoc provided by bundled gems for generating ruby documentation 2025-01-15 16:52:56 +09:00
ydah
ccb4ba45ed Use LRAMA instead of YACC 2025-01-14 17:20:02 +09:00
Nobuyoshi Nakada
2e38b3effb
Update probes.h by incs 2025-01-13 09:57:01 +09:00
Hiroshi SHIBATA
fcecef7752 Added logger dependency for Bundler's example 2025-01-10 10:19:39 +09:00
Nobuyoshi Nakada
7962f32b70
Fix hello when transforming program names 2024-12-27 10:29:34 +09:00
Nobuyoshi Nakada
970513f677
Sort undocumented entry list [ci skip] 2024-12-25 12:04:32 +09:00
Nobuyoshi Nakada
f7ce62cc5b
Add hello 2024-12-22 23:14:03 +09:00
Nobuyoshi Nakada
4fb5d746ce
Split modular-gc into build and installation 2024-12-22 22:10:26 +09:00
Hiroshi SHIBATA
6a1aaf3679
Separated load path for test-bundler tasks for Windows 2024-12-12 15:10:21 +09:00
Hiroshi SHIBATA
f43e04ce09 Hide pending results of turbo_tests 2024-12-12 14:43:07 +09:00
Hiroshi SHIBATA
1967ae20b9 Use patched version of turbo_tests 2024-12-12 14:43:07 +09:00
Hiroshi SHIBATA
91f6c370af Use turbo_tests instead of parallel_tests 2024-12-12 14:43:07 +09:00
Nobuyoshi Nakada
f12e2622c1 Split system dependent commands to clean modular-gc 2024-12-10 12:31:47 +09:00
Peter Zhu
cfc2b21a05 Clean all modular GCs
We should run `make clean` or `make distclean` on each of the GC directories.
2024-12-09 16:35:21 -05:00
Peter Zhu
5d4242fa81 Only delete gc directory if empty
If building in the source directory, this will delete the gc directory.
2024-12-09 16:35:21 -05:00
Peter Zhu
88d49628dd Don't delete .gc directory
We build in the gc directory since commit d057503252,
so we don't need to remove the .gc directory.
2024-12-09 16:35:21 -05:00
Matt Valentine-House
ffb26a53d1 Add Modular GC (default, MMTk) builds to CI 2024-12-06 09:48:30 +00:00
Randy Stauner
b021f6f8a7
Use symbol.h in vm.c to get macro for faster ID to sym (#12272)
The macro provided by symbol.h uses STATIC_ID2SYM
when it can, which speeds up methods that declare keyword args.

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-12-05 17:51:32 -05:00
Peter Zhu
ce1ad1b816 Standardize on the name "modular GC"
We have name fragmentation for this feature, including "shared GC",
"modular GC", and "external GC". This commit standardizes the feature
name to "modular GC" and the implementation to "GC library".
2024-12-05 10:33:26 -05:00
Hiroshi SHIBATA
56576b6cce Removed needless RSpec option 2024-12-04 13:09:40 +09:00
Hiroshi SHIBATA
b532662d2d Use same RSPECOPTS for test-bundler and test-bundler-parallel 2024-12-04 13:09:40 +09:00
Nobuyoshi Nakada
239c30798a Simplify test-bundler-parallel
Get rid of repeated exec XRUBY recursively.
2024-12-04 13:09:40 +09:00
Hiroshi SHIBATA
4e382c285f Lock json-schema-5.1.0 2024-12-03 09:53:17 +09:00
Nobuyoshi Nakada
1df52e097b
yasmdata.rb has not been generated for years [ci skip] 2024-12-02 09:07:25 +09:00