Commit graph

20692 commits

Author SHA1 Message Date
OrenGitHub
ae299cc9cd [ruby/psych] add first test for safe load stream
336553b412
2025-05-09 17:53:17 +00:00
Peter Zhu
011982ef83 Fix warnings in tests for ObjectSpace._id2ref
There are a lot of warnings when running test_objectspace.rb because of
ObjectSpace._id2ref. For example:

    test_objectspace.rb:19: warning: ObjectSpace._id2ref is deprecated
2025-05-09 12:43:08 -04:00
Takashi Kokubun
102063f964 ZJIT: Fix a splitting condition for LHS 2025-05-09 09:32:40 -07:00
Hiroshi SHIBATA
600c616507 Removed CGI library without CGI::Escape features 2025-05-09 14:27:28 +09:00
Hiroshi SHIBATA
56423d43a3 Reduce loading class/module on CGIUtilTest and CGIEscapeTest 2025-05-09 14:27:28 +09:00
Hiroshi SHIBATA
382be44f42 Extract CGIEscapeTest from CGIUtilTest 2025-05-09 14:27:28 +09:00
Hiroshi SHIBATA
8a1d45144b Support require 'cgi/escape' with extracting CGI::Escape from CGI::Util 2025-05-09 14:27:28 +09:00
Jean Boussier
b67711b17a Fix remove_instance_variable on complex objects
Introduced in: https://github.com/ruby/ruby/pull/13159

Now that there is no longer a unique TOO_COMPLEX shape with
no children, checking `shape->type == TOO_COMPLEX` is incorrect.
2025-05-08 21:48:35 +02:00
Ellen Marie Dash
a41eed99c0
[rubygems/rubygems] Update TarWriter test to store mtime in a variable
0e2cec3fa3
2025-05-08 18:03:04 +09:00
Yusuke Nakamura
819ecd115d
[rubygems/rubygems] Add mtime to Gem::Package::TarWriter#add_file argument
Since 9e21dd9, Gem::Package::TarWriter#add_file adds the file to
the tar with Gem.source_date_epoch for its mtime.
This behavior breaks the code depending on the previous add_file
behavior.
Therefore, add_file accepts mtime as an argument, and uses
Gem.source_date_epoch if not specified.
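
The fallback described above can be sketched roughly as follows. `resolve_mtime` is a hypothetical helper written for illustration and is not the RubyGems implementation; the only real RubyGems call used is `Gem.source_date_epoch`.

```ruby
require "rubygems"

# Hypothetical helper (not part of RubyGems): choose the mtime to record
# for a tar entry. An explicit mtime wins; otherwise fall back to the
# reproducible-build timestamp, Gem.source_date_epoch.
def resolve_mtime(mtime = nil)
  mtime || Gem.source_date_epoch
end

explicit = Time.at(1_700_000_000).utc
puts resolve_mtime(explicit) == explicit     # a caller-supplied mtime is kept
puts resolve_mtime == Gem.source_date_epoch  # the default applies when omitted
```

Keeping the new argument optional means callers that relied on the `Gem.source_date_epoch` behavior need no change.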

7020ea98a0
2025-05-08 18:03:04 +09:00
Charles Oliver Nutter
8685a81e6a
[ruby/strscan] jruby: Check if len++ walked off the end
(https://github.com/ruby/strscan/pull/153)

Fix https://github.com/ruby/strscan/pull/152

CRuby can walk off the end because there's always a null byte. In JRuby,
the byte array is often (usually?) the exact size of the string. So we
need to check if len++ walked off the end.

This code was ported from a version by @byroot in
https://github.com/ruby/strscan/pull/127 but I missed adding this check
due to a lack of tests. A test is included for both "-" and "+" parsing.
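
A rough Ruby sketch of the kind of bounds check described above. This is not the strscan or JRuby code; `integer_length` is a hypothetical helper that works on a plain byte array, which, like JRuby's byte array, has no trailing null byte.

```ruby
# Hypothetical sketch, not the strscan/JRuby code: measure how many bytes at
# `pos` form an optionally signed decimal integer, or return nil if none do.
def integer_length(bytes, pos)
  len = 0
  sign = bytes[pos]
  len += 1 if sign == 0x2D || sign == 0x2B # "-" or "+"

  # The bounds check: after consuming a sign, the next index may be past the
  # end of the array, because nothing guarantees a trailing null byte.
  return nil if pos + len >= bytes.length
  return nil unless bytes[pos + len].between?(0x30, 0x39)

  len += 1 while pos + len < bytes.length && bytes[pos + len].between?(0x30, 0x39)
  len
end

p integer_length("-".bytes, 0)
p integer_length("+".bytes, 0)
p integer_length("-12a".bytes, 0)
```

A lone "-" or "+" is the case where incrementing `len` steps past the last byte, so both are covered here.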

1abe4ca556
2025-05-08 18:03:04 +09:00
Charles Oliver Nutter
5a0306f9c1
[ruby/strscan] jruby: Pass end index to byteListToInum
(https://github.com/ruby/strscan/pull/150)

These parse methods take begin and end indices, not begin and length. A
test is included.

Fixes https://github.com/jruby/jruby/issues/8823

9690e39e73
2025-05-08 18:03:04 +09:00
Jean Boussier
49b4e0350d Make test/ruby/test_env.rb#test_delete_if_in_ractor easier to debug 2025-05-08 09:50:45 +02:00
Jean Boussier
f48e45d1e9 Move object_id in object fields.
And get rid of the `obj_to_id_tbl`

It's no longer needed: the `object_id` is now stored inline
in the object alongside instance variables.

We still need the inverse table in case `_id2ref` is invoked, but
we lazily build it by walking the heap if that happens.

Handling `object_id` is also no longer the concern of a particular
GC implementation; it is now generic.

Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
2025-05-08 07:58:05 +02:00
Jean Boussier
0ea210d1ea Rename ivptr -> fields, next_iv_index -> next_field_index
Instance variables will no longer be the only thing stored inline
via shapes, so keeping the `iv_index` and `ivptr` names
would be confusing.

`field` encompasses anything that can be stored in a VALUE array.

Similarly, `gen_ivtbl` becomes `gen_fields_tbl`.
2025-05-08 07:58:05 +02:00
Takashi Kokubun
cbf9c088f8
YJIT: End the block after OPTIMIZE_METHOD_TYPE_CALL (#13245) 2025-05-05 13:35:28 -07:00
Jeremy Evans
21035c826d Handle mutating of array passed to Set.new during iteration
This avoids a heap-use-after-free.

Fixes [Bug #21306]
2025-05-04 04:10:57 +09:00
Jeremy Evans
be665cf855 Handle mutation of array being merged into set
Check length of array during every iteration, as a #hash method
could truncate the array, resulting in heap-use-after-free.

Fixes [Bug #21305]
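
The hazard can be reproduced at the Ruby level with an element whose `#hash` truncates the array. The sketch below is an illustration only: `Truncator` and `merge_into` are hypothetical names, and a Hash stands in for the set's internal table. It shows why the length has to be re-read on every iteration rather than cached.

```ruby
# An element whose #hash method truncates the array it is stored in.
class Truncator
  def initialize(ary)
    @ary = ary
  end

  def hash
    @ary.clear
    0
  end
end

# Hypothetical merge loop (a Hash stands in for the set's table). The array
# length is re-read on every iteration instead of being cached up front.
def merge_into(table, ary)
  i = 0
  while i < ary.length
    table[ary[i]] = true # may call #hash, which may shrink ary
    i += 1
  end
  table
end

ary = []
ary.push(Truncator.new(ary), :b, :c)
merged = merge_into({}, ary)
puts merged.size # only elements seen before the truncation are merged
```

In C, a loop that cached the original length would keep reading from the array's freed or shrunken buffer, which is the heap-use-after-free the commit avoids.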
2025-05-04 04:10:57 +09:00
Nobuyoshi Nakada
430789dec4 [ruby/psych] Ensure to remove the test constants
dd3685aa67
2025-05-02 06:27:11 +00:00
Sutou Kouhei
af6d6b64ea [ruby/strscan] named_captures: fix incompatibility with
MatchData#named_captures
(https://github.com/ruby/strscan/pull/146)

Fix https://github.com/ruby/strscan/pull/145

`MatchData#named_captures` uses the last matched value for each name.

Reported by Linus Sellberg. Thanks!!!
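
For reference, this is the core `MatchData#named_captures` behavior that the fix aligns with, shown with plain `Regexp` matching (no StringScanner involved):

```ruby
# Two groups share the name "a" and both participate in the match:
# the later group's value is reported.
both = /(?<a>.)(?<a>.)/.match("01")
p both.named_captures

# Only one of the alternatives matches: the value of the group that
# actually matched is reported, not nil from the unmatched one.
either = /(?<a>x)|(?<a>y)/.match("x")
p either.named_captures
```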

a6086ea322
2025-05-02 09:52:38 +09:00
Mike Perham
5d0708378e
[rubygems/rubygems] Smoother authentication experience
Copying the URL is painful here because the URL is embedded within a paragraph of text. I presume we don't want to automatically open the browser.

Instead, move the URL to its own line so that "triple click" will automatically select the whole thing.

21532a69ae
2025-05-02 09:49:15 +09:00
nick evans
136dc52663 Add support for Data objects with ivars
This sets the ivars _before_ calling initialize, which feels wrong.  But
Data doesn't give us any mechanism for setting the members other than 1)
initialize, or 2) drop down into the C API.  Since initialize freezes
the object, we need to set the ivars before that.  I think this is a
reasonable compromise—if users need better handling, they can implement
their own `encode_with` and `init_with`.  But it will lead to unhappy
surprises for some users.

Alternatively, we could use the C API, similarly to Marshal.  Psych _is_
already using the C API for path2class and build_exception.  This would
be the least surprising behavior for users, I think.
2025-05-01 17:52:14 +00:00
nick evans
a397e4d4b0 [ruby/psych] Add support for ruby 3.2 Data objects
788b844c83
2025-05-01 17:52:13 +00:00
Martin Meyerhoff
bd1d6e8cd7 [ruby/psych] Fix loading/parsing regular expressions
This fixes an issue where a regular expression would come back slightly
different after going through a YAML load/dump cycle. Forward slashes
must be escaped in regular expression literals because the literal is
delimited by slashes, and `Regexp#inspect` emits that literal form. The
deserializer, however, fed the `Regexp#inspect` output as a string into
`Regexp.new`, which expects a plain pattern string, not a Regexp literal,
so the round trip did not work properly before this commit.

I've also changed the code to be a bit more readable; I hope this
doesn't affect performance.
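
The mismatch between literal syntax and `Regexp.new` input can be seen without YAML at all. The following is a simplified illustration using only core `Regexp` methods; it is not the psych code, and the `gsub`-based unescaping is an assumption made for the sketch.

```ruby
original = Regexp.new("a/b")

# Regexp#inspect renders literal syntax, so the slash comes back escaped.
literal = original.inspect

# Naive revival: strip the delimiters and pass the rest to Regexp.new.
# The backslash added for the literal form is now part of the pattern.
body = literal[1...literal.rindex("/")]
naive = Regexp.new(body)

# Undo the literal-only escaping first (simplified: ignores patterns that
# contain an escaped backslash directly before a slash).
fixed = Regexp.new(body.gsub("\\/", "/"))

p original.source
p naive.source
p original == naive
p original == fixed
```

Both revived expressions still match "a/b"; the difference is in `source`, which is what makes the object differ after a load/dump cycle.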

f4dd8dadad
2025-05-01 17:50:13 +00:00
Yusuke Endoh
5cee3329df Skip test affected by TracePoint-dependent allocation_class_path
These assertions fail when TracePoint is enabled due to differing
allocation context. Commented out for now until behavior is fixed.

See [Bug #21298]
2025-05-01 17:21:36 +09:00
Yusuke Endoh
e8ad728209 Omit tests using ISeq#to_binary under coverage measurement
... because ISeq#to_binary does not work
2025-05-01 14:15:55 +09:00
Jean Boussier
f55138c9e7 [ruby/psych] Handle Ruby 3.5 new Set class
Since `Set` is no longer a regular object class holding a Hash,
it needs to be specially handled.

c2d185d27c
2025-04-30 18:31:33 +00:00
Matt Valentine-House
46c9e46ef6 [ruby/mmtk] Exclude the test_ractor_parallel test with MMTk
86b0dbeca8
2025-04-30 13:41:21 +00:00
Matt Valentine-House
59a902cd79 [ruby/mmtk] test_finalize is in TestObjectSpace not TestObjSpace
These filenames are passed into test classes, and the tests we're trying
to exclude exist in TestObjectSpace in the Ruby repo, not TestObjSpace

195728dc8c
2025-04-30 13:41:21 +00:00
Hiroshi SHIBATA
8b4017584b Use EnvUtil.apply_timeout_scale for test_io_wait.rb 2025-04-30 16:59:16 +09:00
Scott Myron
a3ec53bbb0 [ruby/json] Introduce ARM Neon and SSE2 SIMD.
(https://github.com/ruby/json/pull/743)

See the pull request for the long development history: https://github.com/ruby/json/pull/743

```
== Encoding activitypub.json (52595 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     2.913k i/100ms
Calculating -------------------------------------
               after     29.377k (± 2.0%) i/s   (34.04 μs/i) -    148.563k in   5.059169s

Comparison:
              before:    23314.1 i/s
               after:    29377.3 i/s - 1.26x  faster

== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   152.000 i/100ms
Calculating -------------------------------------
               after      1.569k (± 0.8%) i/s  (637.49 μs/i) -      7.904k in   5.039001s

Comparison:
              before:     1485.6 i/s
               after:     1568.7 i/s - 1.06x  faster

== Encoding twitter.json (466906 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   309.000 i/100ms
Calculating -------------------------------------
               after      3.115k (± 3.1%) i/s  (321.01 μs/i) -     15.759k in   5.063776s

Comparison:
              before:     2508.3 i/s
               after:     3115.2 i/s - 1.24x  faster
```

49003523da
2025-04-30 08:12:41 +02:00
Jean Boussier
5566a7f740 [ruby/json] Handle non-string keys returning immediate values via to_s
We can't directly call `RBASIC_CLASS` as the return value of
`to_s` may be an immediate.

12dc394d11
2025-04-30 08:12:41 +02:00
Jean Boussier
8fe3fb5d5a [ruby/json] Stop caching the generator state pointer
Fix: https://github.com/ruby/json/issues/790

If we end up calling something that spills the state
on the heap, the pointer we received is outdated and
may be out of sync.

2ffa4ea46b
2025-04-30 08:12:41 +02:00
Jean Boussier
b5426826f9 test/ruby/test_set.rb: mmtk doesn't have GC.compact 2025-04-29 22:36:06 +02:00
Aaron Patterson
203614080f opt_new needs to happen after safe navigation
If safe navigation instructions happen first, we get a stack
inconsistency error.
2025-04-29 13:33:23 -07:00
Aaron Patterson
e6974be545 Don't call hash tombstone compaction from GC compaction
Tombstone removal may possibly require allocation, and we're not allowed
to allocate during GC.  This commit also renames `set_compact` to
`set_update_references` to differentiate tombstone removal compaction from GC
object compaction.

Co-Authored-By: Max Bernstein <max.bernstein@shopify.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
2025-04-29 13:33:10 -07:00
Takashi Kokubun
0f3d6ee578
ZJIT: Disable ZJIT instructions when USE_ZJIT is 0 (#13199)
* ZJIT: Disable ZJIT instructions when USE_ZJIT is 0

* Test the order of ZJIT instructions

* Add more jobs that disable JITs

* Show instruction names in the message
2025-04-29 11:03:13 -07:00
Max Bernstein
10fd5a6357 Add tests 2025-04-29 09:13:25 -07:00
Jeremy Evans
926411171d
Support Marshal.{dump,load} for core Set
This was missed when adding core Set, because it's handled
implicitly for T_OBJECT.

Keep marshal compatibility between core Set and stdlib Set,
so you can unmarshal core Set with stdlib Set and vice versa.
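
A minimal round trip, as a sanity check of the behavior described. It runs within a single Ruby process, so it exercises only one Set implementation; the cross-implementation compatibility mentioned above cannot be shown this way.

```ruby
require "set"

original = Set[1, :two, "three"]
copy = Marshal.load(Marshal.dump(original))

p copy == original       # same elements
p copy.equal?(original)  # but a distinct object
```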

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2025-04-28 08:38:35 -07:00
Jeremy Evans
73f8d0a9c8 Fix nondeterministic failure in test_latest_gc_info_weak_references_count
Clear the ary variable before setting it to nil.  Otherwise, if
the previous ary value was somewhere on the stack, all references
in it would be considered live, and the wmap size would be 10000.
2025-04-28 08:09:56 +02:00
Taketo Takashima
687bd83724 [ruby/ipaddr] Added IPAddr#+/-
78b4f53bf5
2025-04-26 11:56:42 +00:00
Jeremy Evans
e4f85bfc31 Implement Set as a core class
Set has been an autoloaded standard library since Ruby 3.2.
The standard library Set is less efficient than it could be, as it
uses Hash for storage, which stores unnecessary values for each key.

Implementation details:

* Core Set uses a modified version of `st_table`, named `set_table`.
  Other than `s/st_/set_/`, the main difference is that the stored records
  do not have values, making them 1/3 smaller. `st_table_entry` stores
  `hash`, `key`, and `record` (value), while `set_table_entry` only
  stores `hash` and `key`.  This results in large sets using ~33% less
  memory compared to stdlib Set.  For small sets, core Set uses 12% more
  memory (160 byte object slot and 64 malloc bytes, while stdlib set
  uses 40 for Set and 160 for Hash).  More memory is used because
  the set_table is embedded and 72 bytes in the object slot are
  currently wasted. Hopefully we can make this more efficient and have
  it stored in an 80 byte object slot in the future.

* All methods are implemented as cfuncs, except the pretty_print
  methods, which were moved to `lib/pp.rb` (which is where the
  pretty_print methods for other core classes are defined).  As is
  typical for core classes, internal calls call C functions and
  not Ruby methods.  For example, to check if something is a Set,
  `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the
  related object.

* Almost all methods use the same algorithm that the pure-Ruby
  implementation used.  The exception is when calling `Set#divide` with a
  block with 2-arity.  The pure-Ruby method used tsort to implement this.
  I developed an algorithm that only allocates a single intermediate
  hash and does not need tsort.

* The `flatten_merge` protected method is no longer necessary, so it
  is not implemented (it could be).

* Similar to Hash/Array, subclasses of Set are no longer reflected in
  `inspect` output.

* RDoc from stdlib Set was moved to core Set, with minor updates.
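
The memory figures in the first bullet can be checked with a little arithmetic. The sketch below assumes a 64-bit build where each entry field occupies one 8-byte word; the slot and malloc sizes are the ones quoted above.

```ruby
WORD = 8 # bytes; assumes a 64-bit build

st_entry  = 3 * WORD # hash, key, record
set_entry = 2 * WORD # hash, key
entry_saving = 1 - set_entry.fdiv(st_entry)

core_small   = 160 + 64 # object slot + malloc bytes
stdlib_small = 40 + 160 # Set slot + Hash slot
small_overhead = core_small.fdiv(stdlib_small) - 1

puts format("per-entry saving: %.0f%%", entry_saving * 100)
puts format("small-set overhead: %.0f%%", small_overhead * 100)
```

The per-entry saving dominates for large sets, while the fixed 224 versus 200 bytes explains the overhead for small ones.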

This includes a comprehensive benchmark suite for all public Set
methods.  As you would expect, the native version is faster in the
vast majority of cases, and multiple times faster in many cases.
There are a few cases where it is significantly slower:

* Set.new with no arguments (~1.6x)
* Set#compare_by_identity for small sets (~1.3x)
* Set#clone for small sets (~1.5x)
* Set#dup for small sets (~1.7x)

These are slower as Set does not currently use the AR table
optimization that Hash does, so a new set_table is initialized for
each call.  I'm not sure it's worth the complexity to have an AR
table-like optimization for small sets (for hashes it makes sense,
as small hashes are used everywhere in Ruby).

The rbs and repl_type_completor bundled gems will need updates to
support core Set.  The pull request marks them as allowed failures.

This passes all set tests with no changes.  The following specs
needed modification:

* Modifying frozen set error message (changed for the better)
* `Set#divide` when passed a 2-arity block no longer yields the same
  object as both the first and second argument (this seems like an issue
  with the previous implementation).
* Set-like objects that override `is_a?` such that `is_a?(Set)` returns
  `true` are no longer treated as Set instances.
* `Set.allocate.hash` is no longer the same as `nil.hash`
* `Set#join` no longer calls `Set#to_a` (it calls the underlying C
   function).
* `Set#flatten_merge` protected method is not implemented.
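
For readers unfamiliar with the 2-arity form of `Set#divide`: it groups elements into connected components of the relation given by the block. A small usage example follows; the relation here never holds between an element and itself, so the spec change about yielding the same object twice does not affect the result.

```ruby
require "set"

numbers = Set[1, 3, 4, 6, 9, 10, 11]

# Elements end up in the same subset when they are linked, directly or
# through other elements, by the relation in the block.
groups = numbers.divide { |i, j| (i - j).abs == 1 }
p groups
```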

Previously, `set.rb` added a `SortedSet` autoload, which loads
`set/sorted_set.rb`.  This replaces the `Set` autoload in `prelude.rb`
with a `SortedSet` autoload, but I recommend removing it and
`set/sorted_set.rb`.

This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`,
reflecting the switch to a core class.  This does not move the spec
files, as I'm not sure how they should be handled.

Internally, this uses the st_* types and functions as much as
possible, and only adds set_* types and functions as needed.
The underlying set_table implementation is stored in st.c, but
there is no public C-API for it, nor is there one planned, in
order to keep the ability to change the internals going forward.

For internal uses of st_table with Qtrue values, those can
probably be replaced with set_table.  To do that, include
internal/set_table.h.  To handle symbol visibility (rb_ prefix),
internal/set_table.h uses the same macro approach that
include/ruby/st.h uses.

The Set class (rb_cSet) and all methods are defined in set.c.
There isn't currently a C-API for the Set class, though C-API
functions can be added as needed going forward.

Implements [Feature #21216]

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Oliver Nutter <mrnoname1000@riseup.net>
2025-04-26 10:31:11 +09:00
Aaron Patterson
ec3b48d3da Deopt if iseq trace events are enabled 2025-04-25 13:46:05 -07:00
Aaron Patterson
8ac8225c50 Inline Class#new.
This commit inlines instructions for Class#new.  To make this work, we
added a new YARV instruction, `opt_new`.  `opt_new` checks whether or
not the `new` method is the default allocator method.  If it is, it
allocates the object, and pushes the instance on the stack.  If not, the
instruction jumps to the "slow path" method call instructions.

Old instructions:

```
> ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```

New instructions:

```
> ./miniruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path                   <ic:0 Object>             (   1)[Li]
0002 putnil
0003 swap
0004 opt_new                                <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block                 <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump                                   14
0011 opt_send_without_block                 <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```

This commit speeds up basic object allocation (`Foo.new`) by 60%, but
classes that take keyword parameters see an even bigger benefit because
no hash is allocated when instantiating the object (3x to 6x faster).

Here is an example that uses `Hash.new(capacity: 0)`:

```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):      1.082 s ±  0.004 s    [User: 1.074 s, System: 0.008 s]
  Range (min … max):    1.076 s …  1.088 s    10 runs

Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
  Time (mean ± σ):     627.9 ms ±   3.5 ms    [User: 622.7 ms, System: 4.8 ms]
  Range (min … max):   622.7 ms … 633.2 ms    10 runs

Summary
  ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
    1.72 ± 0.01 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```

This commit changes the backtrace for `initialize`:

```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
class Foo
  def initialize
    puts caller
  end
end

def hello
  Foo.new
end

hello
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8:in 'Class#new'
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
aaron@tc ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-03-28T23:59:40Z inline-new c4157884e4) +PRISM [arm64-darwin24]
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
```
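
To check programmatically which behavior a given Ruby has, one option is to inspect the frame labels instead of printing `caller`. This is a sketch; the label text differs between Ruby versions, so it only reports whether a frame ending in `new` is present.

```ruby
class Foo
  attr_reader :labels

  def initialize
    # Record the method labels of the frames that led here.
    @labels = caller_locations.map(&:label)
  end
end

def hello
  Foo.new
end

labels = hello.labels
new_frame = labels.any? { |label| label.end_with?("new") }
puts "Class#new frame present: #{new_frame}"
puts labels.first(2).inspect
```

Code that counts frames, for example by passing a fixed offset to `caller`, is the kind of code likely to notice this change.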

It also increases memory usage for calls to `new` by 112 bytes:

```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
require "objspace"

class Foo
  def initialize
    puts caller
  end
end

def hello
  Foo.new
end

puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:hello)))
aaron@tc ~/g/ruby (inline-new)> make runruby
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems  ./test.rb
656
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
```

Thanks to @ko1 for coming up with this idea!

Co-Authored-By: John Hawthorn <john@hawthorn.email>
2025-04-25 13:46:05 -07:00
Yusuke Endoh
5113869f5d Fix a flaky test by making sure that a test thread stops
```
    1) Failure:
  TestThread#test_join_argument_conversion [D:/a/ruby/ruby/src/test/ruby/test_thread.rb:249]:
  Expected nil (oid=4) to be the same as #<TestThread::Thread:0x000001e9e13bbc18 D:/a/ruby/ruby/src/test/ruby/test_thread.rb:245 run> (oid=3856).
```
4106719981
2025-04-24 19:06:49 +09:00
Jean Boussier
cb1ea54bbf objspace_dump: Include shareable flag
Given that the performance of the currently planned Ractor-local GC
implementation will be heavily influenced by the number of shareable
objects, it would be valuable to know how many of them
are in the heap.
2025-04-24 10:14:29 +02:00
Samuel Williams
c1dbd01c67
Increase fiber sleep test tolerance. (#13152) 2025-04-23 01:17:15 +00:00
Kazuki Yamaguchi
93afcfcde3 [ruby/openssl] asn1: check for missing EOC in indefinite length encoding
EOC octets are required at the end of contents of a constructed encoding
that uses the indefinite length form. This cannot be assumed from the
end of the input. Raise an exception when necessary.
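
The rule can be illustrated with a heavily simplified reader. This is a hypothetical sketch, not the ruby/openssl implementation: `read_indefinite_contents` and `MissingEOCError` are made-up names, and the reader assumes single-byte tags and short-form lengths for the child values.

```ruby
class MissingEOCError < StandardError; end

# Hypothetical, heavily simplified reader (not the ruby/openssl code): walks
# the contents of an indefinite-length constructed value starting at `pos`.
# Assumes single-byte tags and short-form (< 0x80) lengths for the children.
# Returns [children, position_after_EOC].
def read_indefinite_contents(bytes, pos)
  children = []
  loop do
    # Running out of input is an error, not an implicit terminator.
    raise MissingEOCError, "input ended before EOC" if pos + 2 > bytes.length

    tag = bytes[pos]
    len = bytes[pos + 1]
    return [children, pos + 2] if tag == 0x00 && len == 0x00 # EOC octets

    raise ArgumentError, "long-form length not supported in this sketch" if len >= 0x80
    raise MissingEOCError, "child value is truncated" if pos + 2 + len > bytes.length

    children << [tag, bytes[pos + 2, len]]
    pos += 2 + len
  end
end

with_eoc    = [0x02, 0x01, 0x05, 0x00, 0x00] # INTEGER 5, then EOC
without_eoc = [0x02, 0x01, 0x05]             # same contents, EOC missing

p read_indefinite_contents(with_eoc, 0)
begin
  read_indefinite_contents(without_eoc, 0)
rescue MissingEOCError => e
  puts "rejected: #{e.message}"
end
```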

bc20c13a7c
2025-04-20 07:41:15 +00:00
Aiden Fox Ivey
490a6d8ef9 Add codegen for NewArray instruction (https://github.com/Shopify/zjit/pull/110)
* Show failing test

* Add second test case

* Add empty NewArray setup

* Update opt_tests and fix NewArray instantiation

* Add code generation for NewArray

* Add NewArray ordering test
2025-04-18 21:53:01 +09:00
Takashi Kokubun
1b95e9c4a0 Implement JIT-to-JIT calls (https://github.com/Shopify/zjit/pull/109)
* Implement JIT-to-JIT calls

* Use a closer dummy address for Arm64

* Revert an obsoleted change

* Revert a few more obsoleted changes

* Fix outdated comments

* Explain PosMarkers for CCall

* s/JIT code/machine code/

* Get rid of ParallelMov
2025-04-18 21:53:01 +09:00