Before the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:1:in '<main>': wrong number of arguments (given 1, expected 0) (ArgumentError)
from -e:1:in 'Enumerable#inject'
from -e:1:in '<main>'
```
After the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:1:in 'Kernel#tap': wrong number of arguments (given 1, expected 0) (ArgumentError)
from -e:1:in 'Enumerable#inject'
from -e:1:in '<main>'
```
Fixes https://bugs.ruby-lang.org/issues/20968#change-113811
Follow up to fix 3b7373fd00.
In that commit, the line number in the first frame was overwritten after
the whole backtrace was created. There was a problem that the line
number was overwritten even if the location was backpatched.
Instead, this commit uses first_lineno if the frame is
VM_FRAME_MAGIC_DUMMY when generating the backtrace.
Before the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:in '<main>': wrong number of arguments (given 1, expected 0) (ArgumentError)
from -e:1:in 'Enumerable#inject'
from -e:1:in '<main>'
```
After the patch:
```
$ ./miniruby -e '[1, 2].inject(:tap)'
-e:1:in '<main>': wrong number of arguments (given 1, expected 0) (ArgumentError)
from -e:1:in 'Enumerable#inject'
from -e:1:in '<main>'
```
When calling a method that accepts an anonymous splat and literal
keywords without any arguments, an assertion failure was previously
raised. Set rest_index to 0 when setting rest to the frozen hash,
so the args_argc calculation is accurate.
While here, add more tests for methods with anonymous splats with
and without keywords and keyword splats to confirm behavior is
correct.
Also add a basic bootstrap test that would hit the previous assertion
failure.
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Anonymous positional splats cannot be directly accessed, they can
only be passed as splats to other methods. So if an anonymous
positional splat would be empty, you can use a shared frozen
empty array to save an allocation.
```ruby
def a(*) end
a()
```
This is similar to how anonymous empty keyword splats are optimized,
except those use `nil` instead of a shared empty frozen hash.
This updates the allocation tests to check that the array allocations
are avoided where possible.
It also makes a small change to test_iseq.rb to ensure an unfrozen
hash is passed as the value of an anonymous splat parameter.
When we forward an FCALL (a method call with an implicit self), we
shouldn't forward the FCALL flag because it ignores method visibility
checks. This patch removes the FCALL flag from callers.
[Bug #21196]
I don't think we should ever consider forwarded IC's to be "simple".
Previously, the "simple" flag would be copied to the derived IC and this
happened to cause struct set / get iseqs to write an invalid CC
fastpath:
f45eb3dcb9/vm_insnhelper.c (L4726-L4729)
[Bug #20799]
Previously, proc calls such as:
```ruby
proc{|| }.(**empty_hash)
proc{|b: 1| }.(**r2k_array_with_empty_hash)
```
both allocated hashes unnecessarily, due to two separate code paths.
The first call goes through CALLER_SETUP_ARG/vm_caller_setup_keyword_hash,
and is simple to fix by not duping an empty keyword hash that will be
dropped.
The second case is more involved, in setup_parameters_complex, but is
fixed the exact same way as when the ruby2_keywords hash is not empty,
by flattening the rest array to the VM stack, ignoring the last
element (the empty keyword splat). Add a flatten_rest_array static
function to handle this case.
Update test_allocation.rb to automatically convert the method call
allocation tests to proc allocation tests, at least for the calls
that can be converted. With the code changes, all proc call
allocation tests pass, showing that proc calls and method calls
now allocate the same number of objects.
I've audited the allocation tests, and I believe that all of the low
hanging fruit has been collected. All remaining allocations are
either caller side:
* Positional splat + post argument
* Multiple positional splats
* Literal keywords + keyword splat
* Multiple keyword splats
Or callee side:
* Positional splat parameter
* Keyword splat parameter
* Keyword to positional argument conversion for methods that don't accept keywords
* ruby2_keywords method called with keywords
Reapplies abc04e898b, which was reverted at
d56470a27c, with the addition of a bug fix and
test.
Fixes [Bug #20679]
Previous, proc calls such as:
```ruby
proc{|| }.(**empty_hash)
proc{|b: 1| }.(**r2k_array_with_empty_hash)
```
both allocated hashes unnecessarily, due to two separate code paths.
The first call goes through CALLER_SETUP_ARG/vm_caller_setup_keyword_hash,
and is simple to fix by not duping an empty keyword hash that will be
dropped.
The second case is more involved, in setup_parameters_complex, but is
fixed the exact same way as when the ruby2_keywords hash is not empty,
by flattening the rest array to the VM stack, ignoring the last
element (the empty keyword splat). Add a flatten_rest_array static
function to handle this case.
Update test_allocation.rb to automatically convert the method call
allocation tests to proc allocation tests, at least for the calls
that can be converted. With the code changes, all proc call
allocation tests pass, showing that proc calls and method calls
now allocate the same number of objects.
I've audited the allocation tests, and I believe that all of the low
hanging fruit has been collected. All remaining allocations are
either caller side:
* Positional splat + post argument
* Multiple positional splats
* Literal keywords + keyword splat
* Multiple keyword splats
Or callee side:
* Positional splat parameter
* Keyword splat parameter
* Keyword to positional argument conversion for methods that don't accept keywords
* ruby2_keywords method called with keywords
When calling a method that does not accept a positional splat
parameter with a splatted array with a ruby2_keywords flagged hash,
there is no need to duplicate the splatted array. Previously,
Ruby would duplicate the splatted array and potentially modify
it before flattening it to the VM stack
Use a similar approach as the f(*ary, **hash) optimization,
flattening the splatted array to the VM stack without modifying
it, and make any modifications needed to the VM stack.
When calling a method that accepts keywords but not a keyword
splat with a splatted array with a ruby2_keywords flagged hash,
there is no need to duplicate the ruby2_keywords flagged hash,
since it will be accessed to get the keyword values, but it will
not be modified.
This avoids an array allocation when calling a method that does
not accept a positional splat or keywords with both a positional
splat and keywords. Previously, Ruby would dup the positional
splat to append the keyword splat to it. Then it would flatten
the dupped positional splat array to the VM stack.
This flattens the given positional splat to the VM stack, then
adds the keyword splat hash after the last positional splat
element on the VM stack, avoiding the need to modify
the positional splat array.
Treat this similar to keyword splatting nil, using goto ignore.
However, keep previous behavior if the method accepts a keyword
splat, to avoid double hash allocation.
This also can avoid an array allocation when calling a method
that doesn't have any splat parameters but supports literal
keyword parameters, because ignore_keyword_hash_p was not
ignoring the keyword hash in that case.
This change doesn't remove the empty ruby2_keywords hash from
the array, which caused an assertion failure if the method
being called accepted keywords in some cases. Modify the
assertion to handle this case. An alternative approach would
add a flag to the args struct so the args_argc calculation could
handle this case and report the correct argc, but such an approach
would likely be slower.
This commit adds `sendforward` and `invokesuperforward` for forwarding
parameters to calls
Co-authored-by: Matt Valentine-House <matt@eightbitraptor.com>
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.
Calls it optimizes look like this:
```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```
```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```
```ruby
def bar(*a) = a
def foo(...)
list = [1, 2]
bar(*list, ...) # optimized
end
foo(123)
```
All variants of the above but using `super` are also optimized, including a bare super like this:
```ruby
def foo(...)
super
end
```
This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:
```ruby
def m
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def bar(a) = a
def foo(...) = bar(...)
def test
m { foo(123) }
end
test
p test # allocates 1 object on master, but 0 objects with this patch
```
```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)
def test
m { foo(1, b: 2) }
end
test
p test # allocates 2 objects on master, but 0 objects with this patch
```
How does it work?
-----------------
This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.
I think this description is kind of confusing, so let's walk through an example with code.
```ruby
def delegatee(a, b) = a + b
def delegator(...)
delegatee(...) # CI2 (FORWARDING)
end
def caller
delegator(1, 2) # CI1 (argc: 2)
end
```
Before we call the delegator method, the stack looks like this:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # |
5| delegatee(...) # CI2 (FORWARDING) |
6| end |
7| |
8| def caller |
-> 9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method. The `delegator` method has a special local called `...`
that holds the caller's CI object.
Here is the ISeq disasm fo `delegator`:
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
The local called `...` will contain the caller's CI: CI1.
Here is the stack when we enter `delegator`:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
-> 4| # | CI1 (argc: 2)
5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller |
9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`. In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).
Before executing the `send` instruction, we push `...` on the stack. The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
Instruction 001 puts the caller's CI on the stack. `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # | CI1 (argc: 2)
-> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller | self
9| delegator(1, 2) # CI1 (argc: 2) | 1
10| end | 2
```
The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.
Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.
I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.
For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:
```ruby
SomeObject.new(foo: 1)
```
If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
If the method being called does not have a positional splat
parameter, there is no point in allocating the array, as
decrementing given_argc is sufficient to ensure the empty keyword
hash is not considered an argument, assuming that we are calling
a method/lambda and not a regular proc.
If the method being called does not have a keyword splat parameter,
there is no point in allocating the hash, because the hash will
be unused (as empty keyword hashes are ignored).
Previously it would bypass the `FL_ABLE` check, but
since shapes introduction, it started having a different
behavior than `OBJ_FREEZE`, as it would onyl set the `FL_FREEZE`
flag, but not update the shape.
I have no indication of this causing a bug yet, but it seems
like a trap waiting to happen.
The optimization sets args->rest_dupped to avoid allocating an array,
but this is not safe if the splat array ends in a keyword flagged
hash. Unset args->rest_dupped in this case.
Fixes [Bug #20388]
Previously, calls like the following duplicated the kwsplat hash
unnecessarily:
```ruby
def foo(a:) = a
hash = {a: 10}
foo(**hash)
```
This is due to the fix in ca204a2023. Since it targets when the callee
has no keyword parameters, skip duplicating when the method takes
keywords.
c20e819e8b made calling a method
that accepts specific keywords but not a keyword splat with a keyword
splat allocating a hash when it previously did not.
Instead of duplicating the hash and removing hash entries,
lookup in the provided hash without duplicating the hash, and
track to make sure all hash entries in the hash were looked up
(extra hash entries would be errors).
In cases where a method accepts both keywords and an anonymous
keyword splat, the method was not marked as taking an anonymous
keyword splat. Fix that in the compiler.
Doing that broke handling of nil keyword splats in yjit, so
update yjit to handle that.
Add a test to check that calling a method that accepts both
a keyword argument and an anonymous keyword splat does not
modify a passed keyword splat hash.
Move the anon_kwrest check from setup_parameters_complex to
ignore_keyword_hash_p, and only use it if the keyword hash
is already a hash. This should speed things up slightly as
it avoids a check previously used for all callers of
setup_parameters_complex.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Thanks to the new semantics from [ruby-core:115808], `**nil` is now
equivalent to `**{}`. Since the only thing one could do with anonymous
keyword rest parameter is to delegate it with `**`, nil is just as good
as an empty hash. Using nil avoids allocating an empty hash.
This is particularly important for `...` methods since they now use
`**kwrest` under the hood after 4f77d8d328. Most calls don't pass
keywords.
Comparison:
fw_no_kw
post: 9816800.9 i/s
pre: 8570297.0 i/s - 1.15x slower
The following code previously caused a crash:
```ruby
h = {}
1000000.times{|i| h[i.to_s.to_sym] = i}
def f(kw: 1, **kws) end
f(**h)
```
Inside a thread or fiber, the size of the keyword splat could be much smaller
and still cause a crash.
I found this issue while optimizing method calling by reducing implicit
allocations. Given the following code:
```ruby
def f(kw: , **kws) end
kw = {kw: 1}
f(**kw)
```
The `f(**kw)` call previously allocated two hashes callee side instead of a
single hash. This is because `setup_parameters_complex` would extract the
keywords from the keyword splat hash to the C stack, to attempt to mirror
the case when literal keywords are passed without a keyword splat. Then,
`make_rest_kw_hash` would build a new hash based on the extracted keywords
that weren't used for literal keywords.
Switch the implementation so that if a keyword splat is passed, literal keywords
are deleted from the keyword splat hash (or a copy of the hash if the hash is
not mutable).
In addition to avoiding the crash, this new approach is much more
efficient in all cases. With the included benchmark:
```
1
miniruby: 5247879.9 i/s
miniruby-before: 2474050.2 i/s - 2.12x slower
1_mutable
miniruby: 1797036.5 i/s
miniruby-before: 1239543.3 i/s - 1.45x slower
10
miniruby: 1094750.1 i/s
miniruby-before: 365529.6 i/s - 2.99x slower
10_mutable
miniruby: 407781.7 i/s
miniruby-before: 225364.0 i/s - 1.81x slower
100
miniruby: 100992.3 i/s
miniruby-before: 32703.6 i/s - 3.09x slower
100_mutable
miniruby: 40092.3 i/s
miniruby-before: 21266.9 i/s - 1.89x slower
1000
miniruby: 21694.2 i/s
miniruby-before: 4949.8 i/s - 4.38x slower
1000_mutable
miniruby: 5819.5 i/s
miniruby-before: 2995.0 i/s - 1.94x slower
```
Previously, this would push the provided keywords onto the argument
splat. Add ruby2_keywords to the list of other checks for whether
it is safe for treating a given splat as mutable when the called
method accepts an anonymous splat.
Ruby makes it easy to delegate all arguments from one method to another:
```ruby
def f(*args, **kw)
g(*args, **kw)
end
```
Unfortunately, this indirection decreases performance. One reason it
decreases performance is that this allocates an array and a hash per
call to `f`, even if `args` and `kw` are not modified.
Due to Ruby's ability to modify almost anything at runtime, it's
difficult to avoid the array allocation in the general case. For
example, it's not safe to avoid the allocation in a case like this:
```ruby
def f(*args, **kw)
foo(bar)
g(*args, **kw)
end
```
Because `foo` may be `eval` and `bar` may be a string referencing `args`
or `kw`.
To fix this correctly, you need to perform something similar to escape
analysis on the variables. However, there is a case where you can
avoid the allocation without doing escape analysis, and that is when
the splat variables are anonymous:
```ruby
def f(*, **)
g(*, **)
end
```
When splat variables are anonymous, it is not possible to reference
them directly, it is only possible to use them as splats to other
methods. Since that is the case, if `f` is called with a regular
splat and a keyword splat, it can pass the arguments directly to
`g` without copying them, avoiding allocation. For example:
```ruby
def g(a, b:)
a + b
end
def f(*, **)
g(*, **)
end
a = [1]
kw = {b: 2}
f(*a, **kw)
```
I call this technique: Allocationless Anonymous Splat Forwarding.
This is implemented using a couple additional iseq param flags,
anon_rest and anon_kwrest. If anon_rest is set, and an array splat
is passed when calling the method when the array splat can be used
without modification, `setup_parameters_complex` does not duplicate
it. Similarly, if anon_kwest is set, and a keyword splat is passed
when calling the method, `setup_parameters_complex` does not
duplicate it.
This flag is set when the caller has already created a new array to
handle a splat, such as for `f(*a, b)` and `f(*a, *b)`. Previously,
if `f` was defined as `def f(*a)`, these calls would create an extra
array on the callee side, instead of using the new array created
by the caller.
This modifies `setup_args_core` to set the flag whenver it would add
a `splatarray true` instruction. However, when `splatarray true` is
changed to `splatarray false` in the peephole optimizer, to avoid
unnecessary allocations on the caller side, the flag must be removed.
Add `optimize_args_splat_no_copy` and have the peephole optimizer call
that. This significantly simplifies the related peephole optimizer
code.
On the callee side, in `setup_parameters_complex`, set
`args->rest_dupped` to true if the flag is set.
This takes a similar approach for optimizing regular splats that was
previiously used for keyword splats in
d2c41b1bff (via VM_CALL_KW_SPLAT_MUT).
nil is treated similarly to the empty hash in this case, passing
no keywords and not calling any conversion methods.
Fixes [Bug #20064]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
For the following:
```
def f(*a); a end
p f(*a, kw: 3)
```
`setup_parameters_complex` pushes `{kw: 3}` onto `a`. This worked fine
back when `concatarray true` was used and `a` was already a copy. It
does not work correctly with the optimization to switch to
`concatarray false`. This duplicates the array on the callee side
in such a case.
This affects cases when passing a regular splat and a keyword splat
(or literal keywords) in a method call, where the method does not
accept keywords.
This allocation could probably be avoided, but doing so would
make `setup_parameters_complex` more complicated.
Since Ruby 3.0, Ruby has passed a keyword splat as a regular
argument in the case of a call to a Ruby method where the
method does not accept keyword arguments, if the method
call does not contain an argument splat:
```ruby
def self.f(obj) obj end
def self.fs(*obj) obj[0] end
h = {a: 1}
f(**h).equal?(h) # Before: true; After: false
fs(**h).equal?(h) # Before: true; After: false
a = []
f(*a, **h).equal?(h) # Before and After: false
fs(*a, **h).equal?(h) # Before and After: false
```
The fact that the behavior differs when passing an empty
argument splat makes it obvious that something is not
working the way it is intended. Ruby 2 always copied
the keyword splat hash, and that is the expected behavior
in Ruby 3.
This bug is because of a missed check in setup_parameters_complex.
If the keyword splat passed is not mutable, then it points to
an existing object and not a new object, and therefore it must
be copied.
Now, there are 3 specs for the broken behavior of directly
using the keyword splatted hash. Fix two specs and add a
new version guard. Do not keep the specs for the broken
behavior for earlier Ruby versions, in case this fix is
backported. For the ruby2_keywords spec, just remove the
related line, since that line is unrelated to what the
spec is testing.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Commit e87d088291 introduced a
regression where the keyword splat object passed by the caller
would be directly used by callee as keyword splat parameters,
if it implemented #to_hash. The return value of #to_hash would be
ignored in this case.
Previously, Kernel#lambda returned a non-lambda proc when given a
non-literal block and issued a warning under the `:deprecated` category.
With this change, Kernel#lambda will always return a lambda proc, if it
returns without raising.
Due to interactions with block passing optimizations, we previously had
two separate code paths for detecting whether Kernel#lambda got a
literal block. This change allows us to remove one path, the hack done
with rb_control_frame_t::block_code introduced in 85a337f for supporting
situations where Kernel#lambda returned a non-lambda proc.
[Feature #19777]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Autosplat should not occur if there are two arguments but second
argument is an array containing a ruby2_keywords splat. Only
autosplat if a single argument to be yielded to the block, and there
is no splatted flagged keyword hash passed.
Fixes [Bug #19759]
On `f(*a, **kw)` method calls, a rest keyword parameter is identically
same Hash object is passed and it should make `#dup`ed Hahs.
fix https://bugs.ruby-lang.org/issues/19526
rb_ary_tmp_new suggests that the array is temporary in some way, but
that's not true, it just creates an array that's hidden and not on the
transient heap. This commit renames it to rb_ary_hidden_new.
For a method such as:
def foo(*callee_args) end
If this method is called with a flagged hash (created by a method
flagged with ruby2_keywords), this previously passed the hash
through without modification. With this change, it acts as if the
last hash was passed as keywords, so a call to:
foo(*caller_args)
where the last element of caller_args is a flagged hash, will be
treated as:
foo(*caller_args[0...-1], **caller_args[-1])
As a result, inside foo, callee_args[-1] is an unflagged duplicate
of caller_args[-1] (all other elements of callee_args match
caller_args).
Fixes [Bug #18625]
If the block only accepts a single positional argument plus keywords,
then do not autosplat. Still autosplat if the block accepts more
than one positional argument in addition to keywords.
Autosplatting a single positional argument plus keywords made sense
in Ruby 2, since a final positional hash could be used as keywords,
but it does not make sense in Ruby 3.
Fixes [Bug #18633]
Use ISEQ_BODY macro to get the rb_iseq_constant_body of the ISeq. Using
this macro will make it easier for us to change the allocation strategy
of rb_iseq_constant_body when using Variable Width Allocation.