The macro provided by symbol.h uses STATIC_ID2SYM
when it can which speeds up methods that declare keyword args.
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Add opt_duparray_send insn to skip the allocation on `#include?`
If the method isn't going to modify the array we don't need to copy it.
This avoids the allocation / array copy for things like `[:a, :b].include?(x)`.
This adds a BOP for include? and tracks redefinition for it on Array.
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
* YJIT: Implement opt_duparray_send include_p
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
* Update opt_newarray_send to support simple forms of include?(arg)
Similar to opt_duparray_send but for non-static arrays.
* YJIT: Implement opt_newarray_send include_p
---------
Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>
When we run with RUBY_FREE_AT_EXIT, there's a false-positive memory leak
reported in YJIT because the METHOD_CODEGEN_TABLE is never freed. This
commit adds rb_yjit_free_at_exit that is called at shutdown when
RUBY_FREE_AT_EXIT is set.
Reported memory leak:
==699816== 1,104 bytes in 1 blocks are possibly lost in loss record 1 of 1
==699816== at 0x484680F: malloc (vg_replace_malloc.c:446)
==699816== by 0x155B3E: UnknownInlinedFun (unix.rs:14)
==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:36)
==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:27)
==699816== by 0x155B3E: alloc (alloc.rs:98)
==699816== by 0x155B3E: alloc_impl (alloc.rs:181)
==699816== by 0x155B3E: allocate (alloc.rs:241)
==699816== by 0x155B3E: do_alloc<alloc::alloc::Global> (alloc.rs:15)
==699816== by 0x155B3E: new_uninitialized<alloc::alloc::Global> (mod.rs:1750)
==699816== by 0x155B3E: fallible_with_capacity<alloc::alloc::Global> (mod.rs:1788)
==699816== by 0x155B3E: prepare_resize<alloc::alloc::Global> (mod.rs:2864)
==699816== by 0x155B3E: resize_inner<alloc::alloc::Global> (mod.rs:3060)
==699816== by 0x155B3E: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2950)
==699816== by 0x155B3E: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1231)
==699816== by 0x5BC39F: UnknownInlinedFun (mod.rs:1179)
==699816== by 0x5BC39F: find_or_find_insert_slot<(usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool), alloc::alloc::Global, hashbrown::map::equivalent_key::{closure_env#0}<usize, usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool>, hashbrown::map::make_hasher::{closure_env#0}<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std:#️⃣:random::RandomState>> (mod.rs:1413)
==699816== by 0x5BC39F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754)
==699816== by 0x57C5C6: insert<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, *const yjit::cruby::autogened::rb_callinfo, *const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std:#️⃣:random::RandomState> (map.rs:1104)
==699816== by 0x57C5C6: yjit::codegen::reg_method_codegen (codegen.rs:10521)
==699816== by 0x57C295: yjit::codegen::yjit_reg_method_codegen_fns (codegen.rs:10464)
==699816== by 0x5C6B07: rb_yjit_init (yjit.rs:40)
==699816== by 0x393723: ruby_opt_init (ruby.c:1820)
==699816== by 0x393723: ruby_opt_init (ruby.c:1767)
==699816== by 0x3957D4: prism_script (ruby.c:2215)
==699816== by 0x3957D4: process_options (ruby.c:2538)
==699816== by 0x396065: ruby_process_options (ruby.c:3166)
==699816== by 0x236E56: ruby_options (eval.c:117)
==699816== by 0x15BAED: rb_main (main.c:43)
==699816== by 0x15BAED: main (main.c:62)
After this patch, there are no more memory leaks reported when running
RUBY_FREE_AT_EXIT with Valgrind on an empty Ruby script:
$ RUBY_FREE_AT_EXIT=1 valgrind --leak-check=full ruby -e ""
...
==700357== HEAP SUMMARY:
==700357== in use at exit: 0 bytes in 0 blocks
==700357== total heap usage: 36,559 allocs, 36,559 frees, 6,064,783 bytes allocated
==700357==
==700357== All heap blocks were freed -- no leaks are possible
Many libraries should be loaded on the main ractor because of
setting constants with unshareable objects and so on.
This patch allows to call `requore` on non-main Ractors by
asking the main ractor to call `require` on it. The calling ractor
waits for the result of `require` from the main ractor.
If the `require` call failed with some reasons, an exception
objects will be deliverred from the main ractor to the calling ractor
if it is copy-able.
Same on `require_relative` and `require` by `autoload`.
Now `Ractor.new{pp obj}` works well (the first call of `pp` requires
`pp` library implicitly).
[Feature #20627]
to show unused block warning strictly.
```ruby
class C
def f = nil
end
class D
def f = yield
end
[C.new, D.new].each{|obj| obj.f{}}
```
In this case, `D#f` accepts a block. However `C#f` doesn't
accept a block. There are some cases passing a block with
`obj.f{}` where `obj` is `C` or `D`. To avoid warnings on
such cases, "unused block warning" will be warned only if
there is not same name which accepts a block.
On the above example, `C.new.f{}` doesn't show any warnings
because there is a same name `D#f` which accepts a block.
We call this default behavior as "relax mode".
`strict_unused_block` new warning category changes from
"relax mode" to "strict mode", we don't check same name
methods and `C.new.f{}` will be warned.
[Feature #15554]
* YJIT: Replace Array#each only when YJIT is enabled
* Add comments about BUILTIN_ATTR_C_TRACE
* Make Ruby Array#each available with --yjit as well
* Fix all paths that expect a C location
* Use method_basic_definition_p to detect patches
* Copy a comment about C_TRACE flag to compilers
* Rephrase a comment about add_yjit_hook
* Give METHOD_ENTRY_BASIC flag to Array#each
* Add --yjit-c-builtin option
* Allow inconsistent source_location in test-spec
* Refactor a check of BUILTIN_ATTR_C_TRACE
* Set METHOD_ENTRY_BASIC without touching vm->running
If a Hash which is empty or only using literals is frozen, we detect
this as a peephole optimization and change the instructions to be
`opt_hash_freeze`.
[Feature #20684]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
If an Array which is empty or only using literals is frozen, we detect
this as a peephole optimization and change the instructions to be
`opt_ary_freeze`.
[Feature #20684]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
Unlike in older revisions in the year, GCC 11 isn't inlining the call
to vm_push_frame() inside invoke_iseq_block_from_c() anymore. We do
want it to be inlined since rb_yield() speed is fairly important.
Logs from -fopt-info-optimized-inline reveal that GCC was blowing its
code size budget inlining invoke_block_from_c_bh() into its various
callers, leaving suboptimal code for its body.
Take away some uses of the `inline` keyword and merge a common tail
call to vm_exec() for overall better code.
This tweak gives about 18% on a micro benchmark and 1% on the
chunky-png benchmark from yjit-bench. I tested on a Skylake server.
```
$ cat c-to-ruby-call.yml
benchmark:
- 0.upto(10_000_000) {}
$ benchmark-driver --chruby '+patch;master' c-to-ruby-call.yml
Warming up --------------------------------------
0.upto(10_000_000) {} 2.299 i/s - 3.000 times in 1.304689s (434.90ms/i)
Calculating -------------------------------------
+patch master
0.upto(10_000_000) {} 2.299 1.943 i/s - 6.000 times in 2.609393s 3.088353s
Comparison:
0.upto(10_000_000) {}
+patch: 2.3 i/s
master: 1.9 i/s - 1.18x slower
$ ruby run_benchmarks.rb --chruby 'master;+patch' chunky-png
<snip>
---------- ----------- ---------- ----------- ---------- -------------- -------------
bench master (ms) stddev (%) +patch (ms) stddev (%) +patch 1st itr master/+patch
chunky-png 1156.1 0.1 1142.2 0.2 1.01 1.01
---------- ----------- ---------- ----------- ---------- -------------- -------------
```
28a1c4f33e seems to call an improper
ensure clause. [Bug #20655]
Than fixing it properly, I bet it would be much better to simply revert
that commit. It reduces the unneeded complexity. Jumping into a block
called by a C function like Hash#each with callcc is user's fault.
It does not need serious support.
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.
Calls it optimizes look like this:
```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```
```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```
```ruby
def bar(*a) = a
def foo(...)
list = [1, 2]
bar(*list, ...) # optimized
end
foo(123)
```
All variants of the above but using `super` are also optimized, including a bare super like this:
```ruby
def foo(...)
super
end
```
This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:
```ruby
def m
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def bar(a) = a
def foo(...) = bar(...)
def test
m { foo(123) }
end
test
p test # allocates 1 object on master, but 0 objects with this patch
```
```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)
def test
m { foo(1, b: 2) }
end
test
p test # allocates 2 objects on master, but 0 objects with this patch
```
How does it work?
-----------------
This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.
I think this description is kind of confusing, so let's walk through an example with code.
```ruby
def delegatee(a, b) = a + b
def delegator(...)
delegatee(...) # CI2 (FORWARDING)
end
def caller
delegator(1, 2) # CI1 (argc: 2)
end
```
Before we call the delegator method, the stack looks like this:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # |
5| delegatee(...) # CI2 (FORWARDING) |
6| end |
7| |
8| def caller |
-> 9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method. The `delegator` method has a special local called `...`
that holds the caller's CI object.
Here is the ISeq disasm fo `delegator`:
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
The local called `...` will contain the caller's CI: CI1.
Here is the stack when we enter `delegator`:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
-> 4| # | CI1 (argc: 2)
5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller |
9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`. In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).
Before executing the `send` instruction, we push `...` on the stack. The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
Instruction 001 puts the caller's CI on the stack. `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # | CI1 (argc: 2)
-> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller | self
9| delegator(1, 2) # CI1 (argc: 2) | 1
10| end | 2
```
The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.
Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.
I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.
For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:
```ruby
SomeObject.new(foo: 1)
```
If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
This patch adds `int line_count` field to `rb_ast_body_t` structure.
Instead, we no longer cast `script_lines` to Fixnum.
## Background
Ref https://github.com/ruby/ruby/pull/10618
In the PR above, we have decoupled IMEMO from `rb_ast_t`.
This means we could lift the five-words-restriction of the structure
that forced us to unionize `rb_ast_t *` and `FIXNUM` in one field.
## Relating refactor
- Remove the second parameter of `rb_ruby_ast_new()` function
## Attention
I will remove a code that assigns -1 to line_count, in `rb_binding_add_dynavars()`
of vm.c, because I don't think it is necessary.
But I will make another PR for this so that we can atomically revert
in case I was wrong (See the comment on the code)
This patch removes the `VALUE flags` member from the `rb_ast_t` structure making `rb_ast_t` no longer an IMEMO object.
## Background
We are trying to make the Ruby parser generated from parse.y a universal parser that can be used by other implementations such as mruby.
To achieve this, it is necessary to exclude VALUE and IMEMO from parse.y, AST, and NODE.
## Summary (file by file)
- `rubyparser.h`
- Remove the `VALUE flags` member from `rb_ast_t`
- `ruby_parser.c` and `internal/ruby_parser.h`
- Use TypedData_Make_Struct VALUE which wraps `rb_ast_t` `in ast_alloc()` so that GC can manage it
- You can retrieve `rb_ast_t` from the VALUE by `rb_ruby_ast_data_get()`
- Change the return type of `rb_parser_compile_XXXX()` functions from `rb_ast_t *` to `VALUE`
- rb_ruby_ast_new() which internally `calls ast_alloc()` is to create VALUE vast outside ruby_parser.c
- `iseq.c` and `vm_core.h`
- Amend the first parameter of `rb_iseq_new_XXXX()` functions from `rb_ast_body_t *` to `VALUE`
- This keeps the VALUE of AST on the machine stack to prevent being removed by GC
- `ast.c`
- Almost all change is replacement `rb_ast_t *ast` with `VALUE vast` (sorry for the big diff)
- Fix `node_memsize()`
- Now it includes `rb_ast_local_table_link`, `tokens` and script_lines
- `compile.c`, `load.c`, `node.c`, `parse.y`, `proc.c`, `ruby.c`, `template/prelude.c.tmpl`, `vm.c` and `vm_eval.c`
- Follow-up due to the above changes
- `imemo.{c|h}`
- If an object with `imemo_ast` appears, considers it a bug
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
* Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)"
This reverts commit c878344195.
* YJIT: Take care of GC references in ISEQ invariants
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
---------
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
This reverts commit 4cc58ea0b8.
Since the change landed call-threshold=1 CI runs have been timing out.
There has also been `verify-ctx` violations. Revert for now while we debug.
`RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT=1 ruby ...` will enable
strict check for unused block warning.
This option is only for trial to compare the results so the
envname is not considered well.
Should be removed before Ruby 3.4.0 release.
if a method `foo` uses a block, other (unrelated) method `foo`
can receives a block. So try to relax the unused block warning
condition.
```ruby
class C0
def f = yield
end
class C1 < C0
def f = nil
end
[C0, C1].f{ block } # do not warn
```
This makes it easier to notice a dependency is causing interpreter or
JIT deoptimization.
```ruby
Warning[:performance] = true
class String
def freeze
super
end
end
```
```
./test.rb:4: warning: Redefining 'String#freeze' disable multiple interpreter and JIT optimizations
```
This patch is part of universal parser work.
## Summary
- Decouple VALUE from members below:
- `(struct parser_params *)->debug_lines`
- `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
- They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
- In order to do this,
- Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
- Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`
## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
- Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
- After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` called
- With regard to this, please see *Future tasks* below
## Future tasks
- Decouple IMEMO from `rb_ast_t *`
- This lifts the five-members-restriction of Ruby object,
- So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
- Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
Currently, we check the values on the machine stack & register state to
see if they're actually a pointer to an ASAN fake stack, and mark the
values on the fake stack too if required. However, we are only doing
that for the _current_ thread (the one actually running the GC), not for
any other thread in the program.
Make rb_gc_mark_machine_context (which is called for marking non-current
threads) perform the same ASAN fake stack handling that
mark_current_machine_context performs.
[Bug #20310]
Lately there has been a few flaky YJIT CI failures where a new Ruby
thread is finding the canary on the VM stack. For example:
2267950848 (step):14:109
After checking a local rr recording, it's clear that the canary was
written there when YJIT was using a temporary malloc region, and then
later handed to the new Ruby thread. Previously, the VM stack was
uninitialized, so it can have stale values in it, like the canary.
Though unlikely, this can happen without YJIT too. Initialize the stack
if we're spawning canaries.
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.
Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.
So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.
`vm->mark_object_ary` is also being refactored.
Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.
This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.
But using a custom TypedData we can save from having to mark
all the references on minor GC runs.
Addtionally, immediate values are now ignored and not appended
to `vm->mark_object_ary` as it's just wasted space.
This frees FL_USER0 on both T_MODULE and T_CLASS.
Note: prior to this, FL_SINGLETON was never set on T_MODULE,
so checking for `FL_SINGLETON` without first checking that
`FL_TYPE` was `T_CLASS` was valid. That's no longer the case.
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
Previously every call to vm_ci_new (when the CI was not packable) would
result in a different callinfo being returned this meant that every
kwarg callsite had its own CI.
When calling, different CIs result in different CCs. These CIs and CCs
both end up persisted on the T_CLASS inside cc_tbl. So in an eval loop
this resulted in a memory leak of both types of object. This also likely
resulted in extra memory used, and extra time searching, in non-eval
cases.
For simplicity in this commit I always allocate a CI object inside
rb_vm_ci_lookup, but ideally we would lazily allocate it only when
needed. I hope to do that as a follow up in the future.