The main impetus for this change is to fix [Bug #13392]. Previously, we
fired the "return" TracePoint event after popping the stack frame for
the block running as method (BMETHOD). This gave undesirable source
location outputs as the return event normally fires right before the
frame going away.
The iseq for each block can run both as a block and as a method. To
accommodate that, this commit makes vm_trace() fire call/return events for
instructions that have b_call/b_return events attached when the iseq is
running as a BMETHOD. The logic for rewriting to "trace_*" instruction
is tweaked so that when the user listens to call/return events,
instructions with b_call/b_return become trace variants.
To continue to provide the return value for non-local returns done using
the "return" or "break" keyword inside BMETHODs, the stack unwinding
code is tweaked. b_return events now provide the same return value as
return events for these non-local cases. A pre-existing test deemed not
providing a return value for these b_return events as a limitation.
This commit removes the checks for call/return TracePoint events that
happen when calling into BMETHODs when no TracePoints are active.
Technically, migrating just the return event is enough to fix the bug,
but migrating both call and return removes our reliance on
`VM_FRAME_FLAG_FINISH` and re-entering the interpreter when the caller
is already in the interpreter.
The implementation of a local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and, the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.
This change converts the hacky implementation to a normal struct.
As the ID serial is 32bit value and internal IDs created in the
parser are assigned from its maximum value, Symbol converted from
it will exceed 32bit and overflow on 32bit platforms.
Refinement#import_methods imports methods from modules.
Unlike Module#include, it copies methods and adds them into the refinement,
so the refinement is activated in the imported methods.
[Bug #17429] [ruby-core:101639]
`RubyVM.keep_script_lines` enables to keep script lines
for each ISeq and AST. This feature is for debugger/REPL
support.
```ruby
RubyVM.keep_script_lines = true
RubyVM::keep_script_lines = true
eval("def foo = nil\ndef bar = nil")
pp RubyVM::InstructionSequence.of(method(:foo)).script_lines
```
The interpreter instruction count was enabled based on RUBY_DEBUG as
opposed to YJIT_STATS. In builds with YJIT_STATS=1 but RUBY_DEBUG=0,
the count was not available.
Move YJIT_STATS in yjit.h where declarations are expoed to code outside
of YJIT. Also reduce the changes made to the interpreter for calling
into YJIT's instruction counting function.
env_copy() uses rb_ary_delete_at() with a loop counting up while
iterating through the list of read only locals. rb_ary_delete_at() can
shift elements in the array to an index lesser than the loop index,
causing locals to be missed and set to Qfalse in the returned
environment.
Iterate through the locals in reverse instead, this way the shifting
never happens for locals that are yet to be visited and we process all
the locals in the array.
[Bug #18023]
This fixes issues with paths being loaded twice in certain cases
when symlinks are used.
It took me multiple attempts to get this working. My original
attempt tried to convert paths to realpaths before adding them
to $LOADED_FEATURES. Unfortunately, this doesn't work well
with the loaded feature index, which is based off load paths
and not realpaths. While I was able to get require working, I'm
fairly sure the loaded feature index was not being used as
expected, which would have significant performance implications.
Additionally, I was never able to get that approach working with
autoload when autoloading a non-realpath file. It also broke
some specs.
This takes a more conservative approach. Directly before loading the
file, if the file with the same realpath has been required, the
loading of the file is skipped. The realpaths are stored as
fstrings in a hidden hash.
When rebuilding the loaded feature index, the hash of realpaths
is also rebuilt. I'm guessing this makes rebuilding process
slower, but I don think that is a hot path. In general, modifying
loaded features is only done when reloading, and that tends to be
in non-production environments.
Change test_require_with_loaded_features_pop test to use 30 threads
and 300 iterations, instead of 4 threads and 1000 iterations.
I saw only sporadic failures with 4/1000, but consistent failures
30/300 threads. These failures were due to the fact that the
concurrent deletions from $LOADED_FEATURES in other threads can
result in rb_ary_entry returning nil when rebuilding the loaded
features index.
To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.
Fixes [Bug #17885]
This fixes issues with paths being loaded twice in certain cases
when symlinks are used.
It took me multiple attempts to get this working. My original
attempt tried to convert paths to realpaths before adding them
to $LOADED_FEATURES. Unfortunately, this doesn't work well
with the loaded feature index, which is based off load paths
and not realpaths. While I was able to get require working, I'm
fairly sure the loaded feature index was not being used as
expected, which would have significant performance implications.
Additionally, I was never able to get that approach working with
autoload when autoloading a non-realpath file. It also broke
some specs.
This takes a more conservative approach. Directly before loading the
file, if the file with the same realpath has been required, the
loading of the file is skipped. The realpaths are stored as
fstrings in a hidden hash.
When rebuilding the loaded feature index, the hash of realpaths
is also rebuilt. I'm guessing this makes rebuilding process
slower, but I don think that is a hot path. In general, modifying
loaded features is only done when reloading, and that tends to be
in non-production environments.
Change test_require_with_loaded_features_pop test to use 30 threads
and 300 iterations, instead of 4 threads and 1000 iterations.
I saw only sporadic failures with 4/1000, but consistent failures
30/300 threads. These failures were due to the fact that the
concurrent deletions from $LOADED_FEATURES in other threads can
result in rb_ary_entry returning nil when rebuilding the loaded
features index.
To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.
Fixes [Bug #17885]
[0] => [0, *, a]
#=> [0] length mismatch (given 1, expected 2+) (NoMatchingPatternError)
Ignore test failures of typeprof caused by this change for now.
If the thread termination invokes user code after `th->status` becomes
`THREAD_KILLED`, and the user unblock function causes that `th->status` to
become something else (e.g. `THREAD_RUNNING`), threads waiting in
`thread_join_sleep` will hang forever. We move the unblock function call
to before the thread status is updated, and allow threads to join as soon
as `th->value` becomes defined.
This reverts commit 6505c77501.
`vm_opt_method_table` is me=>bop table to manage the optimized
methods (by specialized instruction). However, `me` can be invalidated
to invalidate the method cache entry.
[Bug #17725]
To solve the issue, use `me-def` instead of `me` which simply copied
at invalidation timing.
A test by @jeremyevans https://github.com/ruby/ruby/pull/4376
This reverts commit 13f8521c63.
20210727T230009Z.fail.html.gz
20210728T000009Z.fail.html.gz
This revert is to confirm whether the commit is the cause.
If the failures consistently occur after this revert, I'll
reintroduce the commit.
VM patch from wanabe.
Test based on example from buzztaiki (Taiki Sugawara).
Test fails when compiles with -DRUBY_DEBUG, as that can
can use rb_bug instead of NoMemoryError, which doesn't
allow testing this case. Test also fails on MingW, as
RangeError is used instead of NoMemoryError. Skip the
test in either case.
Fixes [Bug #15779]
If the thread termination invokes user code after `th->status` becomes
`THREAD_KILLED`, and the user unblock function causes that `th->status` to
become something else (e.g. `THREAD_RUNNING`), threads waiting in
`thread_join_sleep` will hang forever. We move the unblock function call
to before the thread status is updated, and allow threads to join as soon
as `th->value` becomes defined.
Redo of 34a2acdac7 and
931138b006 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
by merging `rb_ast_body_t#line_count` and `#script_lines`.
Fortunately `line_count == RARRAY_LEN(script_lines)` was always
satisfied. When script_lines is saved, it has an array of lines, and
when not saved, it has a Fixnum that represents the old line_count.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
We can take advantage of fstrings to de-duplicate the defined strings.
This means we don't need to keep the list of defined strings on the VM
(or register them as mark objects)