Commit graph

1543 commits

Author SHA1 Message Date
Jean Boussier
9e9f1d9301 Precompute embedded string literals hash code
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.

We chose to store the hash code right after the string terminator so as to
make it easy and fast to compute, without requiring one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```
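
The cached hash code is an internal detail and cannot be observed from Ruby code. As a rough illustration of the use case the benchmark targets (short string literals used as hash keys), a minimal sketch with hypothetical method names might look like this:

```ruby
# Illustrative only: `lookup_host` and `same_hash_code?` are hypothetical
# helpers, not part of the commit.
def lookup_host
  config = { "host" => "localhost", "port" => 5432 }
  # The literal "host" is hashed for the lookup; this is the kind of call
  # that could benefit from a hash code stored in the string's slot.
  config["host"]
end

def same_hash_code?
  # Equal strings always produce equal hash codes, whether or not the
  # value was precomputed and stored alongside the string.
  "host".hash == "host".dup.hash
end
```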

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2024-05-28 07:32:41 +02:00
Nobuyoshi Nakada
f4b475993e Apply optimizations for putstring to putchilledstring as well 2024-05-27 12:41:38 +09:00
Nobuyoshi Nakada
49fcd33e13 Introduce a specialized instruction for Array#pack
Instructions for this code:

```ruby
  # frozen_string_literal: true

[a].pack("C")
```

Before this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 newarray                               1
0005 putobject                              "C"
0007 opt_send_without_block                 <calldata!mid:pack, argc:1, ARGS_SIMPLE>
0009 leave
```

After this commit:

```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself                                                          (   3)[Li]
0001 opt_send_without_block                 <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putobject                              "C"
0005 opt_newarray_send                      2, :pack
0008 leave
```
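
To inspect which instructions a given Ruby build emits for this pattern, something like the following sketch could be used. The helper name `pack_byte` is hypothetical, and the disassembly output depends on the Ruby version, so it is printed rather than checked:

```ruby
# Hypothetical helper exercising the `[a].pack("C")` pattern.
def pack_byte(a)
  [a].pack("C")
end

# RubyVM is specific to CRuby, so guard the disassembly.
if defined?(RubyVM::InstructionSequence)
  # Only compiles the snippet; `a` is never evaluated here.
  puts RubyVM::InstructionSequence.compile('[a].pack("C")').disasm
end
```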

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2024-05-23 12:11:50 -07:00
Nobuyoshi Nakada
2dd46bb82f [Bug #20468] Fix safe navigation in for variable 2024-05-16 16:22:17 +09:00
yui-knk
899d9f79dd Rename vast to ast_value
"vast" is already an English word.
This commit changes the name to a clearer one to avoid confusion.
2024-05-03 12:40:35 +09:00
HASUMI Hitoshi
55a402bb75 Add line_count field to rb_ast_body_t
This patch adds `int line_count` field to `rb_ast_body_t` structure.
Instead, we no longer cast `script_lines` to Fixnum.

## Background

Ref https://github.com/ruby/ruby/pull/10618

In the PR above, we have decoupled IMEMO from `rb_ast_t`.
This means we could lift the five-words-restriction of the structure
that forced us to unionize `rb_ast_t *` and `FIXNUM` in one field.

## Relating refactor

- Remove the second parameter of `rb_ruby_ast_new()` function

## Attention

I will remove the code that assigns -1 to line_count in `rb_binding_add_dynavars()`
of vm.c, because I don't think it is necessary.
I will do that in a separate PR, though, so that it can be reverted atomically
in case I am wrong (see the comment in the code).
2024-04-27 12:08:26 +09:00
Kevin Newton
af800bef21 Remove dependency on NODE from coverage structure 2024-04-26 12:25:45 -04:00
HASUMI Hitoshi
2244c58b00 [Universal parser] Decouple IMEMO from rb_ast_t
This patch removes the `VALUE flags` member from the `rb_ast_t` structure making `rb_ast_t` no longer an IMEMO object.

## Background

We are trying to make the Ruby parser generated from parse.y a universal parser that can be used by other implementations such as mruby.
To achieve this, it is necessary to exclude VALUE and IMEMO from parse.y, AST, and NODE.

## Summary (file by file)

- `rubyparser.h`
  - Remove the `VALUE flags` member from `rb_ast_t`
- `ruby_parser.c` and `internal/ruby_parser.h`
  - Use a `TypedData_Make_Struct` VALUE that wraps `rb_ast_t` in `ast_alloc()` so that the GC can manage it
    - You can retrieve `rb_ast_t` from the VALUE with `rb_ruby_ast_data_get()`
  - Change the return type of `rb_parser_compile_XXXX()` functions from `rb_ast_t *` to `VALUE`
  - `rb_ruby_ast_new()`, which internally calls `ast_alloc()`, creates the VALUE vast outside ruby_parser.c
- `iseq.c` and `vm_core.h`
  - Amend the first parameter of `rb_iseq_new_XXXX()` functions from `rb_ast_body_t *` to `VALUE`
  - This keeps the VALUE of AST on the machine stack to prevent being removed by GC
- `ast.c`
  - Almost all changes replace `rb_ast_t *ast` with `VALUE vast` (sorry for the big diff)
  - Fix `node_memsize()`
    - Now it includes `rb_ast_local_table_link`, `tokens` and `script_lines`
- `compile.c`, `load.c`, `node.c`, `parse.y`, `proc.c`, `ruby.c`, `template/prelude.c.tmpl`, `vm.c` and `vm_eval.c`
  - Follow-up due to the above changes
- `imemo.{c|h}`
  - If an object with `imemo_ast` appears, consider it a bug

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-04-26 11:21:08 +09:00
Zack Deveau
9555a997ac ensure ibf_load_setup is only passed String params
In cases where RubyVM::InstructionSequence.load_from_binary() is
passed a param other than a String, we attempt to call the
RSTRING_LENINT macro on it which can cause a segfault.

ex:
```
var_0 = 0
RubyVM::InstructionSequence.load_from_binary(var_0)
```

This commit adds a type check to raise unless we are provided
a String.
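
On Ruby builds that predate this fix, callers can guard the call themselves. The sketch below uses a hypothetical wrapper, `load_iseq`, that performs the same kind of check before delegating; it is not the code added by this commit:

```ruby
# Hypothetical wrapper (not part of Ruby): reject non-String input before
# it reaches load_from_binary.
def load_iseq(binary)
  raise TypeError, "expected String, got #{binary.class}" unless binary.is_a?(String)

  RubyVM::InstructionSequence.load_from_binary(binary)
end

# Round trip: compile, serialize, load, and evaluate.
def round_trip(source)
  binary = RubyVM::InstructionSequence.compile(source).to_binary
  load_iseq(binary).eval
end
```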
2024-04-20 10:41:01 +09:00
Koichi Sasada
662ce928a7 RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT
`RUBY_TRY_UNUSED_BLOCK_WARNING_STRICT=1 ruby ...` will enable
strict check for unused block warning.

This option exists only for trial use, to compare results, so the
environment variable name has not been carefully chosen.
It should be removed before the Ruby 3.4.0 release.
2024-04-19 14:28:54 +09:00
Koichi Sasada
e9d7478ded relax unused block warning for duck typing
If a method `foo` uses a block, another (unrelated) method `foo`
can receive a block. So relax the unused block warning
condition.

```ruby
class C0
  def f = yield
end

class C1 < C0
  def f = nil
end

# Both classes define `f`; only C0#f uses the block.
[C0.new, C1.new].each { |obj| obj.f { :block } } # do not warn
```
2024-04-17 20:26:49 +09:00
Koichi Sasada
f9f3018001 ISeq#to_a respects use_block status
```ruby
b = RubyVM::InstructionSequence.compile('def f = yield; def g = nil').to_a
pp b

 #=>
 ...
 {:use_block=>true},
 ...
```
2024-04-17 17:03:46 +09:00
Jean Boussier
f06670c5a2 Eliminate usage of OBJ_FREEZE_RAW
Previously it would bypass the `FL_ABLE` check, but
since the introduction of shapes, it has behaved differently
from `OBJ_FREEZE`, as it would only set the `FL_FREEZE`
flag, but not update the shape.

I have no indication of this causing a bug yet, but it seems
like a trap waiting to happen.
2024-04-16 17:20:35 +02:00
HASUMI Hitoshi
9b1e97b211 [Universal parser] DeVALUE of p->debug_lines and ast->body.script_lines
This patch is part of universal parser work.

## Summary
- Decouple VALUE from members below:
  - `(struct parser_params *)->debug_lines`
  - `(rb_ast_t *)->body.script_lines`
- Instead, they are now `rb_parser_ary_t *`
  - They can also be a `(VALUE)FIXNUM` as before to hold line count
- `ISEQ_BODY(iseq)->variable.script_lines` remains VALUE
  - In order to do this,
  - Add `VALUE script_lines` param to `rb_iseq_new_with_opt()`
  - Introduce `rb_parser_build_script_lines_from()` to convert `rb_parser_ary_t *` into `VALUE`

## Other details
- Extend `rb_parser_ary_t *`. It previously could only store `rb_parser_ast_token *`, now can store script_lines, too
- Change tactics of building the top-level `SCRIPT_LINES__` in `yycompile0()`
  - Before: While parsing, each line of the script is added to `SCRIPT_LINES__[path]`
  - After: After `yyparse(p)`, `SCRIPT_LINES__[path]` will be built from `p->debug_lines`
- Remove the second parameter of `rb_parser_set_script_lines()` to make it simple
- Introduce `script_lines_free()` to be called from `rb_ast_free()` because the GC no longer takes care of the script_lines
- Introduce `rb_parser_string_deep_copy()` in parse.y to maintain script_lines when `rb_ruby_parser_free()` is called
  - With regard to this, please see *Future tasks* below

## Future tasks
- Decouple IMEMO from `rb_ast_t *`
  - This lifts the five-members-restriction of Ruby object,
  - So we will be able to move the ownership of the `lex.string_buffer` from parser to AST
  - Then we remove `rb_parser_string_deep_copy()` to make the whole thing simple
2024-04-15 20:51:54 +09:00
Koichi Sasada
9a57b04703 super{} doesn't use block
`super(){}`, `super{}` and `super(&b)` don't use the given
block, so emit the unused block warning when calling a method that
doesn't use its block and contains one of the above `super` expressions.

e.g.: `def f = super{B1}` (warn on `f{B2}` because `B2` is not used).
2024-04-15 17:56:49 +09:00
Koichi Sasada
145cced9bc fix incorrect warning.
`super()` (not zsuper) passes the passed block and
it can be used.

```ruby
class C0
  def foo; yield; end
end

class C1 < C0
  def foo; super(); end
end

C1.new.foo{p :block} #=> :block
```
2024-04-15 14:53:41 +09:00
Koichi Sasada
9180e33ca3 show warning for unused block
With verbose mode (-w), the interpreter shows a warning if
a block is passed to a method which does not use the given block.

Warning on:

* the invoked method is written in C
* the invoked method is not `initialize`
* not invoked with `super`
* the first time on the call-site with the invoked method
  (`obj.foo{}` will be warned once if `foo` is same method)

[Feature #15554]

`Primitive.attr! :use_block` is introduced to declare that primitive
functions (written in C) will use passed block.

For minitest, test needs some tweak, so use
ea9caafc07
for `test-bundled-gems`.
2024-04-15 12:08:07 +09:00
Jean Boussier
1b830740ba compile.c: use rb_enc_interned_str to reduce allocations
The `rb_fstring(rb_enc_str_new())` pattern is inefficient because:

- It passes a mutable string to `rb_fstring` so if it has to be interned
  it will first be duped.
- If an equivalent interned string already exists, we allocated the string
  for nothing.

With `rb_enc_interned_str` we either directly get the pre-existing string
with 0 allocations, or efficiently directly intern the one we create
without first duping it.
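
The change itself is in C, but the effect of interning can be observed from Ruby through `String#-@`. The following is only a Ruby-level analogy with a hypothetical helper name, not the code path used in compile.c:

```ruby
# Ruby-level analogy only: compile.c uses the C API, not String#-@.
# `interned` is a hypothetical helper.
def interned(str)
  -str # returns a frozen, deduplicated string
end

def shared_after_interning?
  a = interned("compile".dup) # mutable input, interned on demand
  b = interned("compile".dup) # equal content, so the same object is expected
  a.equal?(b) && a.frozen?
end
```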
2024-04-11 09:04:31 +02:00
Jeremy Evans
ad90fdd24c Remove compiler code to handle blocks in attrasgn
Passing blocks is no longer allowed in attrasgn.  This is similar
to 3a674c9c65, but for attrasgn instead
of op_asgn.
2024-04-06 10:33:16 -07:00
yui-knk
f022a700bf Remove imemo type check for NODE
In the past, `rb_iseq_compile_node` received `NODE *`
and `struct vm_ifunc *` as `node`. But after e743a35,
the function only receives `NODE *`.
This commit removes imemo type check to reduce the dependence
on `VALUE flags` of `struct RNode`.
2024-04-06 18:20:31 +09:00
Jeremy Evans
3a674c9c65 Remove compiler code to handle keywords and blocks in operator assignment syntax
Code such as:

```ruby
foo[0, &bar] = baz
foo[0, bar: 1] = baz
foo[0, **bar] = baz
```

Is now a syntax error, so all of the removed code is now dead.
2024-04-04 17:13:40 -07:00
HASUMI Hitoshi
8aa8fce320 Fix return-type warning in compile.c
This patch suppresses the warning below:

```console
compile.c:10314:1: warning: control reaches end of non-void function [-Wreturn-type]
10314 | }
      | ^
```
2024-04-04 13:38:26 +09:00
yui-knk
f057741c5d NODE_LIT is not used anymore 2024-04-04 13:17:26 +09:00
yui-knk
6056773105 Move shareable_constant_value logic from parse.y to compile.c 2024-04-04 08:44:10 +09:00
KJ Tsanaktsidis
9d0a5148ae Add missing RB_GC_GUARDs related to DATA_PTR
I discovered the problem in `compile.c` from a failing
TestIseqLoad#test_stressful_roundtrip test with ASAN enabled. The other
two changes in array.c and string.c I found by auditing similar usages
of DATA_PTR in the codebase.

[Bug #20402]
2024-03-31 20:33:38 +11:00
Maxime Chevalier-Boisvert
bb3cbdfe2f YJIT: add iseq_alloc_count to stats (#10398)
* YJIT: add iseq_alloc_count to stats

* Remove an empty line

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2024-03-28 15:21:09 -04:00
Étienne Barrié
12be40ae6b Implement chilled strings
[Feature #20205]

As a path toward enabling frozen string literals by default in the future,
this commit introduces "chilled strings". From a user perspective chilled
strings pretend to be frozen, but on the first attempt to mutate them,
they lose their frozen status and emit a warning rather than raising a
`FrozenError`.

Implementation wise, `rb_compile_option_struct.frozen_string_literal` is
no longer a boolean but a tri-state of `enabled/disabled/unset`.

When code is compiled with frozen string literals neither explicitly enabled
nor disabled, string literals are compiled with a new `putchilledstring`
instruction. This instruction is identical to `putstring` except it marks
the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags.

Chilled strings have the `FL_FREEZE` flag so as to minimize the need to check
for chilled strings across the codebase, and to improve compatibility with
C extensions.

Notes:
  - `String#freeze`: clears the chilled flag.
  - `String#-@`: acts as if the string was mutable.
  - `String#+@`: acts as if the string was mutable.
  - `String#clone`: copies the chilled flag.
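
A minimal sketch of the user-visible behavior, assuming a CRuby build with `RubyVM::InstructionSequence` available. The snippet is compiled without an explicit `frozen_string_literal` option so the literal gets the default treatment; whether a warning is printed depends on the Ruby version and warning settings, so only the result of the mutation is relied on here. The method name is hypothetical:

```ruby
# Hypothetical helper: compile a snippet with no frozen_string_literal
# option, so the literal is treated as "unset" (chilled on Ruby versions
# that implement this feature, plainly mutable on older ones).
def mutate_default_literal
  source = 's = "abc"; s << "d"; s'
  RubyVM::InstructionSequence.compile(source).eval
end

# The mutation succeeds either way; it does not raise FrozenError.
puts mutate_default_literal
```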

Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
2024-03-19 09:26:49 +01:00
Jeremy Evans
815c7e197c Avoid caller-side hash allocation for f(*a, kw: 1) and f(*a, kw: 1, &block)
Previously, this used:

```
splatarray false
duphash
getlocal/getblockparamproxy # in the block passing case
send ARGS_SPLAT|KW_SPLAT|KW_SPLAT_MUT
```

This changes the duphash to putobject, with putobject using
a frozen version of the hash, and removing the keyword mutability:

```
splatarray false
putobject
getlocal/getblockparamproxy # in the block passing case
send ARGS_SPLAT|KW_SPLAT
```
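
The optimization must not change what the callee observes: each call still sees its own keyword hash. A small sketch with hypothetical method names that exercises the `f(*a, kw: 1)` call shape:

```ruby
# Hypothetical callee: records the keywords it received, then mutates
# its own keyword hash.
def collect(*args, **kw)
  snapshot = kw.dup
  kw[:seen] = true # must not leak into later calls
  [args, snapshot]
end

# Hypothetical caller using the f(*a, kw: 1) call shape twice.
def call_twice(a)
  first  = collect(*a, kw: 1)
  second = collect(*a, kw: 1)
  [first, second]
end
```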
2024-03-16 09:27:32 -07:00
Jean Boussier
91bf7eb274 Refactor frozen_string_literal check during compilation
In preparation for https://bugs.ruby-lang.org/issues/20205.

The `frozen_string_literal` compilation option will no longer
be a boolean but a tri-state: `on/off/default`.
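
From Ruby, the explicit `on` and `off` states can be selected through the compile options hash of `RubyVM::InstructionSequence.compile`. The sketch below uses a hypothetical helper, `literal_frozen?`; passing `nil` leaves the option unset, and the result in that case depends on the Ruby version:

```ruby
# Hypothetical helper: compile a lone string literal with the given
# frozen_string_literal setting (true, false, or nil for "unset") and
# report whether the resulting string is frozen.
def literal_frozen?(setting)
  options = setting.nil? ? {} : { frozen_string_literal: setting }
  iseq = RubyVM::InstructionSequence.compile('"lit"', nil, nil, 1, **options)
  iseq.eval.frozen?
end

puts literal_frozen?(true)
puts literal_frozen?(false)
```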
2024-03-15 15:52:33 +01:00
Jeremy Evans
c388784943 Fix array allocation optimization for f(*a, kw: 1)
This was broken during the refactoring in 22e488464a.
2024-03-14 16:11:35 -07:00
Jeremy Evans
6ea01d204e Fix dump of hidden local variable indexes
This fixes test failures when running tests with
RUBY_ISEQ_DUMP_DEBUG=to_binary, which started after
5899f6aa55 was committed.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-03-06 09:10:00 -08:00
Jeremy Evans
5899f6aa55 Keep hidden local variables when dumping and loading iseqs
Fixes [Bug #19975]
2024-03-04 09:49:55 -08:00
S-H-GAMELINKS
2d8788e90c Support NODE_ONCE for pattern matching 2024-03-04 12:33:00 +09:00
Jeremy Evans
99384bac28 Correctly set anon_kwrest flag for def f(b: 1, **)
In cases where a method accepts both keywords and an anonymous
keyword splat, the method was not marked as taking an anonymous
keyword splat.  Fix that in the compiler.

Doing that broke handling of nil keyword splats in yjit, so
update yjit to handle that.

Add a test to check that calling a method that accepts both
a keyword argument and an anonymous keyword splat does not
modify a passed keyword splat hash.

Move the anon_kwrest check from setup_parameters_complex to
ignore_keyword_hash_p, and only use it if the keyword hash
is already a hash. This should speed things up slightly as
it avoids a check previously used for all callers of
setup_parameters_complex.
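
The property covered by the added test can be sketched as follows; the method names are illustrative and this is not the test from the commit:

```ruby
# A method that accepts both a keyword argument and an anonymous
# keyword splat.
def f(b: 1, **)
  b
end

# Hypothetical caller: passes a keyword splat hash and returns both the
# result and the hash, which should be left untouched.
def call_with_splat
  h = { b: 2, c: 3 }
  result = f(**h)
  [result, h]
end
```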

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-03-01 12:36:19 -08:00
Jeremy Evans
334e4c65b3 Fix a couple issues noticed by nobu
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-03-01 07:10:25 -08:00
Jeremy Evans
73371450c3 Avoid 1-2 array allocations for zsuper calls with post arguments
These previously resulted in 2 array allocations, one for newarray
and one for concatarray.  This replaces newarray and concatarray
with pushtoarray, and changes splatarray false to splatarray true,
which reduces it to 1 array allocation, in splatarray true.

This also sets VM_CALL_ARGS_SPLAT_MUT, so if the super method
accepts a positional splat, this will avoid an additional array
allocation on the callee side.
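
For reference, a zsuper call with post arguments has the following shape. The class names are hypothetical, and the sketch shows only the semantics that the optimization has to preserve:

```ruby
class Base
  def m(*args)
    args
  end
end

class Derived < Base
  # `last` is a post argument (it follows the splat).
  def m(first, *rest, last)
    super # zsuper: implicitly forwards first, *rest, last
  end
end
```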
2024-03-01 07:10:25 -08:00
Jeremy Evans
32c58753af Fix splatarray false peephole optimization for f(*ary, **kw, &block)
This optimization stopped being used when the splatkw VM instruction
was added.  This change allows the optimization to apply again. This
also optimizes the following cases:

  super(*ary, **kw, &block)
  f(...)
  super(...)
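
Skipping the caller-side copy is only safe because the caller's array is never exposed to mutation by the callee. A sketch with hypothetical method names that checks this for the `f(*ary, **kw, &block)` shape:

```ruby
# Hypothetical callee: mutates its own rest array.
def target(*rest, **kw, &block)
  rest << :extra
  [rest, kw, block ? block.call : nil]
end

# Hypothetical caller using the f(*ary, **kw, &block) call shape.
def forward(ary, kw, &block)
  target(*ary, **kw, &block)
end
```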
2024-03-01 07:10:25 -08:00
Jeremy Evans
e484ffaf20 Perform splatarray false peephole optimization for invokesuper in addition to send
This optimizes cases such as:

  super(arg, *ary)
  super(arg, *ary, &block)
  super(*ary, **kw)
  super(*ary, kw: 1)
  super(*ary, kw: 1, &block)

The super(*ary, **kw, &block) case does not use the splatarray false
optimization.  This is also true of the send case, since the
introduction of the splatkw VM instruction.  That will be fixed in
a later commit.
2024-03-01 07:10:25 -08:00
Kevin Newton
0f1ca9492c [PRISM] Provide runtime flag for prism in iseq 2024-02-21 11:44:40 -05:00
Alan Wu
2a6917b463 Fix string value in hash literal being forced frozen
We should pass `false` for `hash_key` for value nodes. Credits to
`@kddnewton` for noticing and bisecting.
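
The intended behavior can be checked with a sketch like the one below. It compiles the literal with `frozen_string_literal: false` explicitly so that the result does not depend on the enclosing file's magic comment; the helper name is hypothetical:

```ruby
# Hypothetical helper: build a hash literal with string key and value,
# compiled without frozen string literals.
def build_hash
  source = '{ "k" => "v" }'
  RubyVM::InstructionSequence.compile(source, nil, nil, 1, frozen_string_literal: false).eval
end

h = build_hash
# String keys of a hash are frozen; values should stay mutable.
puts h.keys.first.frozen?
puts h["k"].frozen?
```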
2024-02-20 21:00:54 -05:00
yui-knk
e7ab5d891c Introduce NODE_REGX to manage regexp literal 2024-02-21 08:06:48 +09:00
Jeremy Evans
77c1233f79 Add pushtoarraykwsplat instruction to avoid unnecessary array allocation
This is designed to replace the newarraykwsplat instruction, which is
no longer used in the parse.y compiler after this commit.  This avoids
an unnecessary array allocation in the case where ARGSCAT is followed
by LIST with keyword:

```ruby
a = []
kw = {}
[*a, 1, **kw]
```

Previous Instructions:

```
0000 newarray                               0                         (   1)[Li]
0002 setlocal_WC_0                          a@0
0004 newhash                                0                         (   2)[Li]
0006 setlocal_WC_0                          kw@1
0008 getlocal_WC_0                          a@0                       (   3)[Li]
0010 splatarray                             true
0012 putobject_INT2FIX_1_
0013 putspecialobject                       1
0015 newhash                                0
0017 getlocal_WC_0                          kw@1
0019 opt_send_without_block                 <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>
0021 newarraykwsplat                        2
0023 concattoarray
0024 leave
```

New Instructions:

```
0000 newarray                               0                         (   1)[Li]
0002 setlocal_WC_0                          a@0
0004 newhash                                0                         (   2)[Li]
0006 setlocal_WC_0                          kw@1
0008 getlocal_WC_0                          a@0                       (   3)[Li]
0010 splatarray                             true
0012 putobject_INT2FIX_1_
0013 pushtoarray                            1
0015 putspecialobject                       1
0017 newhash                                0
0019 getlocal_WC_0                          kw@1
0021 opt_send_without_block                 <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>
0023 pushtoarraykwsplat
0024 leave
```

pushtoarraykwsplat is designed to be simpler than newarraykwsplat.
It does not take a variable number of arguments from the stack, it
pops the top of the stack, and appends it to the second from the top,
unless the top of the stack is an empty hash.

During this work, I found that ARGSPUSH followed by HASH with keyword
did not compile correctly, as it pushed the generated hash to the
array even if the hash was empty.  This fixes the behavior, to use
pushtoarraykwsplat instead of pushtoarray in that case:

```ruby
a = []
kw = {}
[*a, **kw]

[{}] # Before

[] # After
```

This does not remove the newarraykwsplat instruction, as it is still
referenced in the prism compiler (which should be updated similar
to this), YJIT (only in the bindings, it does not appear to be
implemented), and RJIT (in a couple comments).  After those are
updated, the newarraykwsplat instruction should be removed.
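
The `[*a, 1, **kw]` shape from the first example behaves as sketched below: an empty keyword splat contributes nothing, and a non-empty one is appended as a hash. The helper name `build` is hypothetical. The `[*a, **kw]` shape is left out because its result differs before and after this commit.

```ruby
# Hypothetical helper for the ARGSCAT-followed-by-LIST-with-keyword shape.
def build(a, kw)
  [*a, 1, **kw]
end

p build([], {})
p build([0], { x: 1 })
```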
2024-02-20 10:47:44 -08:00
Peter Zhu
2967b7eb76 GC guard catch_table_ary
Using RARRAY_CONST_PTR can cause the array object to not exist on the
stack, which could cause it to be GC'd or be moved by GC compaction. This
can cause RARRAY_CONST_PTR to point to the incorrect location if the
array is embedded and moved by GC compaction.

Fixes ruby/prism#2444.
2024-02-16 15:58:39 -05:00
Yusuke Endoh
25d74b9527 Do not include a backtick in error messages and backtraces
[Feature #16495]
2024-02-15 18:42:31 +09:00
Peter Zhu
de7a29ef8d Replace assert with RUBY_ASSERT in compile.c
assert does not print the bug report, only the file and line number of
the assertion that failed. RUBY_ASSERT prints the full bug report, which
makes it much easier to debug.
2024-02-12 15:07:47 -05:00
Kevin Newton
f7467e70e1 Split line_no and node_id before new_insn_body
Before this commit, there were many places where we had to generate
dummy line nodes to hold both the line number and the node id that
would then immediately get pulled out from the created node. Now
we pass them explicitly so that we don't have to generate these
nodes.

This makes a clearer line between the parser and compiler, and also
makes it easier to generate instructions when we don't have a
specific node to tie them to. As such, it removes almost every
single place where we needed to previously generate dummy nodes.

This also makes it easier for the prism compiler, because now we
can pass in line number and node id instead of trying to generate
dummy nodes for every instruction that we compile.
2024-02-09 17:01:27 -05:00
yui-knk
33c1e082d0 Remove ruby object from string nodes
String nodes hold a Ruby string object in `VALUE nd_lit`.
This commit changes it to `struct rb_parser_string *string`
to reduce the dependency on Ruby objects.
Sometimes these strings are concatenated with other strings,
so string concatenation functions are needed.
2024-02-09 14:20:17 +09:00
Kevin Newton
610636fd6b [PRISM] Mirror iseq APIs
Before this commit, we were mixing a lot of concerns in the prism
compiler between RubyVM::InstructionSequence and the general entry
points to the prism parser/compiler.

This commit makes all of the various prism-related APIs mirror
their corresponding APIs in the existing parser/compiler. This means
we now have the correct frame naming, and it's much easier to follow
where the logic actually flows. Furthermore this consolidates a lot
of the prism initialization, making it easier to see where we could
potentially be raising errors.
2024-01-31 13:41:36 -05:00
Jeremy Evans
20732cadfd Make compile_array first_chunk argument bool instead of int 2024-01-30 08:47:48 -08:00
Jeremy Evans
332e0db675 Avoid unnecessary array allocation for ARGSCAT with LIST body
Previously, this would use newarray followed by concattoarray.
This now uses pushtoarray instead, avoiding the unnecessary
array allocation.

This is implemented by making compile_array take a first_chunk
argument, passing in 1 in the normal array case, and 0 in the
ARGSCAT with LIST body case.
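
The affected expression shape is a splat followed by literal elements. A sketch with a hypothetical helper that checks the observable result stays the same and the source array is untouched:

```ruby
# Hypothetical helper for the ARGSCAT-with-LIST-body shape.
def extend_list(a)
  [*a, 1, 2]
end

a = [0]
b = extend_list(a)
p b
```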
2024-01-30 08:47:48 -08:00