2613 commits

Author SHA1 Message Date
ydah
169a5ee99e Use user defined inline rules user_or_keyword_variable 2024-10-01 23:59:58 +09:00
Nobuyoshi Nakada
86ae409467 [Bug #20764] Refactor argument forwarding in lambda
Reject argument forwarding in lambda:
- without parentheses
- after optional argument(s)
2024-10-01 20:00:22 +09:00
ydah
ac2786757e Use Named Reference 2024-09-30 18:04:41 +09:00
ydah
044e57ed7c Implement SPLAT NODE keyword locations 2024-09-30 18:04:41 +09:00
tompng
b9e225fcbf Allow dot3 in defs singleton 2024-09-28 22:37:44 +09:00
ydah
8f678d6989 Implement OP_ASGN2 NODE locations 2024-09-28 20:53:09 +09:00
Nobuyoshi Nakada
7e19904c88 Remove RSTRING_END dependency from parser 2024-09-28 01:59:33 +09:00
Nobuyoshi Nakada
94ad2c3fe9 Reduce creating rb_parser_string_t repeatedly for literals.
Since #11698, `parser_str_new` makes `rb_parser_string_t` and `VALUE`
but discards the former, and then `STR_NEW3` makes the same thing
again.
2024-09-27 23:10:14 +09:00
Nobuyoshi Nakada
710d916c32 Add wrapper macros of rb_parser_str_buf_cat 2024-09-27 23:10:14 +09:00
S-H-GAMELINKS
7f83bd3732 Reduce is_ascii_string function dependency for parser
Changed to use `rb_parser_is_ascii_string` function instead of `is_ascii_string` function
2024-09-27 19:34:35 +09:00
ydah
eff16d9302 Implement OP_ASGN1 NODE locations 2024-09-27 18:20:00 +09:00
Nobuyoshi Nakada
80e483afac Fold rules [ci skip] 2024-09-26 06:05:35 +09:00
Peter Zhu
407f8b8716 Fix memory leak in Ripper for indented heredocs
The allocated parser string is never freed, which causes a memory leak.

The following code leaks memory:

    Ripper.sexp_raw(DATA.read)

    __END__
    <<~EOF
      a
        #{1}
      a
    EOF
2024-09-25 08:56:14 -04:00
ydah
509b577e01 Implement BLOCK_PASS NODE keyword locations 2024-09-25 09:15:43 +09:00
ydah
31a88d1554 Implement RETURN NODE keyword locations 2024-09-25 09:06:42 +09:00
ydah
b811a9a097 Implement CASE3 NODE keyword locations 2024-09-23 09:19:37 +09:00
ydah
5334766beb Implement CASE2 NODE keyword locations 2024-09-23 09:19:37 +09:00
ydah
feac2b4b77 Implement CASE NODE keyword locations 2024-09-23 09:19:37 +09:00
S-H-GAMELINKS
95d26ee41e Reuse dedent_string function in rb_ruby_ripper_dedent_string function
This change reduces the Ruby C API dependency for the Universal Parser.
Reuse the dedent_string function in the rb_ruby_ripper_dedent_string function and remove the parser's dependencies on rb_str_modify and rb_str_set_len.
2024-09-22 12:22:20 +09:00
Jeremy Evans
268c72377b Raise a compile error for break/next/redo inside eval in cases where it is optimized away
In cases where break/next/redo are not valid syntax, they should
raise a SyntaxError even if inside a conditional block that is
optimized away.
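
A minimal sketch of the behavior described above, assuming MRI. `compile_status` is a hypothetical helper and not part of the patch. On a Ruby that includes this fix the dead-branch case is expected to be rejected, while older versions may accept it, so the result is printed rather than assumed.

```ruby
# Hypothetical helper: report whether a source string is accepted by eval.
def compile_status(src)
  eval(src)
  :accepted
rescue SyntaxError
  :syntax_error
end

# `break` inside a loop is valid syntax, so this is accepted.
puts compile_status("while true; break; end")

# Here `break` is outside any loop or block, and the `if false` branch is
# dead code that the compiler can optimize away. The outcome depends on
# whether the running Ruby includes the fix.
puts compile_status("if false; break; end")
```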

Fixes [Bug #20597]

Co-authored-by: Kevin Newton <kddnewton@gmail.com>
2024-09-18 16:54:56 -07:00
Luke Gruber
5d358b660d Fix issue with super and forwarding arguments in prism_compile.c
Fixes [Bug #20720]
2024-09-11 16:41:46 -04:00
ydah
d03e0d1c35 Implement BREAK, NEXT and REDO NODE locations 2024-09-11 18:01:16 +09:00
ydah
4e6091ce09 Implement WHILE and UNTIL NODE locations 2024-09-11 09:28:55 +09:00
ydah
d52e599538 Implement WHEN NODE locations 2024-09-09 10:34:02 +09:00
ydah
32680f543c Implement AND/OR NODE operator locations 2024-09-05 13:03:28 +09:00
ydah
ab18b1b4f5 Implement VALIAS NODE keyword locations 2024-09-04 14:36:35 +09:00
ydah
a2243ee48b Implement ALIAS NODE keyword locations 2024-09-03 22:09:08 +09:00
ydah
af143d8a74 Implement UNDEF NODE keyword locations 2024-09-03 21:15:12 +09:00
yui-knk
c93d07ed74 [Bug #20695] Do not create needless string object in parser
`set_parser_s_value` does nothing in the parser, so there is no need to
create a string object in the parser's `set_yylval_node`.

# Object allocation

Run `ruby benchmarks/lobsters/benchmark.rb` with the patch

```diff
diff --git a/benchmarks/lobsters/benchmark.rb b/benchmarks/lobsters/benchmark.rb
index 240c50c..6cdd0ac 100644
--- a/benchmarks/lobsters/benchmark.rb
+++ b/benchmarks/lobsters/benchmark.rb
@@ -7,6 +7,8 @@ Dir.chdir __dir__
 use_gemfile

 require_relative 'config/environment'
+printf "allocated_after_load=%d\n", GC.stat(:total_allocated_objects)
+exit
 require_relative "route_generator"

 # For an in-mem DB, we need to load all data on every boot
```

## Before

```
ruby 3.4.0dev (2024-08-31T18:30:25Z master d6fc8f3d57) [arm64-darwin21]
...
allocated_after_load=2143519
```

## After

```
ruby 3.4.0dev (2024-09-01T00:40:04Z fix_bugs_20695 d1bae52f75) [arm64-darwin21]
...
allocated_after_load=1579662
```

## Ruby 3.3.0 for reference

```
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin21]
...
allocated_after_load=1732702
```
2024-09-03 08:40:07 +09:00
Nobuyoshi Nakada
620ce3807b [Bug #20680] ensure block is always void context 2024-08-25 08:16:54 +09:00
Nobuyoshi Nakada
995b4c329b Make same structures same 2024-08-20 12:26:02 +09:00
Peter Zhu
584559d86a Fix leak of token_info when Ripper#warn jumps
For example, the following code leaks:

    class MyRipper < Ripper
      def initialize(src, &blk)
        super(src)
        @blk = blk
      end

      def warn(msg, *args) = @blk.call(msg)
    end

    $VERBOSE = true
    def call_parse = MyRipper.new("if true\n  end\n") { |msg| return msg }.parse

    10.times do
      500_000.times do
        call_parse
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    37536
    53744
    70064
    86448
    102576
    119120
    135248
    151216
    167744
    183824

After:

    19280
    19696
    19728
    20336
    20448
    21408
    21616
    21616
    21824
    21840
2024-08-07 09:14:14 -04:00
Peter Zhu
ced35800d4 Fix leak in warning of duplicate keys when Ripper#warn jumps
For example, the following code leaks:

    class MyRipper < Ripper
      def initialize(src, &blk)
        super(src)
        @blk = blk
      end

      def warn(msg, *args) = @blk.call(msg)
    end

    $VERBOSE = true
    def call_parse = MyRipper.new("if true\n  end\n") { |msg| return msg }.parse

    10.times do
      500_000.times do
        call_parse
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    34832
    51952
    69760
    88048
    105344
    123040
    141152
    159152
    176656
    194272

After:

    18400
    20256
    20272
    20272
    20272
    20304
    20368
    20368
    20368
    20400
2024-08-06 10:19:50 -04:00
yui-knk
66cbafc603 Refactor to use tokenize_ident instead of TOK_INTERN and set_yylval_name 2024-08-02 11:37:10 +09:00
Peter Zhu
6358397490 Fix leak of AST when Ripper#compile_error jumps
For example, the following script leaks:

    class MyRipper < Ripper
      def initialize(src, &blk)
        super(src)
        @blk = blk
      end

      def compile_error(msg) = @blk.call(msg)
    end

    def call_parse = MyRipper.new("/") { |msg| return msg }.parse

    10.times do
      100_000.times do
        call_parse
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    93952
    169040
    244224
    318784
    394432
    468224
    544048
    618560
    693776
    768384

After:

    19776
    19776
    20352
    20880
    20912
    21408
    21328
    21152
    21472
    20944
2024-07-31 14:47:44 -04:00
yui-knk
f2728c3393 Change RESBODY Node structure
Extract the exception variable into the `nd_exc_var` field
to keep the original grammar structure.

For example:

```
begin
rescue Error => e1
end
```

Before:

```
@ NODE_RESBODY (id: 8, line: 2, location: (2,0)-(2,18))
+- nd_args:
|   @ NODE_LIST (id: 2, line: 2, location: (2,7)-(2,12))
|   +- as.nd_alen: 1
|   +- nd_head:
|   |   @ NODE_CONST (id: 1, line: 2, location: (2,7)-(2,12))
|   |   +- nd_vid: :Error
|   +- nd_next:
|       (null node)
+- nd_body:
|   @ NODE_BLOCK (id: 6, line: 2, location: (2,13)-(2,18))
|   +- nd_head (1):
|   |   @ NODE_LASGN (id: 3, line: 2, location: (2,13)-(2,18))
|   |   +- nd_vid: :e1
|   |   +- nd_value:
|   |       @ NODE_ERRINFO (id: 5, line: 2, location: (2,13)-(2,18))
|   +- nd_head (2):
|       @ NODE_BEGIN (id: 4, line: 2, location: (2,18)-(2,18))
|       +- nd_body:
|           (null node)
+- nd_next:
    (null node)
```

After:

```
@ NODE_RESBODY (id: 6, line: 2, location: (2,0)-(2,18))
+- nd_args:
|   @ NODE_LIST (id: 2, line: 2, location: (2,7)-(2,12))
|   +- as.nd_alen: 1
|   +- nd_head:
|   |   @ NODE_CONST (id: 1, line: 2, location: (2,7)-(2,12))
|   |   +- nd_vid: :Error
|   +- nd_next:
|       (null node)
+- nd_exc_var:
|   @ NODE_LASGN (id: 3, line: 2, location: (2,13)-(2,18))
|   +- nd_vid: :e1
|   +- nd_value:
|       @ NODE_ERRINFO (id: 5, line: 2, location: (2,13)-(2,18))
+- nd_body:
|   @ NODE_BEGIN (id: 4, line: 2, location: (2,18)-(2,18))
|   +- nd_body:
|       (null node)
+- nd_next:
    (null node)
```
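
One way to look at the same node from Ruby is sketched below. It assumes MRI, where `RubyVM::AbstractSyntaxTree` exposes the parse.y tree, and `find_nodes` is a hypothetical helper. The list of children printed depends on the Ruby version, so no output is shown.

```ruby
src = "begin\nrescue Error => e1\nend\n"

# Hypothetical helper: depth-first search for all nodes of a given type.
def find_nodes(obj, type, found = [])
  case obj
  when RubyVM::AbstractSyntaxTree::Node
    found << obj if obj.type == type
    obj.children.each { |child| find_nodes(child, type, found) }
  when Array
    obj.each { |child| find_nodes(child, type, found) }
  end
  found
end

root = RubyVM::AbstractSyntaxTree.parse(src)
resbody = find_nodes(root, :RESBODY).first

# Print the type of each child node (non-node children are printed as-is).
p resbody.children.map { |c| c.is_a?(RubyVM::AbstractSyntaxTree::Node) ? c.type : c }
```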
2024-07-26 07:29:32 +09:00
Nobuyoshi Nakada
e642ddf7ae [Bug #20647] Disallow return directly within a singleton class 2024-07-24 14:44:32 +09:00
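
A small sketch for checking this on a given Ruby, assuming MRI. `compiles?` is a hypothetical helper. Whether the singleton-class case is rejected depends on whether the running version includes this change, so the result is printed rather than assumed.

```ruby
# Hypothetical helper: true if MRI compiles the source, false on SyntaxError.
def compiles?(src)
  RubyVM::InstructionSequence.compile(src)
  true
rescue SyntaxError
  false
end

puts compiles?("def m; return; end")          # return inside a method body
puts compiles?("class << self; return; end")  # return directly in a singleton class
```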
Peter Zhu
f0d8a0a2bf Fix memory leak in parser when loading non-ASCII file
When loading a non-ASCII compatible file, an error is raised which
causes a memory leak.

For example:

    require "tempfile"

    Tempfile.create do |f|
      f.write("# -*- coding: UTF-16BE -*-")
      f.flush

      10.times do
        20_000.times do
          begin
            load(f.path)
          rescue
          end
        end

        puts `ps -o rss= -p #{$$}`
      end
    end

Before:

    33904
    49072
    64528
    79216
    94576
    109504
    124768
    139536
    154928
    170256

After:

    19568
    21296
    21664
    21728
    22192
    22256
    22416
    22272
    22272
    22272
2024-07-23 08:50:53 -04:00
yui-knk
57b11be15a Implement UNLESS NODE keyword locations 2024-07-23 14:35:23 +09:00
Nobuyoshi Nakada
3c4dc3e7ac Remove unneeded local variable
`$5`, `brace_block` is no longer assigned in this action.
2024-07-21 12:10:33 +09:00
yui-knk
11e5ebaba7 Fix SEGV on method call with empty args and brace block for do block command call 2024-07-21 11:02:38 +09:00
yui-knk
84680dc255 Include undef keyword into UNDEF NODE location
For example:

```
undef a, b
```

Before:

```
@ NODE_UNDEF (id: 1, line: 1, location: (1,6)-(1,10))*
```

After:

```
@ NODE_UNDEF (id: 1, line: 1, location: (1,0)-(1,10))*
```
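
The node location can also be read from Ruby, as in the sketch below. It assumes MRI's `RubyVM::AbstractSyntaxTree`, and `collect` is a hypothetical helper. The number of UNDEF nodes and their columns depend on the Ruby version, so no output is shown.

```ruby
root = RubyVM::AbstractSyntaxTree.parse("undef a, b")

# Hypothetical helper: collect every UNDEF node in the tree.
collect = lambda do |obj, found|
  case obj
  when RubyVM::AbstractSyntaxTree::Node
    found << obj if obj.type == :UNDEF
    obj.children.each { |child| collect.call(child, found) }
  when Array
    obj.each { |child| collect.call(child, found) }
  end
  found
end

collect.call(root, []).each do |node|
  # Columns are zero-based, matching the (line,column) pairs in the dump.
  puts "(#{node.first_lineno},#{node.first_column})-(#{node.last_lineno},#{node.last_column})"
end
```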
2024-07-20 13:04:48 +09:00
yui-knk
6be539aab5 Change UNDEF Node structure
Change UNDEF Node to hold their items to keep the original grammar
structure.

For example:

```
undef a, b
```

Before:

```
@ NODE_BLOCK (id: 4, line: 1, location: (1,6)-(1,10))*
+- nd_head (1):
|   @ NODE_UNDEF (id: 1, line: 1, location: (1,6)-(1,7))
|   +- nd_undef:
|       @ NODE_SYM (id: 0, line: 1, location: (1,6)-(1,7))
|       +- string: :a
+- nd_head (2):
    @ NODE_UNDEF (id: 3, line: 1, location: (1,9)-(1,10))
    +- nd_undef:
        @ NODE_SYM (id: 2, line: 1, location: (1,9)-(1,10))
        +- string: :b
```

After:

```
@ NODE_UNDEF (id: 1, line: 1, location: (1,6)-(1,10))*
+- nd_undefs:
    +- length: 2
    +- element (0):
    |   @ NODE_SYM (id: 0, line: 1, location: (1,6)-(1,7))
    |   +- string: :a
    +- element (1):
        @ NODE_SYM (id: 2, line: 1, location: (1,9)-(1,10))
        +- string: :b
```
2024-07-20 11:25:26 +09:00
yui-knk
231a9acc15 Free data of struct rb_parser_ary in rb_parser_ary_free
For example:

    10.times do
      100_000.times do
        RubyVM::AbstractSyntaxTree.parse("x = 1 + 2 +", keep_tokens: true)
      rescue SyntaxError
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    28944
    44816
    60720
    76496
    92336
   108160
   123968
   139808
   155648
   171408

After:

    11984
    12704
    12816
    12832
    13072
    13088
    13088
    13136
    13136
    13152
2024-07-18 19:19:27 +09:00
yui-knk
4fb7e1b6d0 Change enum rb_parser_ary_data_type default value to 1 for easy debug
We recently hit `[BUG] unexpected rb_parser_ary_data_type (0) for script lines`
on the master branch.
This commit makes `enum rb_parser_ary_data_type` start at `1` so that `0`
is an invalid value, which makes it clear that `rb_parser_ary_data_type (0)`
is never intentional.
2024-06-26 07:48:43 +09:00
Nobuyoshi Nakada
250fc1223c [Bug #20457] Do not remove final return node
This was an optimization for versions prior to 1.9 that traverse the
AST at runtime.
2024-06-25 11:07:58 +09:00
Nobuyoshi Nakada
22f98bb7ca Parenthesize nd_fl_newline macro expressions 2024-06-25 11:07:58 +09:00
Aaron Patterson
cdf33ed5f3 Optimized forwarding callers and callees
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.

Calls it optimizes look like this:

```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```

```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```

```ruby
def bar(*a) = a

def foo(...)
  list = [1, 2]
  bar(*list, ...) # optimized
end
foo(123)
```

All variants of the above but using `super` are also optimized, including a bare super like this:

```ruby
def foo(...)
  super
end
```

This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:

```ruby
def m
  x = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - x
end

def bar(a) = a
def foo(...) = bar(...)

def test
  m { foo(123) }
end

test
p test # allocates 1 object on master, but 0 objects with this patch
```

```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)

def test
  m { foo(1, b: 2) }
end

test
p test # allocates 2 objects on master, but 0 objects with this patch
```

How does it work?
-----------------

This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.

I think this description is kind of confusing, so let's walk through an example with code.

```ruby
def delegatee(a, b) = a + b

def delegator(...)
  delegatee(...)  # CI2 (FORWARDING)
end

def caller
  delegator(1, 2) # CI1 (argc: 2)
end
```

Before we call the delegator method, the stack looks like this:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   |
              5|   delegatee(...)  # CI2 (FORWARDING)  |
              6| end                                   |
              7|                                       |
              8| def caller                            |
          ->  9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method.  The `delegator` method has a special local called `...`
that holds the caller's CI object.

Here is the ISeq disasm for `delegator`:

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

The local called `...` will contain the caller's CI: CI1.

Here is the stack when we enter `delegator`:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
           -> 4|   #                                   | CI1 (argc: 2)
              5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            |
              9|   delegator(1, 2) # CI1 (argc: 2)     |
             10| end                                   |
```

The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`.  In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`.  It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).

Before executing the `send` instruction, we push `...` on the stack.  The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):

```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself                                                          (   1)[LiCa]
0001 getlocal_WC_0                          "..."@0
0003 send                                   <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave                                  [Re]
```

Instruction 0001 puts the caller's CI on the stack.  `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the caller's stack to this stack:

```
Executing Line | Code                                  | Stack
---------------+---------------------------------------+--------
              1| def delegatee(a, b) = a + b           | self
              2|                                       | 1
              3| def delegator(...)                    | 2
              4|   #                                   | CI1 (argc: 2)
           -> 5|   delegatee(...)  # CI2 (FORWARDING)  | cref_or_me
              6| end                                   | specval
              7|                                       | type
              8| def caller                            | self
              9|   delegator(1, 2) # CI1 (argc: 2)     | 1
             10| end                                   | 2
```

The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
to perfectly forward splat args, kwargs, etc.

Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
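
The stack copy in the walkthrough can be modeled with a toy simulation. This is only a sketch: `CallInfo` and `forward_call` are hypothetical names, the stack is a plain Ruby array, and none of it corresponds to CRuby's internal C structures.

```ruby
# Toy model of a call info object that records only the argument count.
CallInfo = Struct.new(:argc)

# Copy the receiver plus `argc` arguments from the caller's frame onto the
# top of the stack, as the FORWARDING send is described to do.
def forward_call(stack, caller_base, caller_ci)
  slots = caller_ci.argc + 1 # +1 for the receiver (self)
  stack.concat(stack[caller_base, slots])
end

stack = [:self, 1, 2]                         # caller's frame before calling delegator
ci1 = CallInfo.new(2)                         # CI1 (argc: 2)
stack.push(ci1, :cref_or_me, :specval, :type) # delegator's frame
forward_call(stack, 0, ci1)

p stack.last(3) # → [:self, 1, 2]
```

The copied slots are the same values already on the stack, which is why no intermediate array or hash is needed in this model.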

I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.

I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.

For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:

```ruby
SomeObject.new(foo: 1)
```

If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
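
To check the allocation count on a particular Ruby, a sketch along the lines of the `m` helper above can be used. `SomeObject#initialize` and the `allocations` helper are assumptions made for illustration. The number printed depends on the Ruby version.

```ruby
# Assumed definition: a class whose initializer takes a keyword parameter.
class SomeObject
  def initialize(foo:)
    @foo = foo
  end
end

# Hypothetical helper: count objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allocations { SomeObject.new(foo: 1) } # warm up caches first
puts allocations { SomeObject.new(foo: 1) }
```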

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
2024-06-18 09:28:25 -07:00
Nobuyoshi Nakada
a1f72a563b [Bug #20579] ripper: Dispatch spaces at END-OF-INPUT without newline 2024-06-14 17:54:02 +09:00
Nobuyoshi Nakada
7f47469105 Include __LINE__ in add_delayed_token macro 2024-06-14 17:54:02 +09:00