Commit graph

196 commits

Author SHA1 Message Date
Peter Zhu
330830dd1a Add IMEMO_NEW
Rather than exposing that an imemo has a flag and four fields, this
changes the implementation to only expose one field (the klass) and
fills the rest with 0. The type will have to fill in the values themselves.
2024-02-21 11:33:05 -05:00
yui-knk
e7ab5d891c Introduce NODE_REGX to manage regexp literal 2024-02-21 08:06:48 +09:00
Peter Zhu
c184aa8740 Use rb_gc_mark_and_move for imemo 2024-02-20 10:39:30 -05:00
yui-knk
89cfc15207 [Feature #20257] Rearchitect Ripper
Introduce another semantic value stack for Ripper so that
Ripper can manage both Node and Ruby Object separately.
This rearchitectutre of Ripper solves these issues.
Therefore adding test cases for them.

* [Bug 10436] https://bugs.ruby-lang.org/issues/10436
* [Bug 18988] https://bugs.ruby-lang.org/issues/18988
* [Bug 20055] https://bugs.ruby-lang.org/issues/20055

Checked the differences of `Ripper.sexp` for files under `/test/ruby`
are only on test_pattern_matching.rb.
The differences comes from the differences between
`new_hash_pattern_tail` functions between parser and Ripper.
Ripper `new_hash_pattern_tail` didn’t call `assignable` then
`kw_rest_arg` wasn’t marked as local variable.
This is also fixed by this commit.

```
--- a/./tmp/before/test_pattern_matching.rb
+++ b/./tmp/after/test_pattern_matching.rb
@@ -3607,7 +3607,7 @@
                  [:in,
                   [:hshptn, nil, [], [:var_field, [:@ident, “a”, [984, 13]]]],
                   [[:binary,
-                    [:vcall, [:@ident, “a”, [985, 10]]],
+                    [:var_ref, [:@ident, “a”, [985, 10]]],
                     :==,
                     [:hash, nil]]],
                   nil]]],
@@ -3662,7 +3662,7 @@
                  [:in,
                   [:hshptn, nil, [], [:var_field, [:@ident, “a”, [993, 13]]]],
                   [[:binary,
-                    [:vcall, [:@ident, “a”, [994, 10]]],
+                    [:var_ref, [:@ident, “a”, [994, 10]]],
                     :==,
                     [:hash,
                      [:assoclist_from_args,
@@ -3813,7 +3813,7 @@
                    [:command,
                     [:@ident, “raise”, [1022, 10]],
                     [:args_add_block,
-                     [[:vcall, [:@ident, “b”, [1022, 16]]]],
+                     [[:var_ref, [:@ident, “b”, [1022, 16]]]],
                      false]]],
                   [:else, [[:var_ref, [:@kw, “true”, [1024, 10]]]]]]]],
                nil,
@@ -3876,7 +3876,7 @@
                      [:@int, “0”, [1033, 15]]],
                     :“&&“,
                     [:binary,
-                     [:vcall, [:@ident, “b”, [1033, 20]]],
+                     [:var_ref, [:@ident, “b”, [1033, 20]]],
                      :==,
                      [:hash, nil]]]],
                   nil]]],
@@ -3946,7 +3946,7 @@
                      [:@int, “0”, [1042, 15]]],
                     :“&&“,
                     [:binary,
-                     [:vcall, [:@ident, “b”, [1042, 20]]],
+                     [:var_ref, [:@ident, “b”, [1042, 20]]],
                      :==,
                      [:hash,
                       [:assoclist_from_args,
@@ -5206,7 +5206,7 @@
                      [[:assoc_new,
                        [:@label, “c:“, [1352, 22]],
                        [:@int, “0”, [1352, 25]]]]]],
-                   [:vcall, [:@ident, “r”, [1352, 29]]]],
+                   [:var_ref, [:@ident, “r”, [1352, 29]]]],
                   false]]],
                [:binary,
                 [:call,
@@ -5299,7 +5299,7 @@
                       [:assoc_new,
                        [:@label, “c:“, [1367, 34]],
                        [:@int, “0”, [1367, 37]]]]]],
-                   [:vcall, [:@ident, “r”, [1367, 41]]]],
+                   [:var_ref, [:@ident, “r”, [1367, 41]]]],
                   false]]],
                [:binary,
                 [:call,
@@ -5931,7 +5931,7 @@
              [:in,
               [:hshptn, nil, [], [:var_field, [:@ident, “r”, [1533, 11]]]],
               [[:binary,
-                [:vcall, [:@ident, “r”, [1534, 8]]],
+                [:var_ref, [:@ident, “r”, [1534, 8]]],
                 :==,
                 [:hash,
                  [:assoclist_from_args,
```
2024-02-20 17:33:58 +09:00
Peter Zhu
c5f22b5b75 Make all fields in AST movable 2024-02-16 10:15:35 -05:00
yui-knk
33c1e082d0 Remove ruby object from string nodes
String nodes holds ruby string object on `VALUE nd_lit`.
This commit changes it to `struct rb_parser_string *string`
to reduce dependency on ruby object.
Sometimes these strings are concatenated with other string
therefore string concatenate functions are needed.
2024-02-09 14:20:17 +09:00
Nobuyoshi Nakada
0610f555ea
Constify rb_global_parser_config 2024-01-14 17:55:11 +09:00
yui-knk
b35e21b388 Remove reference counter from rb_parser_config
It's allocated outside of parser then no need to track
reference count in rb_parser_config.
2024-01-12 21:17:41 +09:00
yui-knk
52d9e55903 Statically allocate parser config 2024-01-12 21:17:41 +09:00
yui-knk
db476cc71c Introduce NODE_SYM to manage symbol literal
`:sym` was managed by `NODE_LIT` with `Symbol` object.
This commit introduces `NODE_SYM` so that

1. Symbol literal is detectable from AST Node
2. Reduce dependency on ruby object
2024-01-09 16:07:19 +09:00
S-H-GAMELINKS
1b8d01136c Introduce Numeric Node's 2024-01-07 09:24:34 +09:00
yui-knk
7a050638b1 Introduce NODE_FILE
`__FILE__` was managed by `NODE_STR` with `String` object.
This commit introduces `NODE_FILE` and `struct rb_parser_string` so that

1. `__FILE__` is detectable from AST Node
2. Reduce dependency ruby object
2024-01-02 14:19:42 +09:00
Nobuyoshi Nakada
13c9cbe09e
Embed rb_args_info in rb_node_args_t 2023-10-30 00:19:43 +09:00
Nobuyoshi Nakada
a405b28e85 Delete heredoc line mark references 2023-10-14 11:08:43 +09:00
yui-knk
d293d9e191 Expand pattern_info struct into ARYPTN Node and FNDPTN Node 2023-09-30 13:11:32 +09:00
Peter Zhu
97564ddf2b Fix memory leak in the parser
Reproduction script:

```
require "ripper"

10.times do
  20_000.times do
    Ripper.parse("")
  end

  puts `ps -o rss= -p #{$$}`
end
```

Before:

```
28032
34432
40704
47232
53632
60032
66432
72832
79232
85632
```

After:

```
21760
21760
21760
21760
21760
21760
21760
21760
21760
21760
```
2023-09-29 16:51:17 -04:00
yui-knk
74c6781153 Change RNode structure from union to struct
All kind of AST nodes use same struct RNode, which has u1, u2, u3 union members
for holding different kind of data.
This has two problems.

1. Low flexibility of data structure

Some nodes, for example NODE_TRUE, don’t use u1, u2, u3. On the other hand,
NODE_OP_ASGN2 needs more than three union members. However they use same
structure definition, need to allocate three union members for NODE_TRUE and
need to separate NODE_OP_ASGN2 into another node.
This change removes the restriction so make it possible to
change data structure by each node type.

2. No compile time check for union member access

It’s developer’s responsibility for using correct member for each node type when it’s union.
This change clarifies which node has which type of fields and enables compile time check.

This commit also changes node_buffer_elem_struct buf management to handle
different size data with alignment.
2023-09-28 11:58:10 +09:00
yui-knk
fb7a2ddb4b Directly free structure managed by imemo tmpbuf
NODE_ARGS, NODE_ARYPTN, NODE_FNDPTN manage memory of their
structure by imemo tmpbuf Object.
However rb_ast_struct has reference to NODE. Then these
memory can be freed directly when rb_ast_struct is freed.

This commit reduces parser's dependency on CRuby functions.
2023-09-22 11:25:53 +09:00
yui-knk
19c62b400d Replace parser & node compile_option from Hash to bit field
This commit reduces dependency to CRuby object.
2023-06-17 16:41:08 +09:00
yui-knk
b481b673d7 [Feature #19719] Universal Parser
Introduce Universal Parser mode for the parser.
This commit includes these changes:

* Introduce `UNIVERSAL_PARSER` macro. All of CRuby related functions
  are passed via `struct rb_parser_config_struct` when this macro is enabled.
* Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
2023-06-12 18:23:48 +09:00
yui-knk
5f65e8c5d5 Rename rb_node_name to the original name
98637d421d changes the name of
the function. However this function is exported as global,
then change the name to origin one for keeping compatibility.
2023-05-24 20:54:48 +09:00
yui-knk
98637d421d Move ruby_node_name to node.c and rename prefix of the function 2023-05-23 18:05:35 +09:00
yui-knk
d8601621ed Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.

This commit adds these methods.

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
2022-11-21 09:01:34 +09:00
yui-knk
4bfdf6d06d Move error from top_stmts and top_stmt to stmt
By this change, syntax error is recovered smaller units.
In the case below, "DEFN :bar" is same level with "CLASS :Foo"
now.

```
module Z
  class Foo
    foo.
  end

  def bar
  end
end
```

[Feature #19013]
2022-10-08 17:59:11 +09:00
Wolf
c69ad738dc Initialize node_id
In some causes node_id might have been left uninitialized leading to
undefined behavior on access. So always set it to -1, so we have *some*
valid value in there.
2022-08-01 10:36:36 +09:00
Takashi Kokubun
5b21e94beb Expand tabs [ci skip]
[Misc #18891]
2022-07-21 09:42:04 -07:00
Nobuyoshi Nakada
54f0e63a8c Remove NODE_DASGN_CURR [Feature #18406]
This `NODE` type was used in pre-YARV implementation, to improve
the performance of assignment to dynamic local variable defined at
the innermost scope.  It has no longer any actual difference with
`NODE_DASGN`, except for the node dump.
2021-12-13 12:53:03 +09:00
S.H
ec7f14d9fa
Add nd_type_p macro 2021-12-04 00:01:24 +09:00
Yusuke Endoh
feda058531 Refactor hacky ID tables to struct rb_ast_id_table_t
The implementation of a local variable tables was represented as `ID*`,
but it was very hacky: the first element is not an ID but the size of
the table, and, the last element is (sometimes) a link to the next local
table only when the id tables are a linked list.

This change converts the hacky implementation to a normal struct.
2021-11-21 08:59:24 +09:00
Yusuke Endoh
753cfbdbf3 node.c (dump_node): update format explanation for NODE_ARGS 2021-11-17 23:38:52 +09:00
Yusuke Endoh
5a7b4dba26 node.c (dump_node): trivial refactoring 2021-11-17 23:38:19 +09:00
Nobuyoshi Nakada
6504ca006b
Show node IDs in dump 2021-07-12 12:10:16 +09:00
Yusuke Endoh
acae5f363d ast.rb: RubyVM::AST.parse and .of accepts save_script_lines: true
This option makes the parser keep the original source as an array of
the original code lines. This feature exploits the mechanism of
`SCRIPT_LINES__` but records only the specified code that is passed to
RubyVM::AST.of or .parse, instead of recording all parsed program texts.
2021-06-18 02:34:27 +09:00
Yusuke Endoh
e48109d86f Partially revert 2c7d3b3a72
to make imemo_ast WB-protected again. Only the test is kept.
2021-04-27 17:05:19 +09:00
Yusuke Endoh
2c7d3b3a72 node.c (rb_ast_new): imemo_ast is WB-unprotected
Previously imemo_ast was handled as WB-protected which caused a segfault
of the following code:

    # shareable_constant_value: literal
    M0 = {}
    M1 = {}
    ...
    M100000 = {}

My analysis is here: `shareable_constant_value: literal` creates many
Hash instances during parsing, and add them to node_buffer of imemo_ast.
However, the contents are missed because imemo_ast is incorrectly
WB-protected.

This changeset makes imemo_ast as WB-unprotected.
2021-04-26 22:46:51 +09:00
Nobuyoshi Nakada
c060bdc2b4
NODE markability should not change by nd_set_type 2021-01-14 16:12:02 +09:00
Kazuki Tsujimoto
e03e1982bd
Change NODE layout for pattern matching
I prefer pconst to be the first element of NODE.

  Before:

       | ARYPTN | FNDPTN | HSHPTN
    ---+--------+--------+-----------
    u1 | imemo  | imemo  | pkwargs
    u2 | pconst | pconst | pconst
    u3 | apinfo | fpinfo | pkwrestarg

  After:

       | ARYPTN | FNDPTN | HSHPTN
    ---+--------+--------+-----------
    u1 | pconst | pconst | pconst
    u2 | imemo  | imemo  | pkwargs
    u3 | apinfo | fpinfo | pkwrestarg
2020-11-01 16:19:07 +09:00
Nobuyoshi Nakada
081cc4eb28
Dump FrozenCore specially 2020-10-20 23:52:19 +09:00
Nobuyoshi Nakada
7b2bea42a2
Unfreeze string-literal-only interpolated string-literal
[Feature #17104]
2020-09-30 22:15:28 +09:00
Kazuki Tsujimoto
fcdbdff631
rb_{ary,fnd}_pattern_info: Remove imemo member to reduce memory usage
This is a partial revert commit of 8f096226e1.

NODE layout:

  Before:

       | ARYPTN | FNDPTN | HSHPTN
    ---+--------+--------+-----------
    u1 | pconst | pconst | pconst
    u2 | unused | unused | pkwargs
    u3 | apinfo | fpinfo | pkwrestarg

  After:

       | ARYPTN | FNDPTN | HSHPTN
    ---+--------+--------+-----------
    u1 | imemo  | imemo  | pkwargs
    u2 | pconst | pconst | pconst
    u3 | apinfo | fpinfo | pkwrestarg
2020-08-02 01:04:06 +09:00
Aaron Patterson
7533519990
NODE_MATCH needs reference updating 2020-07-30 11:11:13 -07:00
Aaron Patterson
35ba2783fe Use a linked list to eliminate imemo tmp bufs for managing local tables
This patch changes local table memory to be managed by a linked list
rather than via the garbage collector.  It reduces allocations from the
GC and also fixes a use-after-free bug in the concurrent-with-sweep
compactor I'm working on.
2020-07-27 12:40:01 -07:00
Koichi Sasada
a0f12a0258
Use ID instead of GENTRY for gvars. (#3278)
Use ID instead of GENTRY for gvars.

Global variables are compiled into GENTRY (a pointer to struct
rb_global_entry). This patch replace this GENTRY to ID and
make the code simple.

We need to search GENTRY from ID every time (st_lookup), so
additional overhead will be introduced.
However, the performance of accessing global variables is not
important now a day and this simplicity helps Ractor development.
2020-07-03 16:56:44 +09:00
Kazuki Tsujimoto
ddded1157a
Introduce find pattern [Feature #16828] 2020-06-14 09:24:36 +09:00
卜部昌平
5e22f873ed decouple internal.h headers
Saves comitters' daily life by avoid #include-ing everything from
internal.h to make each file do so instead.  This would significantly
speed up incremental builds.

We take the following inclusion order in this changeset:

1.  "ruby/config.h", where _GNU_SOURCE is defined (must be the very
    first thing among everything).
2.  RUBY_EXTCONF_H if any.
3.  Standard C headers, sorted alphabetically.
4.  Other system headers, maybe guarded by #ifdef
5.  Everything else, sorted alphabetically.

Exceptions are those win32-related headers, which tend not be self-
containing (headers have inclusion order dependencies).
2019-12-26 20:45:12 +09:00
Nobuyoshi Nakada
fb6a489af2
Revert "Method reference operator"
This reverts commit 67c5747369.
[Feature #16275]
2019-11-12 17:24:48 +09:00
Aaron Patterson
7460c884fb
Use an identity hash for pinning Ripper objects
Ripper reuses parse.y for its implementation.  Ripper changes the
grammar productions to sometimes return Ruby objects.  This Ruby objects
are put in to the parser's stack, so they must be kept alive.  This is
where the "mark_ary" comes in.  The mark array ensures that Ruby objects
created and pushed on the stack during the course of parsing will stay
alive for the life of the parsing functions.

Unfortunately, Arrays do not prevent their contents from moving.  If the
compactor runs, objects on the parser stack could move because the array
won't prevent them from moving.  But the GC doesn't know about the
parser stack, so it can't update references in that stack (it will
update them in the array).

This commit changes the mark array to be an identity hash.  Since the
identity hash relies on memory addresses for the definition of identity,
the GC will not allow keys in an identity hash to move.  We can prevent
movement of objects in the parser stack by sticking them in an identity
hash.
2019-11-05 08:24:14 -08:00
卜部昌平
7e0ae1698d avoid overflow in integer multiplication
This changeset basically replaces `ruby_xmalloc(x * y)` into
`ruby_xmalloc2(x, y)`.  Some convenient functions are also
provided for instance `rb_xmalloc_mul_add(x, y, z)` which allocates
x * y + z byes.
2019-10-09 12:12:28 +09:00
Nobuyoshi Nakada
0c6f36668a
Adjusted spaces [ci skip] 2019-09-27 10:20:56 +09:00
Aaron Patterson
293c6c8cc3
Add compaction support to rb_ast_t
This commit adds compaction support to `rb_ast_t`.
2019-09-26 15:41:46 -07:00