Add keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods

Implementations of the Language Server Protocol (LSP) sometimes need token information.
For example, `m(1)` and `m(1, )` have the same AST structure apart from node locations,
so it is impossible to detect the `,` from the AST alone. In the latter case, however,
it might be better to suggest a list of variables for the second argument.
Token information is important in such cases.
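
A minimal sketch of the kind of check an LSP implementation could run once tokens are available. The token arrays are hand-written in the `[id, type, text, location]` shape documented in this commit; they are not captured parser output, and the type symbols other than `:tIDENTIFIER` and `:tSP` are placeholders. `trailing_comma?` is a hypothetical helper, not part of this commit.

```ruby
# Hand-written tokens for `m(1)` and `m(1, )`; illustrative only.
TOKENS_WITHOUT_COMMA = [
  [0, :tIDENTIFIER, "m", [1, 0, 1, 1]],
  [1, :"(",         "(", [1, 1, 1, 2]],
  [2, :tINTEGER,    "1", [1, 2, 1, 3]],
  [3, :")",         ")", [1, 3, 1, 4]],
]

TOKENS_WITH_COMMA = [
  [0, :tIDENTIFIER, "m", [1, 0, 1, 1]],
  [1, :"(",         "(", [1, 1, 1, 2]],
  [2, :tINTEGER,    "1", [1, 2, 1, 3]],
  [3, :",",         ",", [1, 3, 1, 4]],
  [4, :tSP,         " ", [1, 4, 1, 5]],
  [5, :")",         ")", [1, 5, 1, 6]],
]

# Hypothetical helper: true when the last non-blank token before the
# final ")" is ",". Only the source text (index 2) is inspected, so the
# check does not depend on the exact token type names.
def trailing_comma?(tokens)
  texts = tokens.map { |token| token[2] }.reject { |text| text.strip.empty? }
  close = texts.rindex(")")
  return false if close.nil? || close.zero?

  texts[close - 1] == ","
end

puts trailing_comma?(TOKENS_WITHOUT_COMMA)
puts trailing_comma?(TOKENS_WITH_COMMA)
```

With this commit, the same helper could presumably be fed the array returned by `all_tokens` on a tree parsed with `keep_tokens: true`.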

This commit adds the following:

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens`, which returns the tokens for the node, including the tokens of its descendant nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens`, which returns all tokens for the input script regardless of the receiver node.
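
The difference between the two methods can be sketched in plain Ruby: `tokens` keeps only the entries of `all_tokens` whose location lies inside the node's own location, using Array `<=>` to compare `[line, column]` pairs. The snippet below mirrors the comparison used in `ast.rb`, but `tokens_within` is a hypothetical standalone helper and the token list for `x = 1 + 2` is hand-written (the type symbols for `=` and `+` are placeholders).

```ruby
# Hand-written tokens for "x = 1 + 2"; illustrative only.
ALL_TOKENS = [
  [0, :tIDENTIFIER, "x", [1, 0, 1, 1]],
  [1, :tSP,         " ", [1, 1, 1, 2]],
  [2, :"=",         "=", [1, 2, 1, 3]],
  [3, :tSP,         " ", [1, 3, 1, 4]],
  [4, :tINTEGER,    "1", [1, 4, 1, 5]],
  [5, :tSP,         " ", [1, 5, 1, 6]],
  [6, :"+",         "+", [1, 6, 1, 7]],
  [7, :tSP,         " ", [1, 7, 1, 8]],
  [8, :tINTEGER,    "2", [1, 8, 1, 9]],
]

# Hypothetical helper: select the tokens that start at or after the given
# start position and end at or before the given end position.
def tokens_within(all_tokens, first_lineno, first_column, last_lineno, last_column)
  all_tokens.select do |token|
    loc = token.last
    ([first_lineno, first_column] <=> [loc[0], loc[1]]) <= 0 &&
      ([last_lineno, last_column] <=> [loc[2], loc[3]]) >= 0
  end
end

# Whole script (1:0-1:9) versus only the right-hand side (1:4-1:9).
puts tokens_within(ALL_TOKENS, 1, 0, 1, 9).map { |t| t[2] }.join
puts tokens_within(ALL_TOKENS, 1, 4, 1, 9).map { |t| t[2] }.join
```

Comparing `[line, column]` arrays orders positions first by line and then by column, so the same filter works for nodes that span several lines.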

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens: false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens: true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
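
For a quick read of the numbers above, the overhead can be worked out as follows. This is only arithmetic on the reported figures; it assumes the benchmark table reports iterations per second (benchmark-driver's default metric) and uses the built-ruby column.

```ruby
# Figures copied from the measurements in this commit message.
RSS_KB = { keep_tokens_false: 17_508, keep_tokens_true: 30_960 }
BUILT_RUBY_IPS = { without_keep_tokens: 21_303.0, with_keep_tokens: 5_691.0 }

# Extra peak memory when tokens are kept, in kb and as a ratio.
EXTRA_KB = RSS_KB[:keep_tokens_true] - RSS_KB[:keep_tokens_false]
RSS_RATIO = (RSS_KB[:keep_tokens_true].to_f / RSS_KB[:keep_tokens_false]).round(2)

# How many times slower parsing is when tokens are kept.
SLOWDOWN = (BUILT_RUBY_IPS[:without_keep_tokens] / BUILT_RUBY_IPS[:with_keep_tokens]).round(2)

puts "extra memory: #{EXTRA_KB}kb (#{RSS_RATIO}x)"
puts "parse slowdown: #{SLOWDOWN}x"
```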
yui-knk 2022-09-23 22:40:02 +09:00 committed by Yuichiro Kaneko
parent bbc4cf5f76
commit d8601621ed
Notes: git 2022-11-21 00:02:01 +00:00
9 changed files with 556 additions and 104 deletions

ast.rb

@ -29,8 +29,8 @@ module RubyVM::AbstractSyntaxTree
#
# RubyVM::AbstractSyntaxTree.parse("x = 1 + 2")
# # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-1:9>
-def self.parse string, keep_script_lines: false, error_tolerant: false
-Primitive.ast_s_parse string, keep_script_lines, error_tolerant
+def self.parse string, keep_script_lines: false, error_tolerant: false, keep_tokens: false
+Primitive.ast_s_parse string, keep_script_lines, error_tolerant, keep_tokens
end
# call-seq:
@ -44,8 +44,8 @@ module RubyVM::AbstractSyntaxTree
#
# RubyVM::AbstractSyntaxTree.parse_file("my-app/app.rb")
# # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-31:3>
-def self.parse_file pathname, keep_script_lines: false, error_tolerant: false
-Primitive.ast_s_parse_file pathname, keep_script_lines, error_tolerant
+def self.parse_file pathname, keep_script_lines: false, error_tolerant: false, keep_tokens: false
+Primitive.ast_s_parse_file pathname, keep_script_lines, error_tolerant, keep_tokens
end
# call-seq:
@ -63,8 +63,8 @@ module RubyVM::AbstractSyntaxTree
#
# RubyVM::AbstractSyntaxTree.of(method(:hello))
# # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-3:3>
-def self.of body, keep_script_lines: false, error_tolerant: false
-Primitive.ast_s_of body, keep_script_lines, error_tolerant
+def self.of body, keep_script_lines: false, error_tolerant: false, keep_tokens: false
+Primitive.ast_s_of body, keep_script_lines, error_tolerant, keep_tokens
end
# call-seq:
@ -136,6 +136,46 @@ module RubyVM::AbstractSyntaxTree
Primitive.ast_node_last_column
end
# call-seq:
# node.tokens -> array
#
# Returns the tokens corresponding to the location of the node.
# Returns nil if keep_tokens was not enabled when the parse method was called.
# Each token is an array of:
#
# - id
# - token type
# - source code text
# - location [first_lineno, first_column, last_lineno, last_column]
#
# root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2", keep_tokens: true)
# root.tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
# root.tokens.map{_1[2]}.join # => "x = 1 + 2"
def tokens
return nil unless all_tokens
all_tokens.each_with_object([]) do |token, a|
loc = token.last
if ([first_lineno, first_column] <=> [loc[0], loc[1]]) <= 0 &&
([last_lineno, last_column] <=> [loc[2], loc[3]]) >= 0
a << token
end
end
end
# call-seq:
# node.all_tokens -> array
#
# Returns all tokens for the input script regardless of the receiver node.
# Returns nil if keep_tokens was not enabled when the parse method was called.
#
# root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2", keep_tokens: true)
# root.all_tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
# root.children[-1].all_tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
def all_tokens
Primitive.ast_node_all_tokens
end
# call-seq:
# node.children -> array
#