mirror of
https://github.com/ruby/ruby.git
synced 2025-08-25 05:55:46 +02:00

Implementations of the Language Server Protocol (LSP) sometimes need token information. For example, `m(1)` and `m(1, )` have the same AST structure apart from node locations, so it is impossible to detect the `,` from the AST alone. In the latter case, however, it might be better to suggest a list of variables for the second argument. Token information is important in such cases.

This commit adds the following:

* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens`, which returns the tokens for the node, including the tokens of its descendant nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens`, which returns all tokens for the input script regardless of the receiver node.

[Feature #19070]

Impacts on memory usage and performance are below:

Memory usage:

```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```

Performance:

```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
            --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..

|                     |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens  |     21.659k|   21.303k|
|                     |       1.02x|         -|
|with_keep_tokens     |      6.220k|    5.691k|
|                     |       1.09x|         -|
```
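The `m(1)` versus `m(1, )` case from the commit message can be sketched with the new option. This is a minimal illustration, not part of the commit: it assumes MRI Ruby 3.2 or later, and `trailing_comma_before_paren?` and `find_node` are hypothetical helper names introduced here. It relies only on the token layout documented for `Node#tokens` (`[id, type, text, location]`).

```ruby
# Sketch only: assumes MRI Ruby 3.2 or later, where keep_tokens exists.
# trailing_comma_before_paren? and find_node are hypothetical helpers.

# Returns true when a "," is followed (ignoring whitespace) by ")".
def trailing_comma_before_paren?(src)
  root = RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
  # Each token is [id, type, text, location]; keep the text, drop blanks.
  texts = root.all_tokens.map { |token| token[2] }
  texts = texts.reject { |text| text.strip.empty? }
  texts.each_cons(2).any? { |a, b| a == "," && b == ")" }
end

# Depth-first search for the first node of the given type.
# Children may be nodes, nil, symbols or arrays; only nodes are visited.
def find_node(node, type)
  return nil unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)
  return node if node.type == type
  node.children.each do |child|
    found = find_node(child, type)
    return found if found
  end
  nil
end

with_comma    = trailing_comma_before_paren?("m(1, )")
without_comma = trailing_comma_before_paren?("m(1)")

# Node#tokens restricts the token list to the node's own source range.
root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2", keep_tokens: true)
call = find_node(root, :OPCALL)
call_text = call.tokens.map { |token| token[2] }.join

puts "m(1, ): #{with_comma}, m(1): #{without_comma}"
puts "OPCALL tokens: #{call_text.inspect}"
```

Comparing token text rather than token type keeps the sketch independent of the parser's internal token names, which this experimental API does not promise to keep stable.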
245 lines
7.9 KiB
Ruby
# for ast.c

# AbstractSyntaxTree provides methods to parse Ruby code into
# abstract syntax trees. The nodes in the tree
# are instances of RubyVM::AbstractSyntaxTree::Node.
#
# This module is MRI specific as it exposes implementation details
# of the MRI abstract syntax tree.
#
# This module is experimental and its API is not stable, therefore it might
# change without notice. As examples, the order of children nodes is not
# guaranteed, the number of children nodes might change, there is no way to
# access children nodes by name, etc.
#
# If you are looking for a stable API or an API working under multiple Ruby
# implementations, consider using the _parser_ gem or Ripper. If you would
# like to make RubyVM::AbstractSyntaxTree stable, please join the discussion
# at https://bugs.ruby-lang.org/issues/14844.
#
module RubyVM::AbstractSyntaxTree

  #  call-seq:
  #     RubyVM::AbstractSyntaxTree.parse(string) -> RubyVM::AbstractSyntaxTree::Node
  #
  #  Parses the given _string_ into an abstract syntax tree,
  #  returning the root node of that tree.
  #
  #  SyntaxError is raised if the given _string_ is invalid syntax.
  #
  #    RubyVM::AbstractSyntaxTree.parse("x = 1 + 2")
  #    # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-1:9>
  def self.parse string, keep_script_lines: false, error_tolerant: false, keep_tokens: false
    Primitive.ast_s_parse string, keep_script_lines, error_tolerant, keep_tokens
  end

  #  call-seq:
  #     RubyVM::AbstractSyntaxTree.parse_file(pathname) -> RubyVM::AbstractSyntaxTree::Node
  #
  #  Reads the file from _pathname_, then parses it like ::parse,
  #  returning the root node of the abstract syntax tree.
  #
  #  SyntaxError is raised if _pathname_'s contents are not
  #  valid Ruby syntax.
  #
  #    RubyVM::AbstractSyntaxTree.parse_file("my-app/app.rb")
  #    # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-31:3>
  def self.parse_file pathname, keep_script_lines: false, error_tolerant: false, keep_tokens: false
    Primitive.ast_s_parse_file pathname, keep_script_lines, error_tolerant, keep_tokens
  end

  #  call-seq:
  #     RubyVM::AbstractSyntaxTree.of(proc)   -> RubyVM::AbstractSyntaxTree::Node
  #     RubyVM::AbstractSyntaxTree.of(method) -> RubyVM::AbstractSyntaxTree::Node
  #
  #  Returns AST nodes of the given _proc_ or _method_.
  #
  #    RubyVM::AbstractSyntaxTree.of(proc {1 + 2})
  #    # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:35-1:42>
  #
  #    def hello
  #      puts "hello, world"
  #    end
  #
  #    RubyVM::AbstractSyntaxTree.of(method(:hello))
  #    # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-3:3>
  def self.of body, keep_script_lines: false, error_tolerant: false, keep_tokens: false
    Primitive.ast_s_of body, keep_script_lines, error_tolerant, keep_tokens
  end

  #  call-seq:
  #     RubyVM::AbstractSyntaxTree.node_id_for_backtrace_location(backtrace_location) -> integer
  #
  #  Returns the node id for the given backtrace location.
  #
  #    begin
  #      raise
  #    rescue => e
  #      loc = e.backtrace_locations.first
  #      RubyVM::AbstractSyntaxTree.node_id_for_backtrace_location(loc)
  #    end # => 0
  def self.node_id_for_backtrace_location backtrace_location
    Primitive.node_id_for_backtrace_location backtrace_location
  end

  # RubyVM::AbstractSyntaxTree::Node instances are created by parse methods in
  # RubyVM::AbstractSyntaxTree.
  #
  # This class is MRI specific.
  #
  class Node

    #  call-seq:
    #     node.type -> symbol
    #
    #  Returns the type of this node as a symbol.
    #
    #    root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2")
    #    root.type # => :SCOPE
    #    lasgn = root.children[2]
    #    lasgn.type # => :LASGN
    #    call = lasgn.children[1]
    #    call.type # => :OPCALL
    def type
      Primitive.ast_node_type
    end

    #  call-seq:
    #     node.first_lineno -> integer
    #
    #  The line number in the source code where this AST's text began.
    def first_lineno
      Primitive.ast_node_first_lineno
    end

    #  call-seq:
    #     node.first_column -> integer
    #
    #  The column number in the source code where this AST's text began.
    def first_column
      Primitive.ast_node_first_column
    end

    #  call-seq:
    #     node.last_lineno -> integer
    #
    #  The line number in the source code where this AST's text ended.
    def last_lineno
      Primitive.ast_node_last_lineno
    end

    #  call-seq:
    #     node.last_column -> integer
    #
    #  The column number in the source code where this AST's text ended.
    def last_column
      Primitive.ast_node_last_column
    end

    #  call-seq:
    #     node.tokens -> array
    #
    #  Returns the tokens that lie within the location of the node.
    #  Returns nil if keep_tokens was not enabled when the parse method was called.
    #  Each token is an array of:
    #
    #  - id
    #  - token type
    #  - source code text
    #  - location [first_lineno, first_column, last_lineno, last_column]
    #
    #    root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2", keep_tokens: true)
    #    root.tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
    #    root.tokens.map{_1[2]}.join # => "x = 1 + 2"
    def tokens
      return nil unless all_tokens

      all_tokens.each_with_object([]) do |token, a|
        loc = token.last
        if ([first_lineno, first_column] <=> [loc[0], loc[1]]) <= 0 &&
           ([last_lineno, last_column]   <=> [loc[2], loc[3]]) >= 0
          a << token
        end
      end
    end

    #  call-seq:
    #     node.all_tokens -> array
    #
    #  Returns all tokens for the input script regardless of the receiver node.
    #  Returns nil if keep_tokens was not enabled when the parse method was called.
    #
    #    root = RubyVM::AbstractSyntaxTree.parse("x = 1 + 2", keep_tokens: true)
    #    root.all_tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
    #    root.children[-1].all_tokens # => [[0, :tIDENTIFIER, "x", [1, 0, 1, 1]], [1, :tSP, " ", [1, 1, 1, 2]], ...]
    def all_tokens
      Primitive.ast_node_all_tokens
    end

    #  call-seq:
    #     node.children -> array
    #
    #  Returns AST nodes under this one.  Each kind of node
    #  has different children, depending on what kind of node it is.
    #
    #  The returned array may contain other nodes or <code>nil</code>.
    def children
      Primitive.ast_node_children
    end

    #  call-seq:
    #     node.inspect -> string
    #
    #  Returns debugging information about this node as a string.
    def inspect
      Primitive.ast_node_inspect
    end

    #  call-seq:
    #     node.node_id -> integer
    #
    #  Returns an internal node_id number.
    #  Note that this is an API for ruby internal use, debugging,
    #  and research. Do not use this for any other purpose.
    #  Compatibility is not guaranteed.
    def node_id
      Primitive.ast_node_node_id
    end

    #  call-seq:
    #     node.script_lines -> array
    #
    #  Returns the original source code as an array of lines.
    #
    #  Note that this is an API for ruby internal use, debugging,
    #  and research. Do not use this for any other purpose.
    #  Compatibility is not guaranteed.
    def script_lines
      Primitive.ast_node_script_lines
    end

    #  call-seq:
    #     node.source -> string
    #
    #  Returns the code fragment that corresponds to this AST.
    #
    #  Note that this is an API for ruby internal use, debugging,
    #  and research. Do not use this for any other purpose.
    #  The compatibility is not guaranteed.
    #
    #  Also note that this API may return an incomplete code fragment
    #  that does not parse; for example, a here document following
    #  an expression may be dropped.
    def source
      lines = script_lines
      if lines
        lines = lines[first_lineno - 1 .. last_lineno - 1]
        lines[-1] = lines[-1][0...last_column]
        lines[0] = lines[0][first_column..-1]
        lines.join
      else
        nil
      end
    end
  end
end