[Bug #19924] Source code should be unsigned char stream
Use `peekc` or `nextc` to fetch the next character, instead of reading
from `lex.pcur` directly, for compilers where plain char is signed.
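As a rough illustration of why signedness matters (a sketch in Ruby, not the parser's C code), the same source byte reads as a negative number when interpreted as a signed 8-bit value. The byte `0xE3` is an arbitrary example chosen here; any byte at or above `0x80` behaves the same way.

```ruby
# An arbitrary byte >= 0x80, as found in multibyte source text.
byte = "\xE3".b

# "c" reads a signed 8-bit integer, similar to a signed plain char in C.
as_signed = byte.unpack1("c")
# "C" reads an unsigned 8-bit integer, similar to unsigned char in C.
as_unsigned = byte.unpack1("C")

puts as_signed    # -29
puts as_unsigned  # 227
```

Comparisons or table lookups written for the range 0..255 would misbehave on the negative value, which is the kind of problem reading through an unsigned accessor avoids.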
---
parse.y | 10 +++++-----
test/ruby/test_parse.rb | 2 ++
2 files changed, 7 insertions(+), 5 deletions(-)
Fix memory leak for incomplete lambdas
[Bug #19836]
The parser does not free the chain of `struct vtable`, which causes
memory leaks.
The following script reproduces this issue:
```
require "ripper"

10.times do
  100_000.times do
    Ripper.parse("-> {")
  end
  puts `ps -o rss= -p #{$$}`
end
```
---
parse.y | 24 ++++++++++++++----------
test/ripper/test_ripper.rb | 7 +++++++
2 files changed, 21 insertions(+), 10 deletions(-)
Fix memory leak in parser for incomplete tokens
[Bug #19835]
The parser does not free the `tbl` of the `struct vtable` when there are
leftover `lvtbl` in the parser. This causes a memory leak.
The following script reproduces this issue:
```
require "ripper"

10.times do
  100_000.times do
    Ripper.parse("class Foo")
  end
  puts `ps -o rss= -p #{$$}`
end
```
---
parse.y | 42 ++++++++++++++++++++++++++++--------------
test/ripper/test_ripper.rb | 7 +++++++
2 files changed, 35 insertions(+), 14 deletions(-)
Handle unterminated unicode escapes in regexps
This fixes an infinite loop possible after ec3542229b.
For \u{} escapes in regexps, skip validation in the parser, and rely on the regexp
code to handle validation. This is necessary so that invalid unicode escapes in
comments in extended regexps are allowed.
Fixes [Bug #19750]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
---
parse.y | 97 ++++++++++++++++++++++++++++++++-----------------
test/ruby/test_parse.rb | 16 ++++++++
2 files changed, 79 insertions(+), 34 deletions(-)
This was introduced by b609bdeb53
to suppress warnings. However, these warnings were removed by
beae6cbf0f, so this code
is no longer needed.
If the rescue clause has only exc_var and not exc_list, use the
exc_var position instead of the rescue body position.
This issue appears to have been introduced in
688169fd83 when "opt_list" was split
into "exc_list exc_var".
Fixes [Bug #18974]
Implementations of the Language Server Protocol (LSP) sometimes need token information.
For example, `m(1)` and `m(1, )` have the same AST structure apart from node locations,
so it is impossible to detect the `,` from the AST alone. In the latter case, however,
it might be better to suggest a list of variables for the second argument.
Token information is important in such cases.
This commit adds the following:
* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens`, which returns the tokens for the node, including tokens for its descendant nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens`, which returns all tokens for the input script regardless of the receiver node.
[Feature #19070]
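The `m(1)` versus `m(1, )` case above can be sketched as follows. This is a minimal example, assuming a Ruby build that includes this feature; `comma_token?` is a hypothetical helper name introduced only for illustration, and it assumes each token is an array whose third element is the token text.

```ruby
# Hypothetical helper: reports whether the source contains a "," token.
def comma_token?(src)
  root = RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
  # Assumed token shape: [id, type, text, location]; compare on the text.
  root.all_tokens.any? { |_id, _type, text, _loc| text == "," }
end

puts comma_token?("m(1)")    # false
puts comma_token?("m(1, )")  # true
```

Matching on the token text rather than the token type keeps the sketch independent of how punctuation token types are named.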
Impacts on memory usage and performance are below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens: false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens: true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
```
Assign internal_id to the semantic value so that the dump parsetree option
can render the tree for the following code without a SEGV.
* `def m(&); end`
* `def m(*); end`
* `def m(**); end`
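
A quick way to exercise the three definitions above from Ruby is sketched below. It goes through `RubyVM::AbstractSyntaxTree.parse` rather than the command-line dump option the commit refers to, so it is only an approximation of that code path; the anonymous `&` parameter also assumes a Ruby version that accepts it.

```ruby
# The three definitions with anonymous parameters listed above.
sources = ["def m(&); end", "def m(*); end", "def m(**); end"]

# Parse each one and render the resulting tree as a string.
nodes = sources.map { |src| RubyVM::AbstractSyntaxTree.parse(src) }
nodes.each { |node| puts node.inspect }
```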
With this change, syntax errors are recovered in smaller units.
In the case below, "DEFN :bar" is now at the same level as
"CLASS :Foo".
```
module Z
  class Foo
    foo.
  end
  def bar
  end
end
```
[Feature #19013]
"end" after "." or "::" is treated as a local variable or method;
see `EXPR_DOT_bit` for details.
However, this "changes" where the `bar` method is defined. In the example
below it is defined in class Foo, not in module Z.
```
module Z
  class Foo
    foo.
  end
  def bar
  end
end
```
[Feature #19013]