Fix memory leak for incomplete lambdas
[Bug #19836]
The parser does not free the chain of `struct vtable`, which causes
memory leaks.
The following script reproduces this issue:
```
10.times do
100_000.times do
Ripper.parse("-> {")
end
puts `ps -o rss= -p #{$$}`
end
```
---
parse.y | 24 ++++++++++++++----------
test/ripper/test_ripper.rb | 7 +++++++
2 files changed, 21 insertions(+), 10 deletions(-)
Fix memory leak in parser for incomplete tokens
[Bug #19835]
The parser does not free the `tbl` of the `struct vtable` when there are
leftover `lvtbl` in the parser. This causes a memory leak.
The following script reproduces this issue:
```
10.times do
100_000.times do
Ripper.parse("class Foo")
end
puts `ps -o rss= -p #{$$}`
end
```
---
parse.y | 42 ++++++++++++++++++++++++++++--------------
test/ripper/test_ripper.rb | 7 +++++++
2 files changed, 35 insertions(+), 14 deletions(-)
Handle unterminated unicode escapes in regexps
This fixes an infinite loop possible after ec3542229b.
For \u{} escapes in regexps, skip validation in the parser, and rely on the regexp
code to handle validation. This is necessary so that invalid unicode escapes in
comments in extended regexps are allowed.
Fixes [Bug #19750]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
---
parse.y | 97 ++++++++++++++++++++++++++++++++-----------------
test/ruby/test_parse.rb | 16 ++++++++
2 files changed, 79 insertions(+), 34 deletions(-)
This was introduced by b609bdeb53
to suppress warnings. However these warngins were deleted by
beae6cbf0f. Therefore these codes
are not needed anymore.
If the rescue clause has only exc_var and not exc_list, use the
exc_var position instead of the rescue body position.
This issue appears to have been introduced in
688169fd83 when "opt_list" was split
into "exc_list exc_var".
Fixes [Bug #18974]
Implementation for Language Server Protocol (LSP) sometimes needs token information.
For example both `m(1)` and `m(1, )` has same AST structure other than node locations
then it's impossible to check the existence of `,` from AST. However in later case,
it might be better to suggest variables list for the second argument.
Token information is important for such case.
This commit adds these methods.
* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
* Add `RubyVM::AbstractSyntaxTree::Node#tokens` which returns tokens for the node including tokens for descendants nodes.
* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens` which returns all tokens for the input script regardless the receiver node.
[Feature #19070]
Impacts on memory usage and performance are below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens :false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens :true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
src = <<~SRC
module M
class C
def m1(a, b)
1 + a + b
end
end
end
SRC
benchmark:
without_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
with_keep_tokens: |
RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
```
Assign internal_id to semantic value so that dump parsetree option
can render the tree for these codes without SEGV.
* `def m(&); end`
* `def m(*); end`
* `def m(**); end`
By this change, syntax error is recovered smaller units.
In the case below, "DEFN :bar" is same level with "CLASS :Foo"
now.
```
module Z
class Foo
foo.
end
def bar
end
end
```
[Feature #19013]
"end" after "." or "::" is treated as local variable or method,
see `EXPR_DOT_bit` for detail.
However this "changes" where `bar` method is defined. In the example
below it is not module Z but class Foo.
```
module Z
class Foo
foo.
end
def bar
end
end
```
[Feature #19013]
Invalid escapes are handled at multiple levels. The first level
is in parse.y, so skip invalid unicode escape checks for regexps
in parse.y.
Make rb_reg_preprocess and unescape_nonascii accept the regexp
options. In unescape_nonascii, if the regexp is an extended
regexp, when "#" is encountered, ignore all characters until the
end of line or end of regexp.
Unfortunately, in extended regexps, you can use "#" as a non-comment
character inside a character class, so also parse "[" and "]"
specially for extended regexps, and only skip comments if "#" is
not inside a character class. Handle nested character classes as well.
This issue doesn't just affect extended regexps, it also affects
"(#?" comments inside all regexps. So for those comments, scan
until trailing ")" and ignore content inside.
I'm not sure if there are other corner cases not handled. A
better fix would be to redesign the regexp parser so that it
unescaped during parsing instead of before parsing, so you already
know the current parsing state.
Fixes [Bug #18294]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>