Commit graph

51 commits

Author SHA1 Message Date
Earlopain
3071c5d04c [ruby/prism] Fix parser translator with trailing backslash in %W /%I array
https://docs.ruby-lang.org/en/master/syntax/literals_rdoc.html#label-25w+and+-25W-3A+String-Array+Literals
> %W allow escape sequences described in Escape Sequences. However the continuation line <newline> is not usable because it is interpreted as the escaped newline described above.

f5c7460ad5
2025-06-30 12:32:31 +00:00
Earlopain
970813d982 [ruby/prism] Fix parser translator during string escaping with invalid utf-8
Instead, prefer `scan_byte` over `get_byte` since that already returns the byte as an integer, sidestepping conversion issues.

Fixes https://github.com/ruby/prism/issues/3582

7f3008b2b5
2025-06-11 18:07:43 +00:00
Nobuyoshi Nakada
991cf2dd4d [ruby/prism] [DOC] Specify markdown mode to RDoc
12af4e144e
2025-05-29 04:45:58 +00:00
Earlopain
d7e46543b5 [ruby/prism] Fix parser translator when pinning hash with string keys
`StringNode` and `SymbolNode` don't have the same shape
(`content` vs `value`) and that wasn't handled.

I believe the logic for the common case can be reused.
I simply left the special handling for implicit nodes in pattern matching
and fall through otherwise.

NOTE: patterns.txt is not actually tested at the moment,
because it contains syntax that `parser` mistakenly rejects.
But I checked manually that this doesn't introduce other failures.
https://github.com/whitequark/parser/pull/1060

55adfaa895
2025-03-30 17:24:05 +00:00
Kevin Newton
b003d40194 Fix up merge conflicts for prism sync 2025-03-18 13:36:53 -04:00
Earlopain
a8adf5e006 [ruby/prism] Further refine string handling in the parser translator
Mostly around newlines and line continuation.
* percent arrays need special backslash handling in the ast
* Fix offset issue for heredocs with many line continuations (used wrong variable as index access)
* More refined rules on when to simplify string tokens
* Handle line continuations in squiggly heredocs
* Correctly dedent squiggly heredocs with interpolation
* Consider `':foo:` and `%s[foo]` to not be interpolation

4edfe9d981
2025-03-18 13:36:53 -04:00
Kevin Newton
0b4604d5a0 [ruby/prism] Use Set.new over to_set
422d5c4c64
2025-03-18 13:36:53 -04:00
Earlopain
ad478de3f0 [ruby/prism] Optimize array inclusion checks in the parser translator
I see `Array.include?` as 2.4% runtime. Probably because of `LPAREN_CONVERSION_TOKEN_TYPES` but
the others will be faster as well.

Also remove some inline array checks. They are specifically optimized in Ruby since 3.4, but for now prism is for >= 2.7

ca9500a3fc
2025-03-18 13:36:53 -04:00
Earlopain
d5503444fd [ruby/prism] Fix parser translator crash for certain octal escapes
`Integer#chr` performs some validation that we don't want/need. Octal escapes can go above 255, where it will then raise trying to convert.

`append_as_bytes` actually allows to pass a number, so we can just skip that call.
Although, on older rubies of course we still need to handle this in the polyfill.
I don't really like using `pack` but don't know of another way to do so.

For the utf-8 escapes, this is not an issue. Invalid utf-8 in these is simply a syntax error.

161c606b1f
2025-03-18 13:36:53 -04:00
Kevin Newton
1944247a0e [ruby/prism] Handle control and meta escapes in parser translation
09c59a3aa5
2025-03-18 13:36:53 -04:00
Earlopain
fd7a10cf4a [ruby/prism] Further refine string handling in the parser translator
Mostly around newlines and line continuation.
* percent arrays need special backslash handling in the ast
* Fix offset issue for heredocs with many line continuations (used wrong variable as index access)
* More refined rules on when to simplify string tokens
* Handle line continuations in squiggly heredocs
* Correctly dedent squiggly heredocs with interpolation
* Consider `':foo:` and `%s[foo]` to not be interpolation

4edfe9d981
2025-03-18 13:36:53 -04:00
Earlopain
5d138f2b43 [ruby/prism] Better handle regexp in the parser translator
Turns out, it was already almost correct. If you disregard \c and \M style escapes, only a single character is allowed to be escaped in a regex so most tests passed already.

There was also a mistake where the wrong value was constructed for the ast, this is now fixed.
One test fails because of this, but I'm fairly sure it is because of a parser bug. For `/\“/`, the backslash is supposed to be removed because it is a multibyte character. But tbh,
I don't entirely understand all the rules.

Fixes more than half of the remaining ast differences for rubocop tests

e1c75f304b
2025-03-18 13:36:53 -04:00
Earlopain
177adf6fa5 [ruby/prism] Fix parser translator tokens for %-arrays with whitespace escapes
Also fixes a token incompatibility for the word separator. parser only considers whitespace until the first newline

bd3dd2b62a
2025-03-18 13:36:53 -04:00
Earlopain
acf404e20e [ruby/prism] Fix an incompatibility with the parser translator
The offset cache contains an entry for each byte so it can't be accessed via the string length.

Adds tests for all variants except for this:
```
"fo
o" "ba
’"
```

For some reason, this still has the wrong offset.

a651126458
2025-03-18 13:36:53 -04:00
Earlopain
bc506295a3 [ruby/prism] Further refine string handling in the parser translator
Mostly around newlines and line continuation.
* percent arrays need special backslash handling in the ast
* Fix offset issue for heredocs with many line continuations (used wrong variable as index access)
* More refined rules on when to simplify string tokens
* Handle line continuations in squiggly heredocs
* Correctly dedent squiggly heredocs with interpolation
* Consider `':foo:` and `%s[foo]` to not be interpolation

4edfe9d981
2025-03-18 13:36:53 -04:00
Earlopain
705bd6fadb [ruby/prism] Fix parser translator when unescaping invalid utf8
1. The string starts out as binary
2. `ち` is appended, forcing it back into utf-8
3. Some invalid byte sequences are tried to append

> incompatible character encodings: UTF-8 and BINARY (ASCII-8BIT)

This makes use of my wish to use `append_as_bytes`. Unfortunatly that method is rather new
so it needs a fallback

e31e94a775
2025-03-18 13:36:53 -04:00
Kevin Newton
f2483c79fe [ruby/prism] Use Set.new over to_set
422d5c4c64
2025-03-13 14:24:48 +00:00
Earlopain
3d4c7c3802 [ruby/prism] Use reverse_each in the parser translator
Avoids an array allocation which matters more and more
the larger the file is.

I have it at 14% of runtime.

f65b90f27d
2025-03-13 13:52:45 +00:00
Earlopain
67e6ccb23f [ruby/prism] Optimize array inclusion checks in the parser translator
I see `Array.include?` as 2.4% runtime. Probably because of `LPAREN_CONVERSION_TOKEN_TYPES` but
the others will be faster as well.

Also remove some inline array checks. They are specifically optimized in Ruby since 3.4, but for now prism is for >= 2.7

ca9500a3fc
2025-03-13 13:52:45 +00:00
Earlopain
56242ba495 Better handle regexp in the parser translator
Turns out, it was already almost correct. If you disregard \c and \M style escapes, only a single character is allowed to be escaped in a regex so most tests passed already.

There was also a mistake where the wrong value was constructed for the ast, this is now fixed.
One test fails because of this, but I'm fairly sure it is because of a parser bug. For `/\“/`, the backslash is supposed to be removed because it is a multibyte character. But tbh,
I don't entirely understand all the rules.

Fixes more than half of the remaining ast differences for rubocop tests
2025-01-14 20:33:11 +00:00
Earlopain
7c0b92a1c6 [ruby/prism] Fix parser translator tokens for %x(#{})
It falsely considered it to be a single backtick command

dd762be590
2025-01-13 18:02:28 +00:00
Earlopain
0a26a3de89 [ruby/prism] Fix parser translator heredoc dedention with leading interpolation
```rb
<<~F
  foo
#{}
  bar
F
```

has zero common whitespace.

1f3c222a06
2025-01-13 15:43:24 +00:00
Earlopain
566f9463c2 [ruby/prism] Fix parser translator tSPACE tokens for percent arrays
Tests worked around this but the incompatibility is not hard to fix.
This fixes 17 token incompatibilies in tests here that were previously passing

101962526d
2025-01-12 18:54:16 +00:00
Earlopain
48749afe61 [ruby/prism] Fix parser translator ranges for mulltiline strings with multiline bytes
A rather silly issue with a rather simple fix.
The ranges already use the offset cache, this effectivly double-encoded them.

66b65634c0
2025-01-12 18:34:36 +00:00
Earlopain
d0deec3ef3 [ruby/prism] Fix parser translator tokens for comment-only file
In https://github.com/ruby/prism/pull/3393 I made a mistake.
When there is no previous token, it wraps around to -1. Oops

Additionally, if a comment has no newline then the offset should be kept as is

3c266f1de4
2025-01-12 18:34:06 +00:00
Earlopain
70022224b2 [ruby/prism] Better comment token handling for the parser translator
There appear to be a bunch of rules, changing behaviour for
inline comments, multiple comments after another, etc.

This seems to line up with reality pretty closely, token differences for RuboCop tests go from 1129 to 619 which seems pretty impressive

2e1b92670c
2025-01-11 19:09:05 -05:00
Earlopain
9c962ea792 [ruby/prism] Fix parser translator tokens for backslashes in single-quoted strings and word arrays
These are not line continuations. They either should be taken literally,
or allow the word array to contain the following whitespace (newlines in this case)

Before:
```
  0...1: tSTRING_BEG     => "'"
 1...12: tSTRING_CONTENT => "foobar\\\n"
12...16: tSTRING_CONTENT => "baz\n"
16...17: tSTRING_END     => "'"
17...18: tNL             => nil
```

After:
```
  0...1: tSTRING_BEG     => "'"
  1...6: tSTRING_CONTENT => "foo\\\n"
 6...12: tSTRING_CONTENT => "bar\\\n"
12...16: tSTRING_CONTENT => "baz\n"
16...17: tSTRING_END     => "'"
17...18: tNL             => nil
```

b6554ad64e
2025-01-11 19:09:05 -05:00
Earlopain
110461c509 [ruby/prism] Implement more string token escaping in the parser translator
This leaves `\c` and `\M` escaping but I don't understand how these should even work yet. Maybe later.

13db3e8cb9
2025-01-11 19:09:05 -05:00
Earlopain
7cbaa3b929 [ruby/prism] Fix an incompatibility with the parser translator
The offset cache contains an entry for each byte so it can't be accessed via the string length.

Adds tests for all variants except for this:
```
"fo
o" "ba
’"
```

For some reason, this still has the wrong offset.

a651126458
2025-01-11 19:09:05 -05:00
Koichi ITO
75ed086348 [ruby/prism] Fix kDO_LAMBDA token incompatibility for Prism::Translation::Parser::Lexer
## Summary

This PR fixes `kDO_LAMBDA` token incompatibility between Parser gem and `Prism::Translation::Parser` for lambda `do` block.

### Parser gem (Expected)

Returns `kDO_LAMBDA` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```

### `Prism::Translation::Parser` (Actual)

Previously, the parser returned `kDO` token when parsing the following:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```

After the update, the parser now returns `kDO_LAMBDA` token for the same input:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```

## Additional Information

Unfortunately, this kind of edge case doesn't work as expected; `kDO` is returned instead of `kDO_LAMBDA`.
However, since `kDO` is already being returned in this case, there is no change in behavior.

### Parser gem

Returns `tLAMBDA` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```

### `Prism::Translation::Parser`

Returns `kDO` token:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```

As the intention is not to address such special cases at this point, a comment has been left indicating that this case still returns `kDO`.
In other words, `kDO_LAMBDA` will now be returned except for edge cases after this PR.

2ee480654c
2024-09-20 17:17:21 +00:00
Koichi ITO
7a65334528 [ruby/prism] Fix a token incompatibility for Prism::Translation::Parser::Lexer
This PR fixes a token incompatibility between Parser gem and `Prism::Translation::Parser` for double splat argument.

## Parser gem (Expected)

Returns `tDSTAR` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "def f(**foo) end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:kDEF, ["def", #<Parser::Source::Range example.rb 0...3>]], [:tIDENTIFIER, ["f", #<Parser::Source::Range example.rb 4...5>]],
[:tLPAREN2, ["(", #<Parser::Source::Range example.rb 5...6>]], [:tDSTAR, ["**", #<Parser::Source::Range example.rb 6...8>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 8...11>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 11...12>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 13...16>]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the parser returned `tPOW` token when parsing the following:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "def f(**foo) end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:kDEF, ["def", #<Parser::Source::Range example.rb 0...3>]], [:tIDENTIFIER, ["f", #<Parser::Source::Range example.rb 4...5>]],
[:tLPAREN2, ["(", #<Parser::Source::Range example.rb 5...6>]], [:tPOW, ["**", #<Parser::Source::Range example.rb 6...8>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 8...11>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 11...12>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 13...16>]]]
```

After the update, the parser now returns `tDSTAR` token for the same input:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "def f(**foo) end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:kDEF, ["def", #<Parser::Source::Range example.rb 0...3>]], [:tIDENTIFIER, ["f", #<Parser::Source::Range example.rb 4...5>]],
[:tLPAREN2, ["(", #<Parser::Source::Range example.rb 5...6>]], [:tDSTAR, ["**", #<Parser::Source::Range example.rb 6...8>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 8...11>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 11...12>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 13...16>]]]
```

With this change, the following code could be removed from test/prism/ruby/parser_test.rb:

```diff
-          when :tPOW
-            actual_token[0] = expected_token[0] if expected_token[0] == :tDSTAR
```

`tPOW` is the token type for the behavior of `a ** b`, and its behavior remains unchanged:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "a ** b"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["a", #<Parser::Source::Range example.rb 0...1>]], [:tPOW, ["**", #<Parser::Source::Range example.rb 2...4>]],
[:tIDENTIFIER, ["b", #<Parser::Source::Range example.rb 5...6>]]]
```

66bde35a44
2024-09-09 19:01:30 +00:00
Koichi ITO
4774284124 [ruby/prism] Fix a token incompatibility for Prism::Translation::Parser::Lexer
This PR fixes a token incompatibility between Parser gem and `Prism::Translation::Parser` for left parenthesis.

## Parser gem (Expected)

Returns `tLPAREN2` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 \
-ve 'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "foo(:bar)"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 0...3>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tSYMBOL, ["bar", #<Parser::Source::Range example.rb 4...8>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 8...9>]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the parser returned `tLPAREN` token when parsing the following:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "foo(:bar)"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 0...3>]], [:tLPAREN, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tSYMBOL, ["bar", #<Parser::Source::Range example.rb 4...8>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 8...9>]]]
```

After the update, the parser now returns `tLPAREN2` token for the same input:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "foo(:bar)"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 0...3>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tSYMBOL, ["bar", #<Parser::Source::Range example.rb 4...8>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 8...9>]]]
```

The `PARENTHESIS_LEFT` token in Prism is classified as either `tLPAREN` or `tLPAREN2` in the Parser gem.
The tokens that were previously all classified as `tLPAREN` are now also classified to `tLPAREN2`.

With this change, the following code could be removed from `test/prism/ruby/parser_test.rb`:

```diff
-          when :tLPAREN
-            actual_token[0] = expected_token[0] if expected_token[0] == :tLPAREN2
```

04d6f3478d
2024-09-07 22:36:38 +00:00
Kevin Newton
24f48382bc [ruby/prism] (parser) Fix up tokens for empty symbol
5985ab7687
2024-06-18 21:18:39 -04:00
Koichi ITO
3605d6076d [ruby/prism] Fix token incompatibility for Prism::Translation::Parser::Lexer
This PR fixes token incompatibility for `Prism::Translation::Parser::Lexer` when using backquoted heredoc indetiner:

```ruby
<<-`  FOO`
a
b
     FOO
```

## Parser gem (Expected)

Returns `tXSTRING_BEG` as the first token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Parser::Ruby33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:xstr,
  s(:str, "a\n"),
  s(:str, "b\n")), [], [[:tXSTRING_BEG, ["<<`", #<Parser::Source::Range example.rb 0...10>]],
[:tSTRING_CONTENT, ["a\n", #<Parser::Source::Range example.rb 11...13>]],
[:tSTRING_CONTENT, ["b\n", #<Parser::Source::Range example.rb 13...15>]],
[:tSTRING_END, ["  FOO", #<Parser::Source::Range example.rb 15...23>]], [:tNL, [nil, #<Parser::Source::Range example.rb 10...11>]]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the tokens returned by the Parser gem were different. The escaped backslash does not match in the `tSTRING_BEG` token and
value of `tSTRING_END` token.

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:xstr,
  s(:str, "a\n"),
  s(:str, "b\n")), [], [[:tSTRING_BEG, ["<<\"", #<Parser::Source::Range example.rb 0...10>]],
[:tSTRING_CONTENT, ["a\n", #<Parser::Source::Range example.rb 11...13>]],
[:tSTRING_CONTENT, ["b\n", #<Parser::Source::Range example.rb 13...15>]],
[:tSTRING_END, ["`  FOO`", #<Parser::Source::Range example.rb 15...23>]], [:tNL, [nil, #<Parser::Source::Range example.rb 10...11>]]]]
```

After this correction, the AST and tokens returned by the Parser gem are the same:

```console
$ bunlde exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:xstr,
  s(:str, "a\n"),
  s(:str, "b\n")), [], [[:tXSTRING_BEG, ["<<`", #<Parser::Source::Range example.rb 0...10>]],
[:tSTRING_CONTENT, ["a\n", #<Parser::Source::Range example.rb 11...13>]],
[:tSTRING_CONTENT, ["b\n", #<Parser::Source::Range example.rb 13...15>]],
[:tSTRING_END, ["  FOO", #<Parser::Source::Range example.rb 15...23>]], [:tNL, [nil, #<Parser::Source::Range example.rb 10...11>]]]]
```

308f8d85a1
2024-03-16 17:55:38 +00:00
Koichi ITO
e3a82d79fd [ruby/prism] Fix token incompatibility for Prism::Translation::Parser::Lexer
This PR fixes token incompatibility for `Prism::Translation::Parser::Lexer`
when using escaped backslash in string literal:

```ruby
"\\ foo \\ bar"
```

## Parser gem (Expected)

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Parser::Ruby33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:str, "\\ foo \\ bar"), [], [[:tSTRING, ["\\ foo \\ bar", #<Parser::Source::Range example.rb 0...15>]],
[:tNL, [nil, #<Parser::Source::Range example.rb 15...16>]]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the tokens returned by the Parser gem were different. The escaped backslash does not match in the `tSTRING` token:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:str, "\\ foo \\ bar"), [], [[:tSTRING, ["\\\\ foo \\\\ bar", #<Parser::Source::Range example.rb 0...15>]],
[:tNL, [nil, #<Parser::Source::Range example.rb 15...16>]]]]
```

After this correction, the AST and tokens returned by the Parser gem are the same:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:str, "\\ foo \\ bar"), [], [[:tSTRING, ["\\ foo \\ bar", #<Parser::Source::Range example.rb 0...15>]],
[:tNL, [nil, #<Parser::Source::Range example.rb 15...16>]]]]
```

The reproduction test is based on the following strings.txt and exists:
https://github.com/ruby/prism/blob/v0.24.0/test/prism/fixtures/strings.txt#L79

But, the restoration has not yet been performed due to remaining other issues in strings.txt.

2c44e7e307
2024-03-15 18:08:39 +00:00
Koichi ITO
c9da8d67fd [ruby/prism] Fix a token incompatibility for Prism::Translation::Parser::Lexer
This PR fixes a token incompatibility between Parser gem and `Prism::Translation::Parser`
for the heredocs_leading_whitespace.txt test.

7d45fb1eed
2024-03-15 18:07:59 +00:00
Koichi ITO
c0b8dee95a [ruby/prism] Fix an AST and token incompatibility for Prism::Translation::Parser
This PR fixes an AST and token incompatibility between Parser gem and `Prism::Translation::Parser`
for dstring literal:

```ruby
"foo
  #{bar}"
```

## Parser gem (Expected)

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Parser::Ruby33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:dstr,
  s(:str, "foo\n"),
  s(:str, "  "),
  s(:begin,
    s(:send, nil, :bar))), [], [[:tSTRING_BEG, ["\"", #<Parser::Source::Range example.rb 0...1>]],
[:tSTRING_CONTENT, ["foo\n", #<Parser::Source::Range example.rb 1...5>]],
[:tSTRING_CONTENT, ["  ", #<Parser::Source::Range example.rb 5...7>]],
[:tSTRING_DBEG, ["\#{", #<Parser::Source::Range example.rb 7...9>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 9...12>]],
[:tSTRING_DEND, ["}", #<Parser::Source::Range example.rb 12...13>]],
[:tSTRING_END, ["\"", #<Parser::Source::Range example.rb 13...14>]], [:tNL, [nil, #<Parser::Source::Range example.rb 14...15>]]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the AST and tokens returned by the Parser gem were different. In this case, `dstr` node should not be nested:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:dstr,
  s(:dstr,
    s(:str, "foo\n"),
    s(:str, "  ")),
  s(:begin,
    s(:send, nil, :bar))), [], [[:tSTRING_BEG, ["\"", #<Parser::Source::Range example.rb 0...1>]],
[:tSTRING_CONTENT, ["foo\n", #<Parser::Source::Range example.rb 1...5>]],
[:tSTRING_CONTENT, ["  ", #<Parser::Source::Range example.rb 5...7>]],
[:tSTRING_DBEG, ["\#{", #<Parser::Source::Range example.rb 7...9>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 9...12>]],
[:tSTRING_DEND, ["}", #<Parser::Source::Range example.rb 12...13>]],
[:tSTRING_END, ["\"", #<Parser::Source::Range example.rb 13...14>]], [:tNL, [nil, #<Parser::Source::Range example.rb 14...15>]]]]
```

After this correction, the AST and tokens returned by the Parser gem are the same:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = File.read("example.rb"); p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:dstr,
  s(:str, "foo\n"),
  s(:str, "  "),
  s(:begin,
    s(:send, nil, :bar))), [], [[:tSTRING_BEG, ["\"", #<Parser::Source::Range example.rb 0...1>]],
[:tSTRING_CONTENT, ["foo\n", #<Parser::Source::Range example.rb 1...5>]],
[:tSTRING_CONTENT, ["  ", #<Parser::Source::Range example.rb 5...7>]],
[:tSTRING_DBEG, ["\#{", #<Parser::Source::Range example.rb 7...9>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 9...12>]],
[:tSTRING_DEND, ["}", #<Parser::Source::Range example.rb 12...13>]],
[:tSTRING_END, ["\"", #<Parser::Source::Range example.rb 13...14>]], [:tNL, [nil, #<Parser::Source::Range example.rb 14...15>]]]]
```

c1652a9ee7
2024-03-15 12:31:40 +00:00
Koichi ITO
0f076fa520 [ruby/prism] Fix an AST and token incompatibility for Prism::Translation::Parser
This PR fixes an AST and token incompatibility between Parser gem and `Prism::Translation::Parser`
for empty xstring literal.

## Parser gem (Expected)

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("/tmp/s.rb"); buf.source = "``"; p Parser::Ruby33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:xstr), [], [[:tXSTRING_BEG, ["`", #<Parser::Source::Range /tmp/s.rb 0...1>]],
[:tSTRING_END, ["`", #<Parser::Source::Range /tmp/s.rb 1...2>]]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the AST and tokens returned by the Parser gem were different:

```console
$ bunele exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("/tmp/s.rb"); buf.source = "``"; p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:xstr, s(:str, "")), [], [[:tBACK_REF2, ["`", #<Parser::Source::Range /tmp/s.rb 0...1>]],
[:tSTRING_END, ["`", #<Parser::Source::Range /tmp/s.rb 1...2>]]]]
```

After this correction, the AST and tokens returned by the Parser gem are the same:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("/tmp/s.rb"); buf.source = "``"; p Prism::Translation::Parser33.new.tokenize(buf)'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[s(:xstr), [], [[:tXSTRING_BEG, ["`", #<Parser::Source::Range /tmp/s.rb 0...1>]],
[:tSTRING_END, ["`", #<Parser::Source::Range /tmp/s.rb 1...2>]]]]
```

4ac89dcbb5
2024-03-13 16:02:10 +00:00
Koichi ITO
f8cab4ef8e [ruby/prism] Fix a token incompatibility for Prism::Translation::Parser::Lexer
In practice, the `BACKTICK` is mapped either as `:tXSTRING_BEG` or `:tBACK_REF2`.
The former is used in xstrings like `` `foo` ``, while the latter is utilized as
a back reference in contexts like `` A::` ``.
This PR will make corrections to differentiate the use of `BACKTICK`.

This mistake was discovered through the investigation of xstring.txt file.
The PR will run tests from xstring.txt file except for `` `f\oo` ``, which will still fail,
hence it will be separated into xstring_with_backslash.txt file.
This separation will facilitate addressing the correction at a different time.

49ad8df40a
2024-03-12 17:33:14 +00:00
Kevin Newton
1e886c040a [ruby/prism] Fix some whitequark/parser lexer compatibilities
34e521d071
2024-03-12 15:32:25 +00:00
Koichi ITO
c637f53edd [ruby/prism] Fix a token incompatibility for Prism::Translation::Parser::Lexer
This PR fixes a token incompatibility between Parser gem and `Prism::Translation::Parser` for `tBACK_REF2`:

## Parser gem (Expected)

Returns `tBACK_REF2` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "A::`"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[[:tCONSTANT, ["A", #<Parser::Source::Range example.rb 0...1>]], [:tCOLON2, ["::", #<Parser::Source::Range example.rb 1...3>]],
[:tBACK_REF2, ["`", #<Parser::Source::Range example.rb 3...4>]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the parser returned `tXSTRING_BEG` token when parsing the following:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "A::`"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[[:tCONSTANT, ["A", #<Parser::Source::Range example.rb 0...1>]], [:tCOLON2, ["::", #<Parser::Source::Range example.rb 1...3>]],
[:tXSTRING_BEG, ["`", #<Parser::Source::Range example.rb 3...4>]]]
```

After the update, the parser now returns `tBACK_REF2` token for the same input:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "A::`"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[[:tCONSTANT, ["A", #<Parser::Source::Range example.rb 0...1>]], [:tCOLON2, ["::", #<Parser::Source::Range example.rb 1...3>]],
[:tBACK_REF2, ["`", #<Parser::Source::Range example.rb 3...4>]]]
```

This correction enables the restoration of `constants.txt` as a test case.

7f63b28f98
2024-03-12 15:17:20 +00:00
Koichi ITO
72a57076e2 [ruby/prism] Fix a token incompatibility for Prism::Translation::Parser::Lexer
This PR fixes a token incompatibility between Parser gem and `Prism::Translation::Parser` for
beginless range:

## Parser gem (Expected)

Returns `tBDOT2` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "..42"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[[:tBDOT2, ["..", #<Parser::Source::Range example.rb 0...2>]], [:tINTEGER, [42, #<Parser::Source::Range example.rb 2...4>]]]
```

## `Prism::Translation::Parser` (Actual)

Previously, the parser returned `tDOT2` token when parsing the following:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "..42"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[[:tDOT2, ["..", #<Parser::Source::Range example.rb 0...2>]], [:tINTEGER, [42, #<Parser::Source::Range example.rb 2...4>]]]
```

After the update, the parser now returns `tBDOT2` token for the same input:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "..42"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-darwin22]
[[:tBDOT2, ["..", #<Parser::Source::Range example.rb 0...2>]], [:tINTEGER, [42, #<Parser::Source::Range example.rb 2...4>]]]
```

This correction enables the restoration of `endless_range_in_conditional.txt` as a test case.

f624b99ab0
2024-03-12 14:15:34 +00:00
Koichi ITO
2af6bc26c5 [ruby/prism] Fix an AST and token incompatibility for Prism::Translation::Parser
Fixes https://github.com/ruby/prism/pull/2515.

This PR fixes an AST and token incompatibility between Parser gem and `Prism::Translation::Parser`
for string literal with line breaks.

c58466e5bf
2024-03-12 10:45:56 +00:00
Koichi ITO
dd24d88473 [ruby/prism] Fix a token incompatibility for Prism::Translation::Parser
Fixes https://github.com/ruby/prism/pull/2512.

This PR fixes a token incompatibility between Parser gem and `Prism::Translation::Parser`
for `HEREDOC_END` with a newline.

b67d1e0c6f
2024-03-08 19:07:53 +00:00
Koichi ITO
af8a4205bf Fix an error for Prism::Translation::Parser::Lexer
This PR fixes the following error for `Prism::Translation::Parser::Lexer` on the main branch:

```console
$ cat example.rb
'a' # aあ
"
#{x}
"

$ bundle exec rubocop
Parser::Source::Range: end_pos must not be less than begin_pos
/Users/koic/.rbenv/versions/3.0.4/lib/ruby/gems/3.0.0/gems/parser-3.3.0.5/lib/parser/source/range.rb:39:in `initialize'
/Users/koic/src/github.com/ruby/prism/lib/prism/translation/parser/lexer.rb:299:in `new'
/Users/koic/src/github.com/ruby/prism/lib/prism/translation/parser/lexer.rb:299:in `block in to_a'
/Users/koic/src/github.com/ruby/prism/lib/prism/translation/parser/lexer.rb:297:in `map'
/Users/koic/src/github.com/ruby/prism/lib/prism/translation/parser/lexer.rb:297:in `to_a'
/Users/koic/src/github.com/ruby/prism/lib/prism/translation/parser.rb:263:in `build_tokens'
/Users/koic/src/github.com/ruby/prism/lib/prism/translation/parser.rb:92:in `tokenize'
```

This change was made in https://github.com/ruby/prism/pull/2557, and it seems there was
an inconsistency in Range due to forgetting to apply `offset_cache` to `start_offset`.
2024-03-08 12:33:02 +00:00
Koichi ITO
0e4bfd08e5 [ruby/prism] Fix an AST and token incompatibility for Prism::Translation::Parser
Fixes https://github.com/ruby/prism/pull/2506.

This PR fixes an AST and token incompatibility between Parser gem and `Prism::Translation::Parser`
for symbols quoted with line breaks.

06ab4df8cd
2024-03-07 14:05:20 +00:00
Kevin Newton
d266b71467 [ruby/prism] Use the diagnostic types in the parser translation layer
1a8a0063dc
2024-03-06 21:42:54 -05:00
Ufuk Kayserilioglu
8dfe0c7c28 [ruby/prism] Fix some type-checking errors by using different method calls
For example, use `.fetch` or `.dig` instead of `[]`, and use `===` instead of `is_a?` for checking types of objects.

548b54915f
2024-03-06 21:37:52 +00:00
Kevin Newton
5856ea3fd1 [ruby/prism] Fix up some minor parser incompatibilities
c6c771d1fa
2024-03-04 14:39:52 +00:00
Max Prokopiev
f012ce0d18 [ruby/prism] Fix lexing of foo! when it's a first thing to parse
7597aca76a
2024-02-16 14:08:56 +00:00