Commit graph

501 commits

Author SHA1 Message Date
Aaron Patterson
89d89fa49d When reading from stdin, put a wrapper around the IO object
The purpose of this commit is to fix Bug #21188.  We need to detect when
stdin has run in to an EOF case.  Unfortunately we can't _call_ the eof
function on IO because it will block.

Here is a short script to demonstrate the issue:

```ruby
x = STDIN.gets
puts x
puts x.eof?
```

If you run the script, then type some characters (but _NOT_ a newline),
then hit Ctrl-D twice, it will print the input string.  Unfortunately,
calling `eof?` will try to read from STDIN again causing us to need a
3rd Ctrl-D to exit the program.

Before introducing the EOF callback to Prism, the input loop looked
kind of like this:

```ruby
loop do
  str = STDIN.gets
  process(str)

  if str.nil?
    p :DONE
  end
end
```

Which required 3 Ctrl-D to exit.  If we naively changed it to something
like this:

```ruby
loop do
  str = STDIN.gets
  process(str)

  if STDIN.eof?
    p :DONE
  end
end
```

It would still require 3 Ctrl-D because `eof?` would block.  In this
patch, we're wrapping the IO object, checking the buffer for a newline
and length, and then using that to simulate a non-blocking eof? method.

This commit wraps STDIN and emulates a non-blocking `eof` function.

[Bug #21188]
2025-08-04 12:34:33 -07:00
Justin Collins
885862a853 [ruby/prism] Match RubyParser behavior for -> lambda args
9f55551b09
2025-08-01 16:57:17 +00:00
Justin Collins
d289eb2723 [ruby/prism] RubyParser translation for stabby lambdas with it
c2e372a8d8
2025-08-01 16:57:17 +00:00
Earlopain
026079925c [ruby/prism] Do not use 0 to indicate the latest ruby version to parse
This makes it hard to do version checks against this value. The current version checks work because there are so few possible values at the moment.

As an example, PR 3337 introduces new syntax for ruby 3.5 and uses `PM_OPTIONS_VERSION_LATEST` as its version guard. Because what is considered the latest changes every year, it must later be changed to `parser->version == parser->version == PM_OPTIONS_VERSION_CRUBY_3_5 || parser->version == PM_OPTIONS_VERSION_LATEST`, with one extra version each year.

With this change, the PR can instead write `parser->version >= PM_OPTIONS_VERSION_CRUBY_3_5` which is self-explanatory
and works for future versions.

8318a113ca
2025-07-29 17:17:28 +00:00
Earlopain
3071c5d04c [ruby/prism] Fix parser translator with trailing backslash in %W /%I array
https://docs.ruby-lang.org/en/master/syntax/literals_rdoc.html#label-25w+and+-25W-3A+String-Array+Literals
> %W allow escape sequences described in Escape Sequences. However the continuation line <newline> is not usable because it is interpreted as the escaped newline described above.

f5c7460ad5
2025-06-30 12:32:31 +00:00
Earlopain
970813d982 [ruby/prism] Fix parser translator during string escaping with invalid utf-8
Instead, prefer `scan_byte` over `get_byte` since that already returns the byte as an integer, sidestepping conversion issues.

Fixes https://github.com/ruby/prism/issues/3582

7f3008b2b5
2025-06-11 18:07:43 +00:00
Nobuyoshi Nakada
5e64d5c7d9 [ruby/prism] [DOC] Markup __FILE__ as code, not emphasis
571ba378f5
2025-05-29 04:45:59 +00:00
Nobuyoshi Nakada
82c74e4282 [ruby/prism] [DOC] Stop rdoc from processing non-rdoc comments
de1faa1680
2025-05-29 04:45:59 +00:00
Nobuyoshi Nakada
22451f370e [ruby/prism] [DOC] Add code fences
641775e5fe
2025-05-29 04:45:58 +00:00
Nobuyoshi Nakada
991cf2dd4d [ruby/prism] [DOC] Specify markdown mode to RDoc
12af4e144e
2025-05-29 04:45:58 +00:00
viralpraxis
543dd77cc3 [ruby/prism] Fix parsing rescued exception via indexed assignment
Given this code

```ruby
begin
  raise '42'
rescue => A[]
end
```

Prism fails with this backtrace

```
Error: test_unparser/corpus/literal/rescue.txt(Prism::ParserTest): NoMethodError: undefined method `arguments' for nil
prism/lib/prism/translation/parser/compiler.rb:1055:in `visit_index_target_node'
prism/lib/prism/node.rb:9636:in `accept'
prism/lib/prism/compiler.rb:30:in `visit'
prism/lib/prism/translation/parser/compiler.rb:218:in `visit_begin_node'
```

Seems like

```diff
-            visit_all(node.arguments.arguments),
+            visit_all(node.arguments&.arguments || []),
```

fixes the problem.

76d01aeb6c
2025-04-12 17:43:57 +00:00
Earlopain
334c261cc9 [ruby/prism] Fix parser translator when splatting in pattern matching pin
Because it ends up treating it as a local variable, and `a.x`
is not a valid local variable name.

I'm not big on pattern matching, but conceptually it makes sense to me
to treat anything inside ^() to not be
pattern matching syntax?

80dbd85c45
2025-04-02 20:51:54 +00:00
Earlopain
d7e46543b5 [ruby/prism] Fix parser translator when pinning hash with string keys
`StringNode` and `SymbolNode` don't have the same shape
(`content` vs `value`) and that wasn't handled.

I believe the logic for the common case can be reused.
I simply left the special handling for implicit nodes in pattern matching
and fall through otherwise.

NOTE: patterns.txt is not actually tested at the moment,
because it contains syntax that `parser` mistakenly rejects.
But I checked manually that this doesn't introduce other failures.
https://github.com/whitequark/parser/pull/1060

55adfaa895
2025-03-30 17:24:05 +00:00
Kevin Newton
052794bfe1 [ruby/prism] Accept a newline after the defined? keyword
[Bug #21197]

22be955ce9
2025-03-30 13:22:41 -04:00
Kevin Newton
4b1fea81f9 [ruby/prism] Update Ruby deps
594e2a69ed
2025-03-23 22:16:45 +00:00
Earlopain
c49051eaa8 [ruby/prism] Enforce a minimum parser version for the parser translator
There hasn't been much that would actually affect parsers usage of it.
But, when adding new node types, these usually appear in the `Parser::Meta::NODE_TYPES`.

`itblock` was added, gets emitted by prism, and then `rubocop-ast` blindly delegates to `on_itblock`.
These methods are dynamically created through `NODE_TYPES`, which means that it will error if it
doesn't contain `itblock`.

This is unfortunate because in `rubocop-ast` these methods are eagerly defined but
the prism translator is lazily loaded on demand.
The simplest solution is to add them on the `parser` side (even if they are not emitted directly), and require that a version that contains those be used.

In summary when adding a new node type:
* Add it to `Parser::Meta::PRISM_TRANSLATION_PARSER_NODE_TYPES` (gets included in `NODE_TYPES`)
* Bump the minimum `parser` version used by `prism` to a version that contains the above change
* Actually emit that node type in `prism`

d73783d065
2025-03-22 17:08:42 +00:00
Earlopain
9b5165b1d7 [ruby/prism] Don't use RUBY_VERSION.to_f
There will be a bunch of other problems should 3.10 ever exists, but I guess why not fix this one now.

b385f47f8b
2025-03-21 11:18:33 +00:00
Earlopain
ab8b199be8 [ruby/prism] Add Prism::Translation::ParserCurrent
It's not my favorite api but for users that currently use the same thing
from `parser`, moving over is more difficult
than it needs to be.

If you plan to support both old and new ruby versions, you definitly need to
branch somewhere on the ruby version
to either choose prism or parser.
But with prism you then need to enumerate all the versions again and choose the correct one.

Also, don't recommend to use `Prism::Translation::Parser` in docs. It's version-less
but actually always just uses Ruby 3.4 which is probably
not what the user intended.

Note: parser also warns when the patch version doesn't match what it expects. But I don't think prism has such a concept,
and anyways it would require releases anytime ruby releases, which I don't think is very desirable

77177f9e92
2025-03-20 21:20:23 +00:00
Kevin Newton
641f15b1c6 [ruby/prism] Mark Prism as ractor-safe
c02429765b
2025-03-19 21:11:57 +00:00
Kevin Newton
050ffab82b [ruby/prism] Polyfill Kernel#warn category parameter
d85c72a1b9
2025-03-19 21:03:18 +00:00
Earlopain
b5e9a2da4c [ruby/prism] Remove category keyword from warn call
`category` is only supported from Ruby 3.0 onwards and prism can still run with Ruyb 2.7

335a193851
2025-03-19 21:03:17 +00:00
Earlopain
e5e160475b [ruby/prism] Warn when the parser translator receives an incompatible builder class
In https://github.com/ruby/prism/pull/3494 I added a bit of code
so that using the new builder doesn't break stuff.
This code can be dropped when it is enforced that builder
is _always_ the correct subclass (and makes future issues like that unlikely).

193d4b806d
2025-03-19 21:03:17 +00:00
Kevin Newton
6e9568d202 [ruby/prism] Bump to v1.4.0
71d31db496
2025-03-18 19:06:34 +00:00
Kevin Newton
b003d40194 Fix up merge conflicts for prism sync 2025-03-18 13:36:53 -04:00
Earlopain
90d38ddb47 [ruby/prism] Fix merge mishap
Caused by https://github.com/ruby/prism/pull/3478 and https://github.com/ruby/prism/pull/3443

I also made the builder reference more explicit to clearly distinquish
between `::Parser` and `Prism::Translation::Parser`

d52aaa75b6
2025-03-18 13:36:53 -04:00
Earlopain
e3c8464630 [ruby/prism] Only unnest parser mlhs nodes when no rest argument is provided
```
(a,), = []

PARSER====================
s(:masgn,
  s(:mlhs,
    s(:mlhs,
      s(:lvasgn, :a))),
  s(:array))
PRISM====================
s(:masgn,
  s(:mlhs,
    s(:lvasgn, :a)),
  s(:array))
```

8aa1f4690e
2025-03-18 13:36:53 -04:00
Earlopain
94e12ffa39 [ruby/prism] Fix parser translator multiline interpolated symbols
In 2637007929 I added tests but didn't modify them correctly

de021e74de
2025-03-18 13:36:53 -04:00
Earlopain
a8adf5e006 [ruby/prism] Further refine string handling in the parser translator
Mostly around newlines and line continuation.
* percent arrays need special backslash handling in the ast
* Fix offset issue for heredocs with many line continuations (used wrong variable as index access)
* More refined rules on when to simplify string tokens
* Handle line continuations in squiggly heredocs
* Correctly dedent squiggly heredocs with interpolation
* Consider `':foo:` and `%s[foo]` to not be interpolation

4edfe9d981
2025-03-18 13:36:53 -04:00
Kevin Newton
0b4604d5a0 [ruby/prism] Use Set.new over to_set
422d5c4c64
2025-03-18 13:36:53 -04:00
Earlopain
ad478de3f0 [ruby/prism] Optimize array inclusion checks in the parser translator
I see `Array.include?` as 2.4% runtime. Probably because of `LPAREN_CONVERSION_TOKEN_TYPES` but
the others will be faster as well.

Also remove some inline array checks. They are specifically optimized in Ruby since 3.4, but for now prism is for >= 2.7

ca9500a3fc
2025-03-18 13:36:53 -04:00
Earlopain
d5503444fd [ruby/prism] Fix parser translator crash for certain octal escapes
`Integer#chr` performs some validation that we don't want/need. Octal escapes can go above 255, where it will then raise trying to convert.

`append_as_bytes` actually allows to pass a number, so we can just skip that call.
Although, on older rubies of course we still need to handle this in the polyfill.
I don't really like using `pack` but don't know of another way to do so.

For the utf-8 escapes, this is not an issue. Invalid utf-8 in these is simply a syntax error.

161c606b1f
2025-03-18 13:36:53 -04:00
Kevin Newton
1944247a0e [ruby/prism] Handle control and meta escapes in parser translation
09c59a3aa5
2025-03-18 13:36:53 -04:00
Earlopain
fd7a10cf4a [ruby/prism] Further refine string handling in the parser translator
Mostly around newlines and line continuation.
* percent arrays need special backslash handling in the ast
* Fix offset issue for heredocs with many line continuations (used wrong variable as index access)
* More refined rules on when to simplify string tokens
* Handle line continuations in squiggly heredocs
* Correctly dedent squiggly heredocs with interpolation
* Consider `':foo:` and `%s[foo]` to not be interpolation

4edfe9d981
2025-03-18 13:36:53 -04:00
Earlopain
5d138f2b43 [ruby/prism] Better handle regexp in the parser translator
Turns out, it was already almost correct. If you disregard \c and \M style escapes, only a single character is allowed to be escaped in a regex so most tests passed already.

There was also a mistake where the wrong value was constructed for the ast, this is now fixed.
One test fails because of this, but I'm fairly sure it is because of a parser bug. For `/\“/`, the backslash is supposed to be removed because it is a multibyte character. But tbh,
I don't entirely understand all the rules.

Fixes more than half of the remaining ast differences for rubocop tests

e1c75f304b
2025-03-18 13:36:53 -04:00
Earlopain
177adf6fa5 [ruby/prism] Fix parser translator tokens for %-arrays with whitespace escapes
Also fixes a token incompatibility for the word separator. parser only considers whitespace until the first newline

bd3dd2b62a
2025-03-18 13:36:53 -04:00
Earlopain
ac728389e2 [ruby/prism] Fix parser translator edge-case when multiline string ends with \n
When the line contains no real newline but contains unescaped ones, then there will be one less entry

4ef093b600
2025-03-18 13:36:53 -04:00
Earlopain
0fcb7fc21d [ruby/prism] Better handle all kinds of multiline strings in the parser translator
This is a followup to #3373, where the implementation
was extracted

2637007929
2025-03-18 13:36:53 -04:00
Earlopain
acf404e20e [ruby/prism] Fix an incompatibility with the parser translator
The offset cache contains an entry for each byte so it can't be accessed via the string length.

Adds tests for all variants except for this:
```
"fo
o" "ba
’"
```

For some reason, this still has the wrong offset.

a651126458
2025-03-18 13:36:53 -04:00
Earlopain
f49a0114e3 [ruby/prism] Fix parser translator rescue location with semicolon body
There are a few other locations that should be included in that check.
I think the end location must always be present but I left it in to be safe (maybe implicit begin somehow?)

545d07ddc3
2025-03-18 13:36:53 -04:00
Earlopain
a679597547 [ruby/prism] Fix parser translator crash for certain octal escapes
`Integer#chr` performs some validation that we don't want/need. Octal escapes can go above 255, where it will then raise trying to convert.

`append_as_bytes` actually allows to pass a number, so we can just skip that call.
Although, on older rubies of course we still need to handle this in the polyfill.
I don't really like using `pack` but don't know of another way to do so.

For the utf-8 escapes, this is not an issue. Invalid utf-8 in these is simply a syntax error.

161c606b1f
2025-03-18 13:36:53 -04:00
Earlopain
bc506295a3 [ruby/prism] Further refine string handling in the parser translator
Mostly around newlines and line continuation.
* percent arrays need special backslash handling in the ast
* Fix offset issue for heredocs with many line continuations (used wrong variable as index access)
* More refined rules on when to simplify string tokens
* Handle line continuations in squiggly heredocs
* Correctly dedent squiggly heredocs with interpolation
* Consider `':foo:` and `%s[foo]` to not be interpolation

4edfe9d981
2025-03-18 13:36:53 -04:00
Earlopain
9e5e3f1bed [ruby/prism] Add a custom builder class for the parser translator
I want to add new node types to the parser translator, for example `itblock`. The bulk of the work is already done by prism itself. In the `parser`
builder, this would be a 5-line change at most but we don't control that here.

Instead, we can add our own builder and either overwrite the few methods we need,
or just inline the complete builder. I'm not sure yet which would be better.

`rubocop-ast` uses its own builder for `parser`. For this to correctly work, it must explicitly choose to extend the
prism builder and use it, same as it currently chooses to use a different parser when prism is used.

I'd like to enforce that the builder for prism extends its custom one since it will lead to
some pretty weird issues otherwise. But first, I'd like to change `rubocop-ast` to make use of this.

b080e608a8
2025-03-18 13:36:53 -04:00
Earlopain
705bd6fadb [ruby/prism] Fix parser translator when unescaping invalid utf8
1. The string starts out as binary
2. `ち` is appended, forcing it back into utf-8
3. Some invalid byte sequences are tried to append

> incompatible character encodings: UTF-8 and BINARY (ASCII-8BIT)

This makes use of my wish to use `append_as_bytes`. Unfortunatly that method is rather new
so it needs a fallback

e31e94a775
2025-03-18 13:36:53 -04:00
Kevin Newton
f2483c79fe [ruby/prism] Use Set.new over to_set
422d5c4c64
2025-03-13 14:24:48 +00:00
Earlopain
3d4c7c3802 [ruby/prism] Use reverse_each in the parser translator
Avoids an array allocation which matters more and more
the larger the file is.

I have it at 14% of runtime.

f65b90f27d
2025-03-13 13:52:45 +00:00
Earlopain
67e6ccb23f [ruby/prism] Optimize array inclusion checks in the parser translator
I see `Array.include?` as 2.4% runtime. Probably because of `LPAREN_CONVERSION_TOKEN_TYPES` but
the others will be faster as well.

Also remove some inline array checks. They are specifically optimized in Ruby since 3.4, but for now prism is for >= 2.7

ca9500a3fc
2025-03-13 13:52:45 +00:00
Earlopain
4b844f7d9e [ruby/prism] Ensure backwards compatibility with the custom parser builder
Temoprary backwards-compat code so that current users
don't break.

Eventually the Translation::Parser initializer should asser that the correct class is passed in.

66b0162b35
2025-03-13 12:06:58 +00:00
Kevin Newton
af76b7f4d9 [ruby/prism] Revert "Mark extension as Ractor-safe"
56eaf53732
2025-03-12 19:56:22 +00:00
Kevin Newton
242e99eb0f [ruby/prism] Mark extension as Ractor-safe
10e5431b38
2025-03-12 19:15:03 +00:00
Koichi ITO
6b4453e332 [ruby/prism] Support itblock for Prism::Translation::Parser
## Summary

`itblock` node is added to support the `it` block parameter syntax introduced in Ruby 3.4.

```console
$ ruby -Ilib -rprism -rprism/translation/parser34 -e 'buffer = Parser::Source::Buffer.new("path"); buffer.source = "proc { it }"; \
                                                      p Prism::Translation::Parser34.new.tokenize(buffer)[0]'
s(:itblock,
  s(:send, nil, :proc), :it,
  s(:lvar, :it))
```

This node design is similar to the `numblock` node, which was introduced for the numbered parameter syntax in Ruby 2.7.

```
$ ruby -Ilib -rprism -rprism/translation/parser34 -e 'buffer = Parser::Source::Buffer.new("path"); buffer.source = "proc { _1 }"; \
                                                      p Prism::Translation::Parser34.new.tokenize(buffer)[0]'
s(:numblock,
  s(:send, nil, :proc), 1,
  s(:lvar, :_1))
```

The difference is that while numbered parameters can have multiple parameters, the `it` block parameter syntax allows only a single parameter.

In Ruby 3.3, the conventional node prior to the `it` block parameter syntax is returned.

```console
$ ruby -Ilib -rprism -rprism/translation/parser33 -e 'buffer = Parser::Source::Buffer.new("path"); buffer.source = "proc { it }"; \
                                                      p Prism::Translation::Parser33.new.tokenize(buffer)[0]'
s(:block,
  s(:send, nil, :proc),
  s(:args),
  s(:send, nil, :it))
```

## Development Note

The Parser gem does not yet support the `it` block parameter syntax. This is the first case where Prism's node design precedes that of the Parser gem.
When implementing https://github.com/whitequark/parser/issues/962, this node design will need to be taken into consideration.

c141e1420a
2025-03-10 16:57:46 +00:00