This is a tradeoff that I think is worth it. Right now we have a
location list that tracks the location of each of the block locals.
Instead, I'd like to make that a node list that has a proper node
in each spot in the list. Doing so eliminates the need for a
location list at all, which simplifies every consumer because there
is one fewer field type to handle. The memory impact should be
minimal, since this syntax is exceedingly rare.
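
As a rough sketch of the shape of this change (the `example_*` names
below are hypothetical stand-ins, not the real YARP types): each block
local becomes a small node that carries its own location, so consumers
walk one generic node list instead of a special-cased location list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-ins for illustration only; not the real YARP types.
typedef struct { size_t start; size_t end; } example_location_t;

typedef struct example_node {
    int type;                    // e.g. a "block local variable" node kind
    example_location_t location; // the location now lives on the node itself
} example_node_t;

typedef struct {
    example_node_t **nodes;
    size_t size;
    size_t capacity;
} example_node_list_t;

// Append a node, growing the backing array as needed. Returns 0 on success
// and -1 if the allocation fails (the list is left unchanged in that case).
static int example_node_list_append(example_node_list_t *list, example_node_t *node) {
    if (list->size == list->capacity) {
        size_t capacity = list->capacity == 0 ? 4 : list->capacity * 2;
        example_node_t **nodes = realloc(list->nodes, capacity * sizeof(*nodes));
        if (nodes == NULL) return -1;
        list->nodes = nodes;
        list->capacity = capacity;
    }
    list->nodes[list->size++] = node;
    return 0;
}
```

The cost is one node allocation per block local instead of one bare
location, which is the tradeoff mentioned above.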
04d329ddf0
Essentially, this change updates `yp_unescape_calculate_difference` to
not create syntax errors, and we rely entirely on
`yp_unescape_manipulate_string` to report syntax errors.
To do that, this PR adds another (!) parameter to `unescape`:
`yp_list_t *error_list`. When present, `unescape` reports syntax
errors (and otherwise does not).
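
A minimal sketch of the "report only when a list is passed" pattern,
assuming a simplified error list and a hypothetical `\x` escape
scanner (`example_error_list_t`, `example_report`, and
`example_unescape_hex` are illustrative names, not the real
`yp_list_t` or `unescape`):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical, simplified error list; the real `yp_list_t` is different.
typedef struct { size_t count; } example_error_list_t;

// Record an error only when the caller supplied a list.
static void example_report(example_error_list_t *error_list) {
    if (error_list != NULL) error_list->count++;
}

// Measure a `\xH` or `\xHH` escape within [start, end) and return the number
// of bytes it spans. Returns 0 if the input does not start with `\x`.
// A syntax error is reported only if `error_list` is non-NULL.
static size_t example_unescape_hex(const char *start, const char *end, example_error_list_t *error_list) {
    if (end - start < 2 || start[0] != '\\' || start[1] != 'x') return 0;

    // Consume at most two hex digits, never reading at or past `end`.
    const char *cursor = start + 2;
    while (cursor < end && cursor - start < 4 && isxdigit((unsigned char) *cursor)) cursor++;

    // No hex digits followed `\x`: an error, if anyone is listening.
    if (cursor == start + 2) example_report(error_list);

    return (size_t) (cursor - start);
}
```

With this shape, a caller that only needs the length passes `NULL`,
and a caller that owns error reporting passes its list.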
However, an edge case that needed to be addressed is reporting syntax
errors in this case:
?\u{1234 2345}
In a string context, it's possible to have multiple codepoints by
doing something like `"\u{1234 2345}"`; however, in the character
literal context, this is a syntax error -- only a single codepoint is
allowed.
Unfortunately, when `yp_unescape_manipulate_string` is called, there's
nothing to indicate that we are in a "character literal" context and
that only a single codepoint is valid.
To make this work, this PR:
- introduces a new static utility function in yarp.c,
`yp_char_literal_node_create_and_unescape`, which is called when
we're parsing `YP_TOKEN_CHARACTER_LITERAL`
- introduces a new (unexported) function,
`yp_unescape_manipulate_char_literal`, which does the same thing as
`yp_unescape_manipulate_string` but tells `unescape` that only a
single codepoint is expected
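
To illustrate the single-codepoint rule, here is a hedged sketch that
counts the whitespace-separated codepoints in the body of a `\u{...}`
escape. The function names and the `single_codepoint_only` flag are
hypothetical; they only mirror the idea of telling `unescape` which
context it is in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Count whitespace-separated codepoints in the body of a `\u{...}` escape,
// i.e. the bytes between the braces, given as [start, end).
static size_t example_count_codepoints(const char *start, const char *end) {
    size_t count = 0;
    const char *cursor = start;

    while (cursor < end) {
        // Skip separators between codepoints.
        while (cursor < end && (*cursor == ' ' || *cursor == '\t')) cursor++;
        if (cursor == end) break;

        // Consume one run of non-separator bytes as a single codepoint.
        count++;
        while (cursor < end && *cursor != ' ' && *cursor != '\t') cursor++;
    }

    return count;
}

// Hypothetical check: in a character literal context more than one codepoint
// is rejected; in a string context any number is accepted.
static bool example_unicode_body_allowed(const char *start, const char *end, bool single_codepoint_only) {
    return !(single_codepoint_only && example_count_codepoints(start, end) > 1);
}
```

For the body `1234 2345`, the check passes in a string context and
fails in a character literal context.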
f6a65840b5
file
(https://github.com/ruby/yarp/pull/1371)
* refactor: move EOF check into yp_unescape_calculate_difference
`parser_lex` is a bit more readable when it can rely on that behavior.
* fix: octal and hex digits at the end of a file
Previously this resulted in invalid memory access.
* fix: unicode strings at the end of a file
Previously this resulted in invalid memory access.
* Unterminated curly-bracket unicode is a syntax error
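
A sketch of the kind of bounds checking these fixes depend on,
assuming the source is a `[cursor, end)` byte range. Both helpers are
hypothetical and are not the functions changed in this PR; the point
is only that every read is preceded by a comparison against `end`.

```c
#include <assert.h>
#include <stddef.h>

// Count up to `max` octal digits starting at `cursor`, never reading at or
// past `end`. Returns 0 when `cursor` is already at the end of the file.
static size_t example_scan_octal(const char *cursor, const char *end, size_t max) {
    size_t length = 0;
    while (length < max && cursor + length < end &&
           cursor[length] >= '0' && cursor[length] <= '7') {
        length++;
    }
    return length;
}

// Find the closing brace of a `\u{` escape whose body starts at `cursor`.
// Returns NULL if the input ends first, which the caller can treat as an
// unterminated escape (a syntax error) instead of reading further.
static const char *example_find_closing_brace(const char *cursor, const char *end) {
    for (; cursor < end; cursor++) {
        if (*cursor == '}') return cursor;
    }
    return NULL;
}
```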
21cf11acb5
Also, a similar test and fix for interpolated regular expressions.
This snippet:
<<-A.g//,
A
/{/, ''\
previously created a regular expression node with inverted start and
end:
RegularExpressionNode(14...13)((14...15), (15...21), (12...13), ", ''", 0),
which failed an assertion during serialization.
After this change:
RegularExpressionNode(12...15)((14...15), (15...21), (12...13), ", ''", 0),
Found by the fuzzer.
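
One way to guard against inverted ranges like this, sketched under the
assumption that a node's location should cover both of its delimiters
whatever order they appear in the source. This is not necessarily how
the fix is implemented, and `example_location_t` and
`example_location_union` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical half-open byte range, written `start...end` in the dumps above.
typedef struct { size_t start; size_t end; } example_location_t;

// Smallest range covering both inputs, regardless of their source order.
static example_location_t example_location_union(example_location_t a, example_location_t b) {
    example_location_t result;
    result.start = a.start < b.start ? a.start : b.start;
    result.end = a.end > b.end ? a.end : b.end;
    return result;
}
```

Applied to the opening `(14...15)` and closing `(12...13)` from the
snippet, the result has its start at or before its end, which is the
invariant the serialization assertion checks.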
5fef572f95
The snippet added in this commit previously resulted in a CallNode
with inverted start and end locations:
> AssocNode(15...13)(
> CallNode(15...13)(
StringNode(15...17)((15...16), (16...16), (16...17), ""),
nil,
(12...13),
nil,
ArgumentsNode(12...13)([MissingNode(12...13)()]),
nil,
nil,
0,
"/"
),
MissingNode(13...13)(),
(13...13)
),
which failed an assertion during serialization.
After this change, it looks better:
> AssocNode(12...13)(
> CallNode(12...17)(
StringNode(15...17)((15...16), (16...16), (16...17), ""),
nil,
(12...13),
nil,
ArgumentsNode(12...13)([MissingNode(12...13)()]),
nil,
nil,
0,
"/"
),
MissingNode(13...13)(),
(13...13)
),
Found by the fuzzer.
040aa63ad6
The presence of the heredocs in this snippet with invalid syntax:
for <<A + <<B
A
B
causes the MissingNode to have a location after other nodes in the
list, resulting in a StatementsNode with inverted start and end
locations:
[ForNode(0...14)(
MultiWriteNode(4...7)([InterpolatedStringNode(4...7)((4...7), [], (14...16))], nil, nil, nil, nil),
MissingNode(16...16)(),
> StatementsNode(16...14)(
[MissingNode(16...16)(), InterpolatedStringNode(10...13)((10...13), [], (16...18)), MissingNode(13...14)()]
),
(0...3),
(16...16),
nil,
(14...14)
)]
which failed an assertion during serialization.
With this fix, the node's locations are:
[ForNode(0...14)(
MultiWriteNode(4...7)([InterpolatedStringNode(4...7)((4...7), [], (14...16))], nil, nil, nil, nil),
MissingNode(16...16)(),
> StatementsNode(10...16)(
[MissingNode(16...16)(), InterpolatedStringNode(10...13)((10...13), [], (16...18)), MissingNode(13...14)()]
),
(0...3),
(16...16),
nil,
(14...14)
)]
Found by the fuzzer.
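
A hedged sketch of keeping a list node's bounds consistent when
children arrive out of source order, as the heredoc bodies do here.
`example_statements_t` and `example_statements_append` are
hypothetical and only model the location bookkeeping, not the real
StatementsNode.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical model of a statements node: only its bounds and child count.
typedef struct {
    size_t start;
    size_t end;
    size_t count;
} example_statements_t;

// Record a child spanning [start, end), widening the parent's bounds so that
// they always cover every child seen so far.
static void example_statements_append(example_statements_t *statements, size_t start, size_t end) {
    if (statements->count == 0 || start < statements->start) statements->start = start;
    if (statements->count == 0 || end > statements->end) statements->end = end;
    statements->count++;
}
```

Appending the three children from the dump above in order,
`(16...16)`, `(10...13)`, and `(13...14)`, leaves the bounds at start
10 and end 16 rather than an inverted range.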
09bcedc05e
Previously this resulted in invalid memory access as well as a
cascading failed assertion:
src/enc/yp_unicode.c:2224: yp_utf_8_codepoint: Assertion `n >= 1' failed.
Found by the fuzzer.
a34c534440
Two fixes were necessary:
- ensure we are handling newlines correctly
- accept two consecutive string tokens without a separator
4e707937cb
Co-authored-by: Kevin Newton <kddnewton@gmail.com>