Commit graph

6 commits

Author SHA1 Message Date
Rowan Tommins
55a15f32ce Improve output of tokens in Parse Errors
Currently, unexpected tokens in the parser are shown as the text
found, plus the internal token name, including the notorious
"unexpected '::' (T_PAAMAYIM_NEKUDOTAYIM)".

This commit replaces that with a more user-friendly format, with
two main types of token:

* Tokens which always represent the same text are shown like
  'unexpected token "::"' and 'expected "::"'
* Tokens which have variable text are given a user-friendly
  name, and show like 'unexpected identifier "foo"', and
  'expected identifer'.

A few tokens have special cases:

* unexpected token """ -> unexpected double-quote mark
* unexpected quoted string "'foo'" -> unexpected single-quoted
  string "foo"
* unexpected quoted string ""foo"" -> unexpected double-quoted
  string "foo"
* unexpected illegal character "_" -> unexpected character 0xNN
  (where _ is almost certainly a control character, and NN is the
   hexadecimal value of the byte)

The \ token has a special case in the implementation just to stop
bison making a mess of escaping it and it coming out as \\
2020-07-13 11:07:40 +02:00
Alex Dowad
80598f1250 Syntax errors caused by unclosed {, [, ( mention specific location
Aside from a few very specific syntax errors for which detailed exceptions are
thrown, generally PHP just emits the default error messages generated by bison on syntax
error. These messages are very uninformative; they just say "Unexpected ... at line ...".

This is most problematic with constructs which can span an arbitrary number of lines, such
as blocks of code delimited by { }, 'if' conditions delimited by ( ), and so on. If a closing
delimiter is missed, the block will run for the entire remainder of the source file (which
could be thousands of lines), and then at the end, a parse error will be thrown with the
dreaded words: "Unexpected end of file".

Therefore, track the positions of opening and closing delimiters and ensure that they match
up correctly. If any mismatch or missing delimiter is detected, immediately throw a parse
error which points the user to the offending line. This is best done in the *lexer* and not
in the parser.

Thanks to Nikita Popov and George Peter Banyard for suggesting improvements.

Fixes bug #79368.
Closes GH-5364.
2020-04-14 11:22:23 +02:00
Nikita Popov
b3f74b0b7d Deprecate allow_url_include 2019-07-22 11:39:52 +02:00
Aaron Piotrowski
64b167d201 Updated tests to reflect exception class changes. 2015-05-16 16:49:14 -05:00
Nikita Popov
a8bf1c5d8f Throw ParseException from lexer
Primarily to avoid getting fatal errors from token_get_all().

Implemented using a magic E_ERROR token, which the lexer emits to
force a parser failure.
2015-04-02 16:31:17 +02:00
Nikita Popov
f1a6c06f14 Support ParseException for require etc 2015-03-17 21:35:42 +01:00