Commit graph

118 commits

Author SHA1 Message Date
Max Semenik
2b5de6f839
Remove proto comments from C files
Closes GH-5758
2020-07-06 21:13:34 +02:00
twosee
83a77015ad Add helper APIs for maybe-interned string creation
Add ZVAL_CHAR/RETVAL_CHAR/RETURN_CHAR as a shortcut for using
ZVAL_INTERNED_STRING and ZSTR_CHAR.

Add zend_string_init_fast() as a helper for the empty string /
one char interned string / zend_string_init() pattern.

Also add corresponding ZVAL_STRINGL_FAST etc macros.

Closes GH-5684.
2020-06-08 15:31:52 +02:00
Nikita Popov
b03cafd19c Fix bug #77966: Cannot alias a method named "namespace"
This is a bit tricky: In this cases we have "namespace as", which
means that we will only recognize "namespace" as an identifier when
the lookahead token is already at the "as". This means that
zend_lex_tstring picks up the wrong identifier.

We solve this by actually assigning the identifier as the semantic
value on the parser stack -- as in almost all cases we will not
actually need the identifier, this is just an (offset, size)
reference, not a copy of the string.

Additionally, we need to teach the lexer feedback mechanism used
by tokenizer TOKEN_PARSE mode to apply feedback to something
other than the very last token. To that purpose we pass through
the token text and check the tokens in reverse order to find the
right one.

Closes GH-5668.
2020-06-08 12:55:14 +02:00
Máté Kocsis
1f48feebb9
Improve some TypeError and ValueError messages
Closes GH-5377
2020-04-14 14:38:45 +02:00
Máté Kocsis
c5fb4f0794
Generate function entries from stubs for a couple of extensions
Migrates ext/standard, ext/tidy, ext/tokenizer,
ext/xml, ext/xml_reader, and ext/xml_writer. Closes GH-5381.
2020-04-14 11:49:02 +02:00
Alex Dowad
80598f1250 Syntax errors caused by unclosed {, [, ( mention specific location
Aside from a few very specific syntax errors for which detailed exceptions are
thrown, generally PHP just emits the default error messages generated by bison on syntax
error. These messages are very uninformative; they just say "Unexpected ... at line ...".

This is most problematic with constructs which can span an arbitrary number of lines, such
as blocks of code delimited by { }, 'if' conditions delimited by ( ), and so on. If a closing
delimiter is missed, the block will run for the entire remainder of the source file (which
could be thousands of lines), and then at the end, a parse error will be thrown with the
dreaded words: "Unexpected end of file".

Therefore, track the positions of opening and closing delimiters and ensure that they match
up correctly. If any mismatch or missing delimiter is detected, immediately throw a parse
error which points the user to the offending line. This is best done in the *lexer* and not
in the parser.

Thanks to Nikita Popov and George Peter Banyard for suggesting improvements.

Fixes bug #79368.
Closes GH-5364.
2020-04-14 11:22:23 +02:00
Nikita Popov
5a09b9fb0f Add PhpToken class
RFC: https://wiki.php.net/rfc/token_as_object

Relative to the RFC, this also adds a __toString() method,
as discussed on list.

Closes GH-5176.
2020-03-26 11:09:18 +01:00
Nikita Popov
8a4068988b Clarify that token_get_all() never returns false
It can only fail in TOKEN_PARSE mode, in which case it will throw.
2020-02-14 16:50:12 +01:00
Nikita Popov
1593f79ecd Merge branch 'PHP-7.4' 2019-09-28 21:30:15 +02:00
Tyson Andre
7e0f7b7677 Reduce memory used by token_get_all()
Around a quarter of all strings in array tokens would have a string that's one
character long (e.g. ` `, `\`, `1`)

For parsing a large number of php files,
The memory increase dropped from 378374248 to 369535688 (2.5%)

Closes GH-4753.
2019-09-28 21:29:58 +02:00
Gabriel Caruso
5d6e923d46
Remove mention of PHP major version in Copyright headers
Closes GH-4732.
2019-09-25 14:51:43 +02:00
Stephen Reay
cfbcd4caef Arginfo stubs for tokenizer 2019-08-11 19:03:04 +02:00
Nikita Popov
7f72d771e8 Revert "Switch to bison location tracking"
This reverts commit e528762c1c.

Dmitry reports that this has a non-trivial impact on parsing
overhead, especially on 32-bit systems. As we don't have a strong
need for this change right now, I'm reverting it.

See also comments on
e528762c1c.
2019-03-28 09:29:08 +01:00
Nikita Popov
e528762c1c Switch to bison location tracking
Locations for AST nodes are now tracked with the help of bison
location tracking. This is more accurate than what we currently do
and easier to extend with more information.

A zend_ast_loc structure is introduced, which is used for the location
stack. Currently it only holds the start lineno, but can be extended
to also hold end lineno and offset/column information in the future.

All AST constructors now accept a zend_ast_loc* as first argument, and
will use it to determine their lineno. Previously this used either the
CG(zend_lineno), or the smallest AST lineno of child nodes.

On the parser side, the location structure for a whole rule can be
obtained using the &@$ character salad.
2019-03-21 16:27:48 +01:00
Peter Kokot
92ac598aab Remove local variables
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.

A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.

This patch also removes the so called modelines for Vim. Modelines
allow Vim editor specifically to set some editor configuration such as
syntax highlighting, indentation style and tab width to be set in the
first line or the last 5 lines per file basis. Since the php test
files have syntax highlighting already set in most editors properly and
EditorConfig takes care of the indentation settings, this patch removes
these as well for the Vim 6.0 and newer versions.

With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.

Additionally, Vim modelines for setting php syntax and some editor
settings has been removed from some *.phpt files.  All these are
mostly not relevant for phpt files neither work properly in the
middle of the file.
2019-02-03 21:03:00 +01:00
Zeev Suraski
0cf7de1c70 Remove yearly range from copyright notice 2019-01-30 11:03:12 +02:00
Peter Kokot
8d3f8ca12a Remove unused Git attributes ident
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.

In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.

This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
2018-07-25 00:53:25 +02:00
Dmitry Stogov
4a475a4976 Replace legacy zval_dtor() by zval_ptr_dtor_nogc() or even more specialized destructors.
zval_dtor() doesn't make a lot of sense in PHP-7.* and it's used incorrectly in some places.
Its occurances should be replaced by zval_ptr_dtor() or zval_ptr_dtor_nogc(), or even more specialized destructors.
2018-07-04 19:22:24 +03:00
Nikita Popov
ce7fc2e308 Fix typos... 2018-06-27 23:22:29 +02:00
Nikita Popov
9b02ee0bba Fixed bug #76538 2018-06-27 23:06:25 +02:00
Xinchen Hui
83a77f5a28 Fixed typo 2018-06-18 12:14:17 +08:00
Xinchen Hui
4d69bbeee7 Fixed bug #76437 (token_get_all with TOKEN_PARSE flag fails to recognise close tag) 2018-06-18 11:33:48 +08:00
Dmitry Stogov
8afb91cdad PHP scanner optimization 2018-03-14 01:48:17 +03:00
Xinchen Hui
a6519d0514 year++ 2018-01-02 12:57:58 +08:00
Dmitry Stogov
9e709e2fa0 Move constants into read-only data segment 2017-12-14 18:43:44 +03:00
Nikita Popov
0d834c6922 Use ZSTR_CHAR in token_get_all() 2017-03-22 22:38:07 +01:00
Nikita Popov
fec708f518 Simplify increment_lineno handling 2017-03-22 22:33:05 +01:00
Sammy Kaye Powers
9e29f841ce Update copyright headers to 2017 2017-01-02 09:30:12 -06:00
Sara Golemon
8bbfe174a8 Use new param API in tokenizer 2016-12-31 09:01:20 -08:00
Nikita Popov
07af6ba898 Make sure TOKEN_PARSE mode is thread safe
Introduce an on_event_context passed to the on_event hook. Use this
context to pass along the token array. Previously this was stored
in a non-tls global :/
2016-07-23 00:00:13 +02:00
Lior Kaplan
ed35de784f Merge branch 'PHP-5.6' into PHP-7.0
* PHP-5.6:
  Happy new year (Update copyright to 2016)
2016-01-01 19:48:25 +02:00
Lior Kaplan
49493a2dcf Happy new year (Update copyright to 2016) 2016-01-01 19:21:47 +02:00
Nikita Popov
a49ce7bb91 Don't return T_ERROR from token_get_all()
This turned out to be rather inconvenient after all. Instead just
return the same output we did on PHP 5. If people want to have an
error, use TOKEN_PARSE.
2015-07-09 23:02:21 +02:00
Nikita Popov
d91aad5966 Fix bug #69430
Don't throw from token_get_all() unless TOKEN_PARSE is used. Errors
are reported as T_ERROR tokens.
2015-07-09 19:11:48 +02:00
Nikita Popov
8abc3022b0 Update token_get_all() arginfo 2015-07-09 18:19:12 +02:00
Dmitry Stogov
8e10e8f921 Avoid zval duplication in ZVAL_ZVAL() macro (it was necessary only in few places).
Switch from ZVAL_ZVAL() to simpler macros where possible (it makes sense to review remaining places)
2015-06-12 12:33:23 +03:00
Márcio Almada
110759386e ext tokenizer port + cleanup unused lexer states
we basically added a mechanism to store the token stream during parsing
and exposed the entire parser stack on the tokenizer extension through
an opt in flag: token_get_all($src, TOKEN_PARSE).

this change allows easy future language enhancements regarding context
aware parsing & scanning without further maintance on the tokenizer
extension while solves known inconsistencies "parseless" tokenizer
extension has when it handles `__halt_compiler()` presence.
2015-04-30 03:03:29 -03:00
Márcio Almada
02a9eb4f8c fix indentation + remove c++ comments 2015-04-30 03:03:29 -03:00
Nikita Popov
a8bf1c5d8f Throw ParseException from lexer
Primarily to avoid getting fatal errors from token_get_all().

Implemented using a magic E_ERROR token, which the lexer emits to
force a parser failure.
2015-04-02 16:31:17 +02:00
Anatol Belski
663074b6b1 cleanup mod version macros and mod defs, round x 2015-03-23 21:30:22 +01:00
Xinchen Hui
fc33f52d8c bump year 2015-01-15 23:27:30 +08:00
Xinchen Hui
0579e8278d bump year 2015-01-15 23:26:37 +08:00
Stanislav Malyshev
b7a7b1a624 trailing whitespace removal 2015-01-10 15:07:38 -08:00
Anatol Belski
bdeb220f48 first shot remove TSRMLS_* things 2014-12-13 23:06:14 +01:00
Johannes Schlüter
d0cb715373 s/PHP 5/PHP 7/ 2014-09-19 18:33:14 +02:00
Dmitry Stogov
bccc653185 Avoid double IS_INTERNED() check 2014-09-19 17:32:50 +04:00
Anatol Belski
c3e3c98ec6 master renames phase 1 2014-08-25 19:24:55 +02:00
Anatol Belski
58aa0f8df8 fixes to tokenizer 2014-08-19 13:05:20 +02:00
Anatol Belski
63d3f0b844 basic macro replacements, all at once 2014-08-19 08:07:31 +02:00
Dmitry Stogov
050d7e38ad Cleanup (1-st round) 2014-04-15 15:40:40 +04:00