Around a quarter of all strings in array tokens would have a string that's one
character long (e.g. ` `, `\`, `1`)
For parsing a large number of php files,
The memory increase dropped from 378374248 to 369535688 (2.5%)
Closes GH-4753.
This reverts commit e528762c1c.
Dmitry reports that this has a non-trivial impact on parsing
overhead, especially on 32-bit systems. As we don't have a strong
need for this change right now, I'm reverting it.
See also comments on
e528762c1c.
Locations for AST nodes are now tracked with the help of bison
location tracking. This is more accurate than what we currently do
and easier to extend with more information.
A zend_ast_loc structure is introduced, which is used for the location
stack. Currently it only holds the start lineno, but can be extended
to also hold end lineno and offset/column information in the future.
All AST constructors now accept a zend_ast_loc* as first argument, and
will use it to determine their lineno. Previously this used either the
CG(zend_lineno), or the smallest AST lineno of child nodes.
On the parser side, the location structure for a whole rule can be
obtained using the &@$ character salad.
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.
A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.
This patch also removes the so called modelines for Vim. Modelines
allow Vim editor specifically to set some editor configuration such as
syntax highlighting, indentation style and tab width to be set in the
first line or the last 5 lines per file basis. Since the php test
files have syntax highlighting already set in most editors properly and
EditorConfig takes care of the indentation settings, this patch removes
these as well for the Vim 6.0 and newer versions.
With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.
Additionally, Vim modelines for setting php syntax and some editor
settings has been removed from some *.phpt files. All these are
mostly not relevant for phpt files neither work properly in the
middle of the file.
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.
In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.
This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
zval_dtor() doesn't make a lot of sense in PHP-7.* and it's used incorrectly in some places.
Its occurances should be replaced by zval_ptr_dtor() or zval_ptr_dtor_nogc(), or even more specialized destructors.
Introduce an on_event_context passed to the on_event hook. Use this
context to pass along the token array. Previously this was stored
in a non-tls global :/
This turned out to be rather inconvenient after all. Instead just
return the same output we did on PHP 5. If people want to have an
error, use TOKEN_PARSE.
we basically added a mechanism to store the token stream during parsing
and exposed the entire parser stack on the tokenizer extension through
an opt in flag: token_get_all($src, TOKEN_PARSE).
this change allows easy future language enhancements regarding context
aware parsing & scanning without further maintance on the tokenizer
extension while solves known inconsistencies "parseless" tokenizer
extension has when it handles `__halt_compiler()` presence.
Primarily to avoid getting fatal errors from token_get_all().
Implemented using a magic E_ERROR token, which the lexer emits to
force a parser failure.
This fixes bug #60097.
Before two global variables CG(heredoc) and CG(heredoc_len) were used to
track the current heredoc label. In order to support nested heredoc
strings the *previous* heredoc label was assigned as the token value of
T_START_HEREDOC and the language_parser.y assigned that to CG(heredoc).
This created a dependency of the lexer on the parser. Thus the
token_get_all() function, which accesses the lexer directly without
also running the parser, was not able to tokenize nested heredoc strings
(and leaked memory). Same applies for the source-code highlighting
functions.
The new approach is to maintain a heredoc_label_stack in the lexer, which
contains all active heredoc labels.
As it is no longer required, T_START_HEREDOC and T_END_HEREDOC now don't
carry a token value anymore.
In order to make the work with zend_ptr_stack in this context more
convenient I added a new function zend_ptr_stack_top(), which retrieves the
top element of the stack (similar to zend_stack_top()).