The compile context is shared between patterns, so we need to set
the character tables unconditionally in case we switched from
a non-C locale to the C locale.
We're starting to see a mix between uses of zend_bool and bool.
Replace all usages with the standard bool type everywhere.
Of course, zend_bool is retained as an alias.
Separation can only possibly make sense for array parameters
(or something that can contain arrays, like zval parameters). It
never makes sense to separate a bool.
The deref parameters are also of dubious utility, but leaving them
for now.
Create a separate general context that uses ZMM as allocator and
use it to allocate temporary PCRE match data (there is still one
global match data). There is no requirement that the match data
and the compiled regex / match context use the same general context.
This makes sure that we do not leak persistent memory on bailout
and fixes oss-fuzz #25296, on which half the libfuzzer runs
currently get stuck.
From an engine perspective, named parameters mainly add three
concepts:
* The SEND_* opcodes now accept a CONST op2, which is the
argument name. For now, it is looked up by linear scan and
runtime cached.
* This may leave UNDEF arguments on the stack. To avoid having
to deal with them in other places, a CHECK_UNDEF_ARGS opcode
is used to either replace them with defaults, or error.
* For variadic functions, EX(extra_named_params) are collected
and need to be freed based on ZEND_CALL_HAS_EXTRA_NAMED_PARAMS.
RFC: https://wiki.php.net/rfc/named_params
Closes GH-5357.
Add ZVAL_CHAR/RETVAL_CHAR/RETURN_CHAR as a shortcut for using
ZVAL_INTERNED_STRING and ZSTR_CHAR.
Add zend_string_init_fast() as a helper for the empty string /
one char interned string / zend_string_init() pattern.
Also add corresponding ZVAL_STRINGL_FAST etc macros.
Closes GH-5684.
We already document that this is the case, but currently it's only
true if setlocale() has not been called. Make sure ctype_string is
always NULL, even with an explicit "C" locale call, so we can
more efficiently check whether we are in the "C" locale.
Closes GH-5542.
Provides the last PCRE error as a human-readable message, similar
to functionality existing in other extensions, such as
json_last_error_msg().
Closes GH-5185.
PCRE only validates the string starting from the start offset
(minus maximum look-behind, but let's ignore that), so we can
only remember that the string is fully valid UTF-8 is the original
start offset is zero.
We need not just the whole string to be UTF-8, but the start
position to be on a character boundary as well. Check this by
looking for a continuation byte.
A new function `pcre_get_compiled_regex_cache_ex()` is introduced,
which allows to compile regexp pattern using the "C" locale instead
of a current locale.
This will be needed to replace setlocale() usage in fileinfo,
which is not thread-safe.
Print a more informative message that indicates that this is
likely a permission issue, and also indicate that pcre.jit=0
can be used to work around it.
Also automatically disable the JIT, so that this message is
only shown once.
See bug #78630.