Commit graph

567 commits

Author SHA1 Message Date
Remi Collet
56495ac031
fix for pcre2 10.38 2021-10-21 13:33:35 +02:00
Christoph M. Becker
a6b43086e6
Fix #81243: Too much memory is allocated for preg_replace()
Trimming a potentially over-allocated string appears to be reasonable,
so we drop the condition altogether.

We also re-allocate twice the size needed in the first place, and not
roughly tripple the size.

Closes GH-7231.
2021-07-12 18:33:55 +02:00
Anatol Belski
1a1d86d562 pcre: Workaround bug #81101
The way to fix it is to disable certain match start optimizaions. The
observed performance impact appears negligible ATM, compared to the
functional regression revealed.

A possible side effect might occur if a pattern uses (*COMMIT) or
(*MARK), which is however not a very broadly used syntax in PHP. Still
this should be observed and handled by possibly adding a possibility to
reverse PCRE2_NO_START_OPTIMIZE on the user side.

One test shows a behavior change, where instead of int 0 the match
would produce an error and return false. Except strict comparison
is used, this should be acceptable.

Signed-off-by: Anatol Belski <ab@php.net>
(cherry picked from commit d188ca7688)
Signed-off-by: Anatol Belski <ab@php.net>
2021-06-19 15:23:43 +02:00
Nikita Popov
4be867e910 Fix locale switch back to C in pcre
The compile context is shared between patterns, so we need to set
the character tables unconditionally in case we switched from
a non-C locale to the C locale.
2021-03-18 10:48:43 +01:00
Dharman
282355efd5 Fix bug #80866
Closes GH-6774.
2021-03-15 14:47:45 +01:00
Nikita Popov
3a51530963 Fixed bug #79257
Replace an existing entry for a given name only if we have a match.
2020-02-11 17:31:48 +01:00
Nikita Popov
cd5591a28d PCRE: Only remember valid UTF-8 if start offset zero
PCRE only validates the string starting from the start offset
(minus maximum look-behind, but let's ignore that), so we can
only remember that the string is fully valid UTF-8 is the original
start offset is zero.
2020-02-07 17:01:39 +01:00
Nikita Popov
c9e78e6d33 PCRE: Check whether start offset is on char boundary
We need not just the whole string to be UTF-8, but the start
position to be on a character boundary as well. Check this by
looking for a continuation byte.
2020-02-07 16:49:28 +01:00
Nikita Popov
e30f52b919 Merge branch 'PHP-7.3' into PHP-7.4
* PHP-7.3:
  Fixed bug #79188
2020-02-05 11:21:25 +01:00
Nikita Popov
13bfa9f5ac Fixed bug #79188 2020-02-05 11:18:46 +01:00
Christoph M. Becker
cfb643ca2b Merge branch 'PHP-7.3' into PHP-7.4
* PHP-7.3:
  Fix #78853: preg_match() may return integer > 1
2019-11-22 19:29:11 +01:00
Christoph M. Becker
e1da72bdf1 Fix #78853: preg_match() may return integer > 1
Commit 54ebebd[1] optimized the match loop, but for this case it has
been overlooked, that we must only loop if we're doing global matching.

[1] <http://git.php.net/?p=php-src.git;a=commit;h=54ebebd686255c5f124af718c966edb392782d4a>
2019-11-22 19:26:26 +01:00
Nikita Popov
e19f0e86dc Merge branch 'PHP-7.3' into PHP-7.4
* PHP-7.3:
  Fix php_pcre_mutex_free()
2019-11-07 14:31:55 +01:00
Nikita Popov
6dcc0b859f Fix php_pcre_mutex_free()
We should only set the mutex to NULL if we actually freed it.
Due to missing braces non-main threads may currently set it to
NULL first.
2019-11-07 14:31:19 +01:00
Nikita Popov
68b26ff8cf Merge branch 'PHP-7.3' into PHP-7.4 2019-10-08 16:14:06 +02:00
Nikita Popov
736af5f660 Merge branch 'PHP-7.2' into PHP-7.3 2019-10-08 16:13:17 +02:00
Sergei Turchanov
a8f60ac9dd Add pcre_get_compiled_regex_cache_ex() with local_aware flag
A new function `pcre_get_compiled_regex_cache_ex()` is introduced,
which allows to compile regexp pattern using the "C" locale instead
of a current locale.

This will be needed to replace setlocale() usage in fileinfo,
which is not thread-safe.
2019-10-08 16:11:55 +02:00
Nikita Popov
01b3cc4dee Merge branch 'PHP-7.3' into PHP-7.4 2019-10-04 16:04:34 +02:00
Nikita Popov
1d6e9da743 Improve diagnostic on PCRE JIT mmap failure
Print a more informative message that indicates that this is
likely a permission issue, and also indicate that pcre.jit=0
can be used to work around it.

Also automatically disable the JIT, so that this message is
only shown once.

See bug #78630.
2019-10-04 16:03:38 +02:00
Nikita Popov
201729840c Mark PCRE locale key as local persistent 2019-08-13 14:49:59 +02:00
Dmitry Stogov
e3d35b6434 Split destructor 2019-07-04 13:07:47 +03:00
Nikita Popov
ad1b62fca7 Merge branch 'PHP-7.3' into PHP-7.4 2019-06-17 13:31:04 +02:00
Nikita Popov
11b354dd54 Merge branch 'PHP-7.2' into PHP-7.3 2019-06-17 13:30:56 +02:00
Nikita Popov
03db04c3ab Accept null for preg_quote delimiter argument
Related to bug #78163.
2019-06-17 13:30:15 +02:00
Nikita Popov
51fb8dc422 Add specialized pair construction API
Closes GH-3990.
2019-06-11 12:29:55 +02:00
Nikita Popov
a31f46421d Allow exceptions in __toString()
RFC: https://wiki.php.net/rfc/tostring_exceptions

And convert some object to string conversion related recoverable
fatal errors into Error exceptions.

Improve exception safety of internal code performing string
conversions.
2019-06-05 14:25:07 +02:00
Dmitry Stogov
e188e4170f Use ZEND_TRY_ASSIGN_REF_... macros for arguments passed to internal function by reference 2019-04-24 18:28:29 +03:00
Peter Kokot
e06836a1a3 Remove checks for locale.h, setlocale, localeconv
The `<loccale.h>` header file, setlocale, and localeconv are part of the
standard C89 [1] and on current systems can be used unconditionally.

Since PHP 7.4 requires at least C89 or greater, the `HAVE_LOCALE_H`,
`HAVE_SETLOCALE`, and `HAVE_LOCALECONV` symbols defined by Autoconf in
configure.ac [2] can be ommitted and simplifed.

The bundled libmagic (file) has also been patched already in version
5.35 and up in upstream location so when it will be patched also in
php-src the check for locale.h header is still left in the configure.ac
and in windows headers definition file.

[1] https://port70.net/~nsz/c/c89/c89-draft.html#4.4
[2] https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/headers.m4

Omit the bundled libmagic files
2019-04-07 18:32:54 +02:00
Christoph M. Becker
2733420f82 Merge branch 'PHP-7.3' into PHP-7.4
* PHP-7.3:
  Fix #77827: preg_match does not ignore \r in regex flags
2019-03-31 13:35:50 +02:00
Christoph M. Becker
d8b7728b0e Merge branch 'PHP-7.2' into PHP-7.3
* PHP-7.2:
  Fix #77827: preg_match does not ignore \r in regex flags
2019-03-31 13:33:21 +02:00
Christoph M. Becker
88460c017a Fix #77827: preg_match does not ignore \r in regex flags 2019-03-31 13:31:54 +02:00
Nikita Popov
a9b01b60d8 Make PCRE cache per-request on CLI
There will only be one request on the CLI SAPI, so there is no
advantage to having a persistent PCRE cache. Using a non-persistent
cache allows us to use arbitrary strings as cache keys.
2019-03-26 10:10:41 +01:00
Nikita Popov
e7e2056d1a Remove HAVE_PCRE/HAVE_BUNDLED_PCRE checks
PCRE is always available.
2019-03-22 10:29:18 +01:00
Nikita Popov
1cf84f1579 Try to create interned strings in preg_split as well
And convert last_match to last_match_offset, which is more
convenient now.
2019-03-21 10:19:48 +01:00
Nikita Popov
621b1f0312 Cleanup add_offset_pair API
Accept the two offsets directly, rather than doing length calculations
at all callsites. Also extract the logic to create a possibly interned
string.

Switch the split implementation to work on a char* subject internally,
because ZSTR_VAL(subject_str) is a mouthful...
2019-03-21 10:08:29 +01:00
Nikita Popov
6311581ac6 Fix bug #73948
If PREG_UNMATCHED_AS_NULL is used, make sure that unmatched capturing
groups at the end are also set to null, rather than just those in the
middle.
2019-03-21 09:50:20 +01:00
Nikita Popov
f53e7394eb Respect OFFSET_CAPTURE when padding preg_match_all() results
This issue was mentioned in bug #73948. The PREG_PATTERN_ORDER
padding was performed without respecting the PREF_OFFSET_CAPTURE
flag, which resulted in unmatched subpatterns being either null or
[null, -1] depending on where they occur. Now they will always be
[null, -1], consistent with other usages.
2019-03-19 15:35:15 +01:00
Nikita Popov
2783670daa Merge branch 'PHP-7.3' into PHP-7.4 2019-03-19 13:59:43 +01:00
Nikita Popov
661bce47ae Fixed bug #76127
Per documentation, and consistent with other preg functions, we
should return false if an error occurred.
2019-03-19 13:57:39 +01:00
Nikita Popov
4fe3d108af Don't create a new array for empty/null match every time
If PREG_OFFSET_CAPTURE is used, unmatched subpatterns will be either
[null, -1] or ['', -1] depending on PREG_UNMATCHED_AS_NULL mode.
Instead of creating a new array like this every time, cache it inside
a global (per-request -- could make it immutable though).

Additionally check whether the subpattern is an empty string or
single character string and use an existing interned string in that
case. Empty / single-char subpatterns are common, so let's avoid
allocating strings for them.
2019-03-19 13:06:21 +01:00
Nikita Popov
38b16274d1 Revert unintended change
I wanted to cache subpat names, but we can't do that because the
cache relives request boundaries.
2019-03-19 12:01:37 +01:00
Nikita Popov
525f19bef5 Use zend_string for subpat_names table
When used with preg_match_all or preg_replace_callback(_array),
subpattern names can be used in the matches array many times.
Switch the subpat_names table to use zend_string, so we don't have
to allocate a new string every time. Also don't bother creating the
table if no $matches were passed.

This might be a regression for the case where preg_match() is used
with many trailing named subpatterns that are skipped in the result
array, but that seems rather contrived.
2019-03-19 11:59:25 +01:00
Nikita Popov
f2438a57ff Avoid copying subpat twice if named subpats are used 2019-03-19 11:18:43 +01:00
Nikita Popov
12bcdd68b4 Fix #77094: Add flags support for pcre_replace_callback(_array) 2019-03-19 10:38:21 +01:00
Nikita Popov
2b9acd37f0 Fixed bug #72685
We currently have a large performance problem when implementing lexers
working on UTF-8 strings in PHP. This kind of code tends to perform a
large number of matches at different offsets on a single string. This
is generally fast. However, if /u mode is used, the full string will
be UTF-8 validated on each match. This results in quadratic runtime.

This patch fixes the issue by adding a IS_STR_VALID_UTF8 flag, which
is set when we have determined that the string is valid UTF8 and
further validation is skipped.

A limitation of this approach is that we can't set the flag for interned
strings. I think this is not a problem for this use-case which will
generally work on dynamic data. If we want to use this flag for other
purposes as well (mbstring?) then it might be worthwhile to UTF-8 validate
strings during interning. But right now this doesn't seem useful.
2019-03-18 16:58:48 +01:00
Nikita Popov
275fa53564 Accept zend_string* instead of char* in php_pcre_match_impl() 2019-03-18 12:32:06 +01:00
Peter Kokot
92ac598aab Remove local variables
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.

A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.

This patch also removes the so called modelines for Vim. Modelines
allow Vim editor specifically to set some editor configuration such as
syntax highlighting, indentation style and tab width to be set in the
first line or the last 5 lines per file basis. Since the php test
files have syntax highlighting already set in most editors properly and
EditorConfig takes care of the indentation settings, this patch removes
these as well for the Vim 6.0 and newer versions.

With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.

Additionally, Vim modelines for setting php syntax and some editor
settings has been removed from some *.phpt files.  All these are
mostly not relevant for phpt files neither work properly in the
middle of the file.
2019-02-03 21:03:00 +01:00
Zeev Suraski
0cf7de1c70 Remove yearly range from copyright notice 2019-01-30 11:03:12 +02:00
Nikita Popov
e219ec144e Implement typed properties
RFC: https://wiki.php.net/rfc/typed_properties_v2

This is a squash of PR #3734, which is a squash of PR #3313.

Co-authored-by: Bob Weinand <bobwei9@hotmail.com>
Co-authored-by: Joe Watkins <krakjoe@php.net>
Co-authored-by: Dmitry Stogov <dmitry@zend.com>
2019-01-11 15:49:06 +01:00
Nikita Popov
2fab3302ae Use ZEND_PARSE_PARAMETERS_NONE in pcre
Instead of the manual ZEND_PARSE_PARAMETERS_START(0, 0) form.
2019-01-02 11:18:35 +01:00