Commit graph

649 commits

Author SHA1 Message Date
Stephen Reay
66b750d07e Add stubs for PCRE extension
Closes GH-4501.
2019-08-11 12:46:42 +02:00
Sjon Hortensius
05752d3acf Ref #77388: Don't pass BAD_ESCAPE_IS_LITERAL
This option is considered dangerous and unwanted. To allow for more
graceful migration don't error on now ignored X modifier.

Closes GH-4430.
2019-07-18 10:52:27 +02:00
Dmitry Stogov
e3d35b6434 Split destructor 2019-07-04 13:07:47 +03:00
Nikita Popov
ad1b62fca7 Merge branch 'PHP-7.3' into PHP-7.4 2019-06-17 13:31:04 +02:00
Nikita Popov
11b354dd54 Merge branch 'PHP-7.2' into PHP-7.3 2019-06-17 13:30:56 +02:00
Nikita Popov
03db04c3ab Accept null for preg_quote delimiter argument
Related to bug #78163.
2019-06-17 13:30:15 +02:00
Nikita Popov
51fb8dc422 Add specialized pair construction API
Closes GH-3990.
2019-06-11 12:29:55 +02:00
Nikita Popov
a31f46421d Allow exceptions in __toString()
RFC: https://wiki.php.net/rfc/tostring_exceptions

And convert some object to string conversion related recoverable
fatal errors into Error exceptions.

Improve exception safety of internal code performing string
conversions.
2019-06-05 14:25:07 +02:00
Dmitry Stogov
e188e4170f Use ZEND_TRY_ASSIGN_REF_... macros for arguments passed to internal function by reference 2019-04-24 18:28:29 +03:00
Peter Kokot
e06836a1a3 Remove checks for locale.h, setlocale, localeconv
The `<locale.h>` header file, setlocale, and localeconv are part of the
standard C89 [1] and on current systems can be used unconditionally.

Since PHP 7.4 requires at least C89, the `HAVE_LOCALE_H`,
`HAVE_SETLOCALE`, and `HAVE_LOCALECONV` symbols defined by Autoconf in
configure.ac [2] can be omitted and the related code simplified.

The bundled libmagic (file) has already been patched upstream in
version 5.35 and later. Until that patch also lands in php-src, the
check for the locale.h header is left in configure.ac and in the
Windows headers definition file.

[1] https://port70.net/~nsz/c/c89/c89-draft.html#4.4
[2] https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/headers.m4

Omit the bundled libmagic files
2019-04-07 18:32:54 +02:00
Christoph M. Becker
2733420f82 Merge branch 'PHP-7.3' into PHP-7.4
* PHP-7.3:
  Fix #77827: preg_match does not ignore \r in regex flags
2019-03-31 13:35:50 +02:00
Christoph M. Becker
d8b7728b0e Merge branch 'PHP-7.2' into PHP-7.3
* PHP-7.2:
  Fix #77827: preg_match does not ignore \r in regex flags
2019-03-31 13:33:21 +02:00
Christoph M. Becker
88460c017a Fix #77827: preg_match does not ignore \r in regex flags 2019-03-31 13:31:54 +02:00
Nikita Popov
a9b01b60d8 Make PCRE cache per-request on CLI
There will only be one request on the CLI SAPI, so there is no
advantage to having a persistent PCRE cache. Using a non-persistent
cache allows us to use arbitrary strings as cache keys.
2019-03-26 10:10:41 +01:00
Nikita Popov
e7e2056d1a Remove HAVE_PCRE/HAVE_BUNDLED_PCRE checks
PCRE is always available.
2019-03-22 10:29:18 +01:00
Nikita Popov
1cf84f1579 Try to create interned strings in preg_split as well
And convert last_match to last_match_offset, which is more
convenient now.
2019-03-21 10:19:48 +01:00
Nikita Popov
621b1f0312 Cleanup add_offset_pair API
Accept the two offsets directly, rather than doing length calculations
at all callsites. Also extract the logic to create a possibly interned
string.

Switch the split implementation to work on a char* subject internally,
because ZSTR_VAL(subject_str) is a mouthful...
2019-03-21 10:08:29 +01:00
Nikita Popov
6311581ac6 Fix bug #73948
If PREG_UNMATCHED_AS_NULL is used, make sure that unmatched capturing
groups at the end are also set to null, rather than just those in the
middle.
2019-03-21 09:50:20 +01:00
Nikita Popov
f53e7394eb Respect OFFSET_CAPTURE when padding preg_match_all() results
This issue was mentioned in bug #73948. The PREG_PATTERN_ORDER
padding was performed without respecting the PREG_OFFSET_CAPTURE
flag, which resulted in unmatched subpatterns being either null or
[null, -1] depending on where they occur. Now they will always be
[null, -1], consistent with other usages.
2019-03-19 15:35:15 +01:00
Nikita Popov
2783670daa Merge branch 'PHP-7.3' into PHP-7.4 2019-03-19 13:59:43 +01:00
Nikita Popov
661bce47ae Fixed bug #76127
Per documentation, and consistent with other preg functions, we
should return false if an error occurred.
2019-03-19 13:57:39 +01:00
Nikita Popov
4fe3d108af Don't create a new array for empty/null match every time
If PREG_OFFSET_CAPTURE is used, unmatched subpatterns will be either
[null, -1] or ['', -1] depending on PREG_UNMATCHED_AS_NULL mode.
Instead of creating a new array like this every time, cache it inside
a global (per-request -- could make it immutable though).

Additionally check whether the subpattern is an empty string or
single character string and use an existing interned string in that
case. Empty / single-char subpatterns are common, so let's avoid
allocating strings for them.
2019-03-19 13:06:21 +01:00
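
The string-reuse idea in the commit above can be sketched in plain C. This is a minimal illustration, not the php-src code: `demo_substring`, `demo_one_char`, and `demo_empty` are hypothetical names, and plain `char` buffers stand in for interned `zend_string` values. Substrings of length 0 or 1 come from a shared table, and only longer ones are allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared storage: one NUL-terminated string per byte value,
 * plus a single shared empty string. Stands in for interned strings. */
static char demo_one_char[256][2];
static const char demo_empty[1] = "";
static int demo_table_ready = 0;

static void demo_init_table(void)
{
    for (int c = 0; c < 256; c++) {
        demo_one_char[c][0] = (char)c;
        demo_one_char[c][1] = '\0';
    }
    demo_table_ready = 1;
}

/* Return the substring subject[start, end).
 * Lengths 0 and 1 reuse shared storage (*owned = 0, caller must not free).
 * Longer substrings are heap copies (*owned = 1, caller frees).
 * Returns NULL only if allocation fails. */
static const char *demo_substring(const char *subject, size_t start,
                                  size_t end, int *owned)
{
    assert(subject != NULL && owned != NULL && start <= end);
    size_t len = end - start;
    *owned = 0;
    if (len == 0) {
        return demo_empty;
    }
    if (len == 1) {
        if (!demo_table_ready) {
            demo_init_table();
        }
        return demo_one_char[(unsigned char)subject[start]];
    }
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, subject + start, len);
    copy[len] = '\0';
    *owned = 1;
    return copy;
}
```

Two single-character matches with the same byte value return the same pointer, so repeated short captures cost no allocation in this sketch.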
Nikita Popov
38b16274d1 Revert unintended change
I wanted to cache subpat names, but we can't do that because the
cache outlives request boundaries.
2019-03-19 12:01:37 +01:00
Nikita Popov
525f19bef5 Use zend_string for subpat_names table
When used with preg_match_all or preg_replace_callback(_array),
subpattern names can be used in the matches array many times.
Switch the subpat_names table to use zend_string, so we don't have
to allocate a new string every time. Also don't bother creating the
table if no $matches were passed.

This might be a regression for the case where preg_match() is used
with many trailing named subpatterns that are skipped in the result
array, but that seems rather contrived.
2019-03-19 11:59:25 +01:00
Nikita Popov
f2438a57ff Avoid copying subpat twice if named subpats are used 2019-03-19 11:18:43 +01:00
Nikita Popov
12bcdd68b4 Fix #77094: Add flags support for preg_replace_callback(_array) 2019-03-19 10:38:21 +01:00
Nikita Popov
2b9acd37f0 Fixed bug #72685
We currently have a large performance problem when implementing lexers
working on UTF-8 strings in PHP. This kind of code tends to perform a
large number of matches at different offsets on a single string. This
is generally fast. However, if /u mode is used, the full string will
be UTF-8 validated on each match. This results in quadratic runtime.

This patch fixes the issue by adding an IS_STR_VALID_UTF8 flag, which
is set once we have determined that the string is valid UTF-8, so that
further validation is skipped.

A limitation of this approach is that we can't set the flag for interned
strings. I think this is not a problem for this use-case which will
generally work on dynamic data. If we want to use this flag for other
purposes as well (mbstring?) then it might be worthwhile to UTF-8 validate
strings during interning. But right now this doesn't seem useful.
2019-03-18 16:58:48 +01:00
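
The validate-once idea described in the commit above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the php-src implementation: `demo_str`, `DEMO_STR_VALID_UTF8`, and the simplified scanner are hypothetical stand-ins. The scanner checks byte structure only and does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit and string header; illustrative names only. */
#define DEMO_STR_VALID_UTF8 (1u << 0)

typedef struct {
    const unsigned char *val;
    size_t len;
    unsigned flags;
} demo_str;

/* Counts full scans so the saving can be observed. */
static unsigned long demo_validation_passes = 0;

/* Simplified structural UTF-8 check: lead byte followed by the right
 * number of continuation bytes. Linear in the string length. */
static bool demo_utf8_scan(const unsigned char *s, size_t len)
{
    size_t i = 0;
    demo_validation_passes++;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;
        if (c < 0x80) {
            need = 0;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3;
        } else {
            return false;
        }
        if (need > len - i - 1) {
            return false; /* sequence truncated at end of string */
        }
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return false;
            }
        }
        i += need + 1;
    }
    return true;
}

/* Scan at most once per string: once the flag is set, later calls
 * return immediately instead of rescanning the whole subject. */
static bool demo_ensure_valid_utf8(demo_str *s)
{
    assert(s != NULL);
    if (s->flags & DEMO_STR_VALID_UTF8) {
        return true;
    }
    if (!demo_utf8_scan(s->val, s->len)) {
        return false;
    }
    s->flags |= DEMO_STR_VALID_UTF8;
    return true;
}
```

A lexer that matches at N offsets on one subject would call `demo_ensure_valid_utf8` N times but scan the string only once, which removes the quadratic term in this sketch. Invalid strings never get the flag, so they are rescanned on each call.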
Nikita Popov
275fa53564 Accept zend_string* instead of char* in php_pcre_match_impl() 2019-03-18 12:32:06 +01:00
Peter Kokot
92ac598aab Remove local variables
This patch removes the so-called local variables defined on a per-file
basis so that certain editors show tab width and similar settings
properly. These are mainly used by Vim and Emacs, yet with recent
changes the once-working definitions no longer work in Vim without
custom plugins or additional configuration. Nor are these settings
synced across the PHP code base.

A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.

This patch also removes the so-called modelines for Vim. Modelines
let Vim set editor configuration such as syntax highlighting,
indentation style, and tab width from the first line or the last 5
lines of each file. Since most editors already set syntax highlighting
for the PHP test files properly and EditorConfig takes care of the
indentation settings, this patch removes these as well for Vim 6.0 and
newer versions.

With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.

Additionally, Vim modelines for setting PHP syntax and some editor
settings have been removed from some *.phpt files. These are mostly
not relevant for phpt files, nor do they work properly in the middle
of the file.
2019-02-03 21:03:00 +01:00
Zeev Suraski
0cf7de1c70 Remove yearly range from copyright notice 2019-01-30 11:03:12 +02:00
Nikita Popov
e219ec144e Implement typed properties
RFC: https://wiki.php.net/rfc/typed_properties_v2

This is a squash of PR #3734, which is a squash of PR #3313.

Co-authored-by: Bob Weinand <bobwei9@hotmail.com>
Co-authored-by: Joe Watkins <krakjoe@php.net>
Co-authored-by: Dmitry Stogov <dmitry@zend.com>
2019-01-11 15:49:06 +01:00
Nikita Popov
2fab3302ae Use ZEND_PARSE_PARAMETERS_NONE in pcre
Instead of the manual ZEND_PARSE_PARAMETERS_START(0, 0) form.
2019-01-02 11:18:35 +01:00
Nikita Popov
27e9c05e81 Remove preg_options param from pcre_get_compiled_regex()
This parameter is always zero and not necessary to call pcre2_match.

I'm leaving the parameter behind on the _ex() variant, so the preg_flags
are still accessible in some way.
2018-12-26 17:20:13 +01:00
Nikita Popov
b1deb98c42 Fixed bug #77338
Set preg_options to 0 in php_pcre_get_compiled_regex(_ex). These
options are intended to be passed to pcre2_match. However, we do
not have any flags that actually need to be set during matching
(all relevant flags are set during compilation), and the preg_flags
value is used for PHP-specific flags instead.

This parameter should be removed entirely in master to avoid confusion.
2018-12-26 17:11:27 +01:00
Anatol Belski
ef1269d5c1 Fixed bug #77193 Infinite loop in preg_replace_callback
Don't return preallocated match data more than once in nested calls.
2018-12-01 10:24:06 +01:00
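
The commit above only states that preallocated match data must not be handed out twice in nested calls; it does not say how. One way to get that behavior, sketched in plain C with hypothetical names (`demo_match_data`, `demo_match_data_acquire`, `demo_match_data_release`), is an in-use flag with a heap fallback for inner calls:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical match-data buffer; stands in for the library's real type. */
typedef struct {
    size_t ovector[32];
} demo_match_data;

static demo_match_data demo_preallocated;
static bool demo_preallocated_in_use = false;

/* Hand out the shared buffer only if no outer call currently holds it.
 * A nested call (for example from inside a replace callback) gets a
 * fresh heap buffer instead. Returns NULL if that allocation fails. */
static demo_match_data *demo_match_data_acquire(void)
{
    if (!demo_preallocated_in_use) {
        demo_preallocated_in_use = true;
        return &demo_preallocated;
    }
    return malloc(sizeof(demo_match_data));
}

/* Release either kind of buffer: clear the flag for the shared one,
 * free the heap one. */
static void demo_match_data_release(demo_match_data *md)
{
    assert(md != NULL);
    if (md == &demo_preallocated) {
        demo_preallocated_in_use = false;
    } else {
        free(md);
    }
}
```

With this arrangement an inner match can no longer overwrite the offsets the outer loop is still iterating over, which is the kind of aliasing that can keep a replace loop from terminating.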
Anatol Belski
68c34ce0dc Make a copy unconditionally 2018-09-09 10:42:53 +02:00
Anatol Belski
9278be148e Fix memory leak in pcre cache 2018-09-09 10:38:36 +02:00
Anatol Belski
c6ddd45650 Fixed bug #76850 Exit code mangled by set locale/preg_match 2018-09-08 21:35:23 +02:00
Peter Kokot
8d3f8ca12a Remove unused Git attributes ident
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.

In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.

This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
2018-07-25 00:53:25 +02:00
Dmitry Stogov
5be44312f8 Removed redundant code 2018-07-19 15:47:15 +03:00
Dmitry Stogov
54ebebd686 Matching loops optimization 2018-07-19 15:28:31 +03:00
Dmitry Stogov
b81d712961 Micro optimizations 2018-07-19 11:19:28 +03:00
Dmitry Stogov
1820f2f2f3 Reorder conditions 2018-07-18 17:46:48 +03:00
Dmitry Stogov
29f942b3d0 Move "/e" modifier check into regex compiler 2018-07-18 16:35:17 +03:00
Dmitry Stogov
5d60651165 Merge "no_utf_check" and "g_notempty" into single "options". 2018-07-18 16:10:41 +03:00
Anatol Belski
81eb8e7507 Mark conditions unexpected 2018-07-11 18:05:28 +02:00
Anatol Belski
0630e3bc03 Reduce error buffer size
120 bytes is ample, the doc says.
2018-07-05 17:24:38 +02:00
Anatol Belski
ff8f2710f6 Check return value of pcre2_maketables() 2018-06-29 19:15:38 +02:00
Anatol Belski
aa92d42018 If there's no setlocale, char tables are not used 2018-06-22 17:31:26 +02:00
Anatol Belski
8b58b2aac6 Don't discard char tables just generated 2018-06-22 15:18:39 +02:00