Merge branch 'PHP-5.4' into PHP-5.5

* PHP-5.4:
  Upgrade PCRE to 8.36, it fixes some crashes
Stanislav Malyshev 2015-04-27 23:22:44 -07:00
commit 13c32a102c
65 changed files with 5142 additions and 4382 deletions

@ -8,7 +8,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service, University of Cambridge Computing Service,
Cambridge, England. Cambridge, England.
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
All rights reserved All rights reserved
@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2010-2013 Zoltan Herczeg Copyright(c) 2010-2014 Zoltan Herczeg
All rights reserved. All rights reserved.
@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2009-2013 Zoltan Herczeg Copyright(c) 2009-2014 Zoltan Herczeg
All rights reserved. All rights reserved.

@ -1,6 +1,224 @@
ChangeLog for PCRE ChangeLog for PCRE
------------------ ------------------
Version 8.36 26-September-2014
------------------------------
1. Got rid of some compiler warnings in the C++ modules that were shown up by
-Wmissing-field-initializers and -Wunused-parameter.
2. The tests for quantifiers being too big (greater than 65535) were being
applied after reading the number, and stupidly assuming that integer
overflow would give a negative number. The tests are now applied as the
numbers are read.
3. Tidy code in pcre_exec.c where two branches that used to be different are
now the same.
4. The JIT compiler did not generate match limit checks for certain
bracketed expressions with quantifiers. This may lead to exponential
backtracking, instead of returning with PCRE_ERROR_MATCHLIMIT. This
issue should be resolved now.
5. Fixed an issue that occurs when nested alternatives are optimized
with table jumps.
6. Inserted two casts and changed some ints to size_t in the light of some
reported 64-bit compiler warnings (Bugzilla 1477).
7. Fixed a bug concerned with zero-minimum possessive groups that could match
an empty string, which sometimes were behaving incorrectly in the
interpreter (though correctly in the JIT matcher). This pcretest input is
an example:
'\A(?:[^"]++|"(?:[^"]*+|"")*+")++'
NON QUOTED "QUOT""ED" AFTER "NOT MATCHED
the interpreter was reporting a match of 'NON QUOTED ' only, whereas the
JIT matcher and Perl both matched 'NON QUOTED "QUOT""ED" AFTER '. The test
for an empty string was breaking the inner loop and carrying on at a lower
level, when possessive repeated groups should always return to a higher
level as they have no backtrack points in them. The empty string test now
occurs at the outer level.
8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern
^\w+(?>\s*)(?<=\w) which caused it not to match "test test".
9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl
doesn't).
10. Change 8.34/15 introduced a bug that caused the amount of memory needed
to hold a pattern to be incorrectly computed (too small) when there were
named back references to duplicated names. This could cause "internal
error: code overflow" or "double free or corruption" or other memory
handling errors.
11. When named subpatterns had the same prefixes, back references could be
confused. For example, in this pattern:
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
the reference to 'Name' was incorrectly treated as a reference to a
duplicate name.
12. A pattern such as /^s?c/mi8 where the optional character has more than
one "other case" was incorrectly compiled such that it would only try to
match starting at "c".
13. When a pattern starting with \s was studied, VT was not included in the
list of possible starting characters; this should have been part of the
8.34/18 patch.
14. If a character class started [\Qx]... where x is any character, the class
was incorrectly terminated at the ].
15. If a pattern that started with a caseless match for a character with more
than one "other case" was studied, PCRE did not set up the starting code
unit bit map for the list of possible characters. Now it does. This is an
optimization improvement, not a bug fix.
16. The Unicode data tables have been updated to Unicode 7.0.0.
17. Fixed a number of memory leaks in pcregrep.
18. Avoid a compiler warning (from some compilers) for a function call with
a cast that removes "const" from an lvalue by using an intermediate
variable (to which the compiler does not object).
19. Incorrect code was compiled if a group that contained an internal recursive
back reference was optional (had quantifier with a minimum of zero). This
example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples
caused segmentation faults because of stack overflows at compile time.
20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a
group that is quantified with an indefinite repeat, caused a compile-time
loop which used up all the system stack and provoked a segmentation fault.
This was not the same bug as 19 above.
21. Add PCRECPP_EXP_DECL declaration to operator<< in pcre_stringpiece.h.
Patch by Mike Frysinger.
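Item 2 of the 8.36 list describes moving the "quantifier too big" test so that it runs while the digits are being read. The following is a minimal sketch of that idea in plain C, not PCRE's actual code; the function name read_repeat_count and the -1 error return are assumptions made for illustration, while the 65535 limit comes from the changelog entry.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_REPEAT 65535L   /* limit quoted in the changelog entry */

/* Hypothetical helper (not a PCRE function): parse decimal digits starting
   at *pp. On success, return the value (0..MAX_REPEAT) and advance *pp past
   the digits. Return -1, leaving *pp unchanged, as soon as the running
   value exceeds MAX_REPEAT. */
long read_repeat_count(const char **pp)
{
    const char *p;
    long value = 0;

    assert(pp != NULL && *pp != NULL);
    p = *pp;

    while (isdigit((unsigned char)*p)) {
        value = value * 10 + (*p++ - '0');
        /* Checking inside the loop keeps value below 655360, so the
           arithmetic never overflows and no sign test is needed. */
        if (value > MAX_REPEAT) return -1;
    }

    *pp = p;
    return value;
}
```

Because the check runs after every digit, an input such as "99999999999999999999" is rejected after the sixth digit instead of relying on what integer overflow happens to produce.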
Version 8.35 04-April-2014
--------------------------
1. A new flag is set, when property checks are present in an XCLASS.
When this flag is not set, PCRE can perform certain optimizations
such as studying these XCLASS-es.
2. The auto-possessification of character sets was improved: a normal
and an extended character set can be compared now. Furthermore
the JIT compiler optimizes more character set checks.
3. Got rid of some compiler warnings for potentially uninitialized variables
that show up only when compiled with -O2.
4. A pattern such as (?=ab\K) that uses \K in an assertion can set the start
of a match later than the end of the match. The pcretest program was not
handling the case sensibly - it was outputting from the start to the next
binary zero. It now reports this situation in a message, and outputs the
text from the end to the start.
5. Fast forward search is improved in JIT. Instead of the first three
characters, any three characters with fixed position can be searched.
Search order: first, last, middle.
6. Improve character range checks in JIT. Characters are read by an imprecise
function now, which returns with an unknown value if the character code is
above a certain threshold (e.g. 256). The only limitation is that the value
must be bigger than the threshold as well. This function is useful when
the characters above the threshold are handled in the same way.
7. The macros whose names start with RAWUCHAR are placeholders for a future
mode in which only the bottom 21 bits of 32-bit data items are used. To
make this more memorable for those maintaining the code, the names have
been changed to start with UCHAR21, and an extensive comment has been added
to their definition.
8. Add missing (new) files sljitNativeTILEGX.c and sljitNativeTILEGX-encoder.c
to the export list in Makefile.am (they were accidentally omitted from the
8.34 tarball).
9. The informational output from pcretest used the phrase "starting byte set"
which is inappropriate for the 16-bit and 32-bit libraries. As the output
for "first char" and "need char" really means "non-UTF-char", I've changed
"byte" to "char", and slightly reworded the output. The documentation about
these values has also been (I hope) clarified.
10. Another JIT related optimization: use table jumps for selecting the correct
backtracking path, when more than four alternatives are present inside a
bracket.
11. Empty match is not possible, when the minimum length is greater than zero,
and there is no \K in the pattern. JIT should avoid empty match checks in
such cases.
12. In a caseless character class with UCP support, when a character with more
than one alternative case was not the first character of a range, not all
the alternative cases were added to the class. For example, s and \x{17f}
are both alternative cases for S: the class [RST] was handled correctly,
but [R-T] was not.
13. The configure.ac file always checked for pthread support when JIT was
enabled. This is not used in Windows, so I have put this test inside a
check for the presence of windows.h (which was already tested for).
14. Improve pattern prefix search by a simplified Boyer-Moore algorithm in JIT.
The algorithm provides a way to skip certain starting offsets, and is usually
faster than linear prefix searches.
15. Change 13 for 8.20 updated RunTest to check for the 'fr' locale as well
as for 'fr_FR' and 'french'. For some reason, however, it then used the
Windows-specific input and output files, which have 'french' screwed in.
So this could never have worked. One of the problems with locales is that
they aren't always the same. I have now updated RunTest so that it checks
the output of the locale test (test 3) against three different output
files, and it allows the test to pass if any one of them matches. With luck
this should make the test pass on some versions of Solaris where it was
failing. Because of the uncertainty, the script previously did not stop if
test 3 failed; it now does. If further versions of a French locale ever
come to light, they can now easily be added.
16. If --with-pcregrep-bufsize was given a non-integer value such as "50K",
there was a message during ./configure, but it did not stop. This now
provokes an error. The invalid example in README has been corrected.
If a value less than the minimum is given, the minimum value has always
been used, but now a warning is given.
17. If --enable-bsr-anycrlf was set, the special 16/32-bit test failed. This
was a bug in the test system, which is now fixed. Also, the list of various
configurations that are tested for each release did not have one with both
16/32 bits and --enable-bsr-anycrlf. It now does.
18. pcretest was missing "-C bsr" for displaying the \R default setting.
19. Little endian PowerPC systems are supported now by the JIT compiler.
20. The fast forward newline mechanism could enter an infinite loop on
certain invalid UTF-8 input. Although we don't support these cases,
this issue can be fixed by a performance optimization.
21. Change 33 of 8.34 is not sufficient to ensure stack safety because it does
not take account of existing stack usage. There is now a new global
variable called pcre_stack_guard that can be set to point to an external
function to check stack availability. It is called at the start of
processing every parenthesized group.
22. A typo in the code meant that in ungreedy mode the max/min qualifier
behaved like a min-possessive qualifier, and, for example, /a{1,3}b/U did
not match "ab".
23. When UTF was disabled, the JIT program reported some incorrect compile
errors. These messages are silenced now.
24. Experimental support for ARM-64 and MIPS-64 has been added to the JIT
compiler.
25. Change all the temporary files used in RunGrepTest to be different to those
used by RunTest so that the tests can be run simultaneously, for example by
"make -j check".
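Item 14 of the 8.35 list mentions a "simplified Boyer-Moore algorithm" that skips starting offsets. The changelog does not say which simplification the JIT uses, so the sketch below shows Boyer-Moore-Horspool, one well-known simplified variant, in plain C. It illustrates the skipping idea only and is not taken from the PCRE JIT; the name horspool_find is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search: return the offset of the first occurrence
   of pat (length m) in text (length n), or (size_t)-1 if there is none. */
size_t horspool_find(const unsigned char *text, size_t n,
                     const unsigned char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    assert(text != NULL && pat != NULL);
    if (m == 0) return 0;
    if (m > n) return (size_t)-1;

    /* A byte that does not occur in the pattern allows a skip of m. */
    for (i = 0; i <= UCHAR_MAX; i++) shift[i] = m;
    /* Bytes in the pattern (other than its last byte) allow smaller skips. */
    for (i = 0; i + 1 < m; i++) shift[pat[i]] = m - 1 - i;

    pos = 0;
    while (pos <= n - m) {
        if (memcmp(text + pos, pat, m) == 0) return pos;
        /* Skip according to the text byte aligned with the pattern's end. */
        pos += shift[text[pos + m - 1]];
    }
    return (size_t)-1;
}
```

When the byte under the end of the pattern does not occur in the pattern at all, the search jumps m positions at once, which is where the gain over a linear scan comes from.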
Version 8.34 15-December-2013 Version 8.34 15-December-2013
----------------------------- -----------------------------
@ -5311,7 +5529,7 @@ by an auxiliary program - but can then be edited by hand if required. There are
now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
toupper() in the code. toupper() in the code.
7. Turn the malloc/free functions variables into pcre_malloc and pcre_free and 7. Turn the malloc/free funtions variables into pcre_malloc and pcre_free and
make them global. Abolish the function for setting them, as the caller can now make them global. Abolish the function for setting them, as the caller can now
set them directly. set them directly.

@ -24,7 +24,7 @@ Email domain: cam.ac.uk
University of Cambridge Computing Service, University of Cambridge Computing Service,
Cambridge, England. Cambridge, England.
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
All rights reserved. All rights reserved.
@ -35,7 +35,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2010-2013 Zoltan Herczeg Copyright(c) 2010-2014 Zoltan Herczeg
All rights reserved. All rights reserved.
@ -46,7 +46,7 @@ Written by: Zoltan Herczeg
Email local part: hzmester Email local part: hzmester
Email domain: freemail.hu Email domain: freemail.hu
Copyright(c) 2009-2013 Zoltan Herczeg Copyright(c) 2009-2014 Zoltan Herczeg
All rights reserved. All rights reserved.

@ -1,6 +1,24 @@
News about PCRE releases News about PCRE releases
------------------------ ------------------------
Release 8.36 26-September-2014
------------------------------
This is primarily a bug-fix release. However, in addition, the Unicode data
tables have been updated to Unicode 7.0.0.
Release 8.35 04-April-2014
--------------------------
There have been performance improvements for classes containing non-ASCII
characters and the "auto-possessification" feature has been extended. Other
minor improvements have been implemented and bugs fixed. There is a new callout
feature to enable applications to do detailed stack checks at compile time, to
avoid running out of stack for deeply nested parentheses. The JIT compiler has
been extended with experimental support for ARM-64, MIPS-64, and PPC-LE.
Release 8.34 15-December-2013 Release 8.34 15-December-2013
----------------------------- -----------------------------

@ -45,14 +45,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. The distribution also 32-bit library, which processes strings of 32-bit values. The distribution also
includes a set of C++ wrapper functions (see the pcrecpp man page for details), includes a set of C++ wrapper functions (see the pcrecpp man page for details),
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
C++. C++. Other C++ wrappers have been created from time to time. See, for example:
https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
style to the C API.
In addition, there is a set of C wrapper functions (again, just for the 8-bit The distribution also contains a set of C wrapper functions (again, just for
library) that are based on the POSIX regular expression API (see the pcreposix the 8-bit library) that are based on the POSIX regular expression API (see the
man page). These end up in the library called libpcreposix. Note that this just pcreposix man page). These end up in the library called libpcreposix. Note that
provides a POSIX calling interface to PCRE; the regular expressions themselves this just provides a POSIX calling interface to PCRE; the regular expressions
still follow Perl syntax and semantics. The POSIX API is restricted, and does themselves still follow Perl syntax and semantics. The POSIX API is restricted,
not give full access to all of PCRE's facilities. and does not give full access to all of PCRE's facilities.
The header file for the POSIX-style functions is called pcreposix.h. The The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems official POSIX name is regex.h, but I did not want to risk possible problems
@ -85,11 +87,12 @@ documentation is supplied in two other forms:
1. There are files called doc/pcre.txt, doc/pcregrep.txt, and 1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
doc/pcretest.txt in the source distribution. The first of these is a doc/pcretest.txt in the source distribution. The first of these is a
concatenation of the text forms of all the section 3 man pages except concatenation of the text forms of all the section 3 man pages except
those that summarize individual functions. The other two are the text the listing of pcredemo.c and those that summarize individual functions.
forms of the section 1 man pages for the pcregrep and pcretest commands. The other two are the text forms of the section 1 man pages for the
These text forms are provided for ease of scanning with text editors or pcregrep and pcretest commands. These text forms are provided for ease of
similar tools. They are installed in <prefix>/share/doc/pcre, where scanning with text editors or similar tools. They are installed in
<prefix> is the installation prefix (defaulting to /usr/local). <prefix>/share/doc/pcre, where <prefix> is the installation prefix
(defaulting to /usr/local).
2. A set of files containing all the documentation in HTML form, hyperlinked 2. A set of files containing all the documentation in HTML form, hyperlinked
in various ways, and rooted in a file called index.html, is distributed in in various ways, and rooted in a file called index.html, is distributed in
@ -372,12 +375,12 @@ library. They are also documented in the pcrebuild man page.
Of course, the relevant libraries must be installed on your system. Of course, the relevant libraries must be installed on your system.
. The default size of internal buffer used by pcregrep can be set by, for . The default size (in bytes) of the internal buffer used by pcregrep can be
example: set by, for example:
--with-pcregrep-bufsize=50K --with-pcregrep-bufsize=51200
The default value is 20K. The value must be a plain integer. The default is 20480.
. It is possible to compile pcretest so that it links with the libreadline . It is possible to compile pcretest so that it links with the libreadline
or libedit libraries, by specifying, respectively, or libedit libraries, by specifying, respectively,
@ -987,4 +990,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 05 November 2013 Last updated: 24 October 2014

@ -314,7 +314,7 @@ them both to 0; an emulation function will be used. */
#define PACKAGE_NAME "PCRE" #define PACKAGE_NAME "PCRE"
/* Define to the full name and version of this package. */ /* Define to the full name and version of this package. */
#define PACKAGE_STRING "PCRE 8.32" #define PACKAGE_STRING "PCRE 8.36"
/* Define to the one symbol short name of this package. */ /* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "pcre" #define PACKAGE_TARNAME "pcre"
@ -323,7 +323,7 @@ them both to 0; an emulation function will be used. */
#define PACKAGE_URL "" #define PACKAGE_URL ""
/* Define to the version of this package. */ /* Define to the version of this package. */
#define PACKAGE_VERSION "8.32" #define PACKAGE_VERSION "8.36"
/* to make a symbol visible */ /* to make a symbol visible */
/* #undef PCRECPP_EXP_DECL */ /* #undef PCRECPP_EXP_DECL */
@ -331,6 +331,13 @@ them both to 0; an emulation function will be used. */
/* to make a symbol visible */ /* to make a symbol visible */
/* #undef PCRECPP_EXP_DEFN */ /* #undef PCRECPP_EXP_DEFN */
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
parentheses (of any kind) in a pattern. This limits the amount of system
stack that is used while compiling a pattern. */
#ifndef PARENS_NEST_LIMIT
#define PARENS_NEST_LIMIT 250
#endif
/* The value of PCREGREP_BUFSIZE determines the size of buffer used by /* The value of PCREGREP_BUFSIZE determines the size of buffer used by
pcregrep to hold parts of the file it is searching. This is also the pcregrep to hold parts of the file it is searching. This is also the
minimum value. The actual amount of memory used by pcregrep is three times minimum value. The actual amount of memory used by pcregrep is three times
@ -432,7 +439,7 @@ them both to 0; an emulation function will be used. */
/* Version number of package */ /* Version number of package */
#ifndef VERSION #ifndef VERSION
#define VERSION "8.34" #define VERSION "8.36"
#endif #endif
/* Define to empty if `const' does not conform to ANSI C. */ /* Define to empty if `const' does not conform to ANSI C. */
@ -444,3 +451,4 @@ them both to 0; an emulation function will be used. */
/* Define to `unsigned int' if <sys/types.h> does not define. */ /* Define to `unsigned int' if <sys/types.h> does not define. */
/* #undef size_t */ /* #undef size_t */

@ -130,9 +130,11 @@ USER DOCUMENTATION
The user documentation for PCRE comprises a number of different sec- The user documentation for PCRE comprises a number of different sec-
tions. In the "man" format, each of these is a separate "man page". In tions. In the "man" format, each of these is a separate "man page". In
the HTML format, each is a separate page, linked from the index page. the HTML format, each is a separate page, linked from the index page.
In the plain text format, all the sections, except the pcredemo sec- In the plain text format, the descriptions of the pcregrep and pcretest
tion, are concatenated, for ease of searching. The sections are as fol- programs are in files called pcregrep.txt and pcretest.txt, respec-
lows: tively. The remaining sections, except for the pcredemo section (which
is a program listing), are concatenated in pcre.txt, for ease of
searching. The sections are as follows:
pcre this document pcre this document
pcre-config show PCRE installation configuration information pcre-config show PCRE installation configuration information
@ -160,8 +162,8 @@ USER DOCUMENTATION
pcretest description of the pcretest testing command pcretest description of the pcretest testing command
pcreunicode discussion of Unicode and UTF-8/16/32 support pcreunicode discussion of Unicode and UTF-8/16/32 support
In addition, in the "man" and HTML formats, there is a short page for In the "man" and HTML formats, there is also a short page for each C
each C library function, listing its arguments and results. library function, listing its arguments and results.
AUTHOR AUTHOR
@ -177,8 +179,8 @@ AUTHOR
REVISION REVISION
Last updated: 13 May 2013 Last updated: 08 January 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -1674,6 +1676,8 @@ PCRE NATIVE API INDIRECTED FUNCTIONS
int (*pcre_callout)(pcre_callout_block *); int (*pcre_callout)(pcre_callout_block *);
int (*pcre_stack_guard)(void);
PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
@ -1809,6 +1813,14 @@ PCRE API OVERVIEW
specified points during a matching operation. Details are given in the specified points during a matching operation. Details are given in the
pcrecallout documentation. pcrecallout documentation.
The global variable pcre_stack_guard initially contains NULL. It can be
set by the caller to a function that is called by PCRE whenever it
starts to compile a parenthesized part of a pattern. When parentheses
are nested, PCRE uses recursive function calls, which use up the system
stack. This function is provided so that applications with restricted
stacks can force a compilation error if the stack runs out. The func-
tion should return zero if all is well, or non-zero to force an error.
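The paragraph above gives only the contract for pcre_stack_guard: a function of type int (*)(void) that returns zero to continue and non-zero to force an error. The following is a minimal sketch of how an application might write such a check. It is self-contained and does not call PCRE: stack_guard stands in for pcre_stack_guard, mock_compile_group stands in for the recursive compiler, and guard_arm and budget_guard are hypothetical names. Estimating stack use from the addresses of local variables is a common but non-portable technique and is an assumption here, not something the documentation prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PCRE's global pointer; NULL means "no check". */
int (*stack_guard)(void) = NULL;

static uintptr_t stack_base;    /* address recorded by the application */
static size_t stack_budget;     /* bytes the application is willing to use */

/* Guard function: non-zero means "stack budget exceeded". */
int budget_guard(void)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    /* Absolute distance, so the stack growth direction does not matter. */
    uintptr_t used = here > stack_base ? here - stack_base : stack_base - here;
    return used > stack_budget;
}

/* Record a reference address (a local in the caller) and install the guard. */
void guard_arm(const void *base, size_t budget)
{
    assert(base != NULL);
    stack_base = (uintptr_t)base;
    stack_budget = budget;
    stack_guard = budget_guard;
}

/* Mock recursive compiler: one call per nesting level. Returns 0 on
   success, or 85 (the "stack check" error number listed in the PCRE
   documentation) if the guard objects. */
int mock_compile_group(int depth)
{
    volatile char frame[256];   /* simulated per-level stack usage */
    int rc;

    frame[0] = (char)depth;
    if (stack_guard != NULL && stack_guard() != 0) return 85;
    if (depth == 0) return 0;
    rc = mock_compile_group(depth - 1);
    frame[255] = (char)rc;      /* keeps the frame live across the call */
    return rc;
}
```

In a real program the equivalent step would be to assign the guard function to pcre_stack_guard before calling pcre_compile(); the budget and the way stack use is measured remain the application's choice.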
NEWLINES NEWLINES
@ -1849,7 +1861,8 @@ MULTITHREADING
The PCRE functions can be used in multi-threading applications, with The PCRE functions can be used in multi-threading applications, with
the proviso that the memory management functions pointed to by the proviso that the memory management functions pointed to by
pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
callout function pointed to by pcre_callout, are shared by all threads. callout and stack-checking functions pointed to by pcre_callout and
pcre_stack_guard, are shared by all threads.
The compiled form of a regular expression is not altered during match- The compiled form of a regular expression is not altered during match-
ing, so the same compiled pattern can safely be used by several threads ing, so the same compiled pattern can safely be used by several threads
@ -1971,7 +1984,10 @@ CHECKING BUILD-TIME OPTIONS
The output is a long integer that gives the maximum depth of nesting of The output is a long integer that gives the maximum depth of nesting of
parentheses (of any kind) in a pattern. This limit is imposed to cap parentheses (of any kind) in a pattern. This limit is imposed to cap
the amount of system stack used when a pattern is compiled. It is spec- the amount of system stack used when a pattern is compiled. It is spec-
ified when PCRE is built; the default is 250. ified when PCRE is built; the default is 250. This limit does not take
into account the stack that may already be used by the calling applica-
tion. For finer control over compilation stack usage, you can set a
pointer to an external checking function in pcre_stack_guard.
PCRE_CONFIG_MATCH_LIMIT PCRE_CONFIG_MATCH_LIMIT
@ -2474,6 +2490,8 @@ COMPILATION ERROR CODES
81 missing opening brace after \o 81 missing opening brace after \o
82 parentheses are too deeply nested 82 parentheses are too deeply nested
83 invalid range in character class 83 invalid range in character class
84 group name must start with a non-digit
85 parentheses are too deeply nested (stack check)
The numbers 32 and 10000 in errors 48 and 49 are defaults; different The numbers 32 and 10000 in errors 48 and 49 are defaults; different
values may be used if the limits were changed when PCRE was built. values may be used if the limits were changed when PCRE was built.
@ -2714,12 +2732,16 @@ INFORMATION ABOUT A PATTERN
tion. External callers can cause PCRE to use its internal tables by tion. External callers can cause PCRE to use its internal tables by
passing a NULL table pointer. passing a NULL table pointer.
PCRE_INFO_FIRSTBYTE PCRE_INFO_FIRSTBYTE (deprecated)
Return information about the first data unit of any matched string, for Return information about the first data unit of any matched string, for
a non-anchored pattern. (The name of this option refers to the 8-bit a non-anchored pattern. The name of this option refers to the 8-bit
library, where data units are bytes.) The fourth argument should point library, where data units are bytes. The fourth argument should point
to an int variable. to an int variable. Negative values are used for special cases. How-
ever, this means that when the 32-bit library is in non-UTF-32 mode,
the full 32-bit range of characters cannot be returned. For this rea-
son, this value is deprecated; use PCRE_INFO_FIRSTCHARACTERFLAGS and
PCRE_INFO_FIRSTCHARACTER instead.
If there is a fixed first value, for example, the letter "c" from a If there is a fixed first value, for example, the letter "c" from a
pattern such as (cat|cow|coyote), its value is returned. In the 8-bit pattern such as (cat|cow|coyote), its value is returned. In the 8-bit
@ -2739,10 +2761,38 @@ INFORMATION ABOUT A PATTERN
of a subject string or after any newline within the string. Otherwise of a subject string or after any newline within the string. Otherwise
-2 is returned. For anchored patterns, -2 is returned. -2 is returned. For anchored patterns, -2 is returned.
Since for the 32-bit library using the non-UTF-32 mode, this function PCRE_INFO_FIRSTCHARACTER
is unable to return the full 32-bit range of the character, this value
is deprecated; instead the PCRE_INFO_FIRSTCHARACTERFLAGS and Return the value of the first data unit (non-UTF character) of any
PCRE_INFO_FIRSTCHARACTER values should be used. matched string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS
returns 1; otherwise return 0. The fourth argument should point to an
uint_t variable.
In the 8-bit library, the value is always less than 256. In the 16-bit
library the value can be up to 0xffff. In the 32-bit library in UTF-32
mode the value can be up to 0x10ffff, and up to 0xffffffff when not
using UTF-32 mode.
PCRE_INFO_FIRSTCHARACTERFLAGS
Return information about the first data unit of any matched string, for
a non-anchored pattern. The fourth argument should point to an int
variable.
If there is a fixed first value, for example, the letter "c" from a
pattern such as (cat|cow|coyote), 1 is returned, and the character
value can be retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no
fixed first value, and if either
(a) the pattern was compiled with the PCRE_MULTILINE option, and every
branch starts with "^", or
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
set (if it were set, the pattern would be anchored),
2 is returned, indicating that the pattern matches only at the start of
a subject string or after any newline within the string. Otherwise 0 is
returned. For anchored patterns, 0 is returned.
PCRE_INFO_FIRSTTABLE PCRE_INFO_FIRSTTABLE
@ -2954,39 +3004,6 @@ INFORMATION ABOUT A PATTERN
option so that it can be saved and restored (see the pcreprecompile option so that it can be saved and restored (see the pcreprecompile
documentation for details). documentation for details).
PCRE_INFO_FIRSTCHARACTERFLAGS
Return information about the first data unit of any matched string, for
a non-anchored pattern. The fourth argument should point to an int
variable.
If there is a fixed first value, for example, the letter "c" from a
pattern such as (cat|cow|coyote), 1 is returned, and the character
value can be retrieved using PCRE_INFO_FIRSTCHARACTER.
If there is no fixed first value, and if either
(a) the pattern was compiled with the PCRE_MULTILINE option, and every
branch starts with "^", or
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
set (if it were set, the pattern would be anchored),
2 is returned, indicating that the pattern matches only at the start of
a subject string or after any newline within the string. Otherwise 0 is
returned. For anchored patterns, 0 is returned.
PCRE_INFO_FIRSTCHARACTER
Return the fixed first character value in the situation where
PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
argument should point to an uint_t variable.
In the 8-bit library, the value is always less than 256. In the 16-bit
library the value can be up to 0xffff. In the 32-bit library in UTF-32
mode the value can be up to 0x10ffff, and up to 0xffffffff when not
using UTF-32 mode.
PCRE_INFO_REQUIREDCHARFLAGS PCRE_INFO_REQUIREDCHARFLAGS
Returns 1 if there is a rightmost literal data unit that must exist in Returns 1 if there is a rightmost literal data unit that must exist in
@ -4248,8 +4265,8 @@ AUTHOR
REVISION REVISION
Last updated: 12 November 2013 Last updated: 09 February 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -5309,21 +5326,25 @@ BACKSLASH
Those that are not part of an identified script are lumped together as Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is: "Common". The current list of scripts is:
Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo, Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma, Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic, form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira- glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip- Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian, Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive, ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko, Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari- New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese, Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi. Yi.
Each character has exactly one Unicode general category property, spec- Each character has exactly one Unicode general category property, spec-
@ -5510,7 +5531,9 @@ BACKSLASH
Perl documents that the use of \K within assertions is "not well Perl documents that the use of \K within assertions is "not well
defined". In PCRE, \K is acted upon when it occurs inside positive defined". In PCRE, \K is acted upon when it occurs inside positive
assertions, but is ignored in negative assertions. assertions, but is ignored in negative assertions. Note that when a
pattern such as (?=ab\K) matches, the reported start of the match can
be greater than the end of the match.
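
A caller that derives the match length from the offsets vector may want to guard against the case described above. The following is a minimal sketch, assuming the usual pcre_exec() convention that ovector[0] and ovector[1] hold the start and end offsets of the whole match. The helper name match_span_length is hypothetical and is not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Given the first pair of
   offsets filled in by pcre_exec() (ovector[0] = start, ovector[1] = end),
   return the length of the matched text, or 0 when the reported start lies
   beyond the reported end, as can happen with \K inside a lookahead. */
size_t match_span_length(const int *ovector)
{
    int start, end;

    assert(ovector != NULL);
    start = ovector[0];
    end = ovector[1];

    if (start < 0 || end < 0) return 0;   /* offsets were not set */
    if (start > end) return 0;            /* e.g. (?=ab\K) matched */
    return (size_t)(end - start);
}
```

Returning 0 for the inverted case is only one possible policy; an application could equally treat it as an error.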
Simple assertions Simple assertions
@ -7399,19 +7422,23 @@ BACKTRACKING CONTROL
Note that (*COMMIT) at the start of a pattern is not the same as an Note that (*COMMIT) at the start of a pattern is not the same as an
anchor, unless PCRE's start-of-match optimizations are turned off, as anchor, unless PCRE's start-of-match optimizations are turned off, as
shown in this pcretest example: shown in this output from pcretest:
re> /(*COMMIT)abc/ re> /(*COMMIT)abc/
data> xyzabc data> xyzabc
0: abc 0: abc
xyzabc\Y data> xyzabc\Y
No match No match
PCRE knows that any match must start with "a", so the optimization For this pattern, PCRE knows that any match must start with "a", so the
skips along the subject to "a" before running the first match attempt, optimization skips along the subject to "a" before applying the pattern
which succeeds. When the optimization is disabled by the \Y escape in to the first set of data. The match attempt then succeeds. In the sec-
the second subject, the match starts at "x" and so the (*COMMIT) causes ond set of data, the escape sequence \Y is interpreted by the pcretest
it to fail without trying any other starting points. program. It causes the PCRE_NO_START_OPTIMIZE option to be set when
pcre_exec() is called. This disables the optimization that skips along
to the first character. The pattern is now applied starting at "x", and
so the (*COMMIT) causes the match to fail without trying any other
starting points.
(*PRUNE) or (*PRUNE:NAME) (*PRUNE) or (*PRUNE:NAME)
@ -7618,8 +7645,8 @@ AUTHOR
REVISION REVISION
Last updated: 03 December 2013 Last updated: 08 January 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -7754,21 +7781,25 @@ PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P
SCRIPT NAMES FOR \p AND \P SCRIPT NAMES FOR \p AND \P
Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo, Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma, Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic, form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira- glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip- Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian, Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive, ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko, Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari- New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese, Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi. Yi.
@ -7840,6 +7871,8 @@ MATCH POINT RESET
\K reset start of match \K reset start of match
\K is honoured in positive assertions, but ignored in negative ones.
ALTERNATION ALTERNATION
@ -7877,11 +7910,13 @@ OPTION SETTING
(?x) extended (ignore white space) (?x) extended (ignore white space)
(?-...) unset option(s) (?-...) unset option(s)
The following are recognized only at the start of a pattern or after The following are recognized only at the very start of a pattern or
one of the newline-setting options with similar syntax: after one of the newline or \R options with similar syntax. More than
one of them may appear.
(*LIMIT_MATCH=d) set the match limit to d (decimal number) (*LIMIT_MATCH=d) set the match limit to d (decimal number)
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
(*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
@ -7893,6 +7928,27 @@ OPTION SETTING
the limits set by the caller of pcre_exec(), not increase them. the limits set by the caller of pcre_exec(), not increase them.
NEWLINE CONVENTION
These are recognized only at the very start of the pattern or after
option settings with a similar syntax.
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
WHAT \R MATCHES
These are recognized only at the very start of the pattern or after
option setting with a similar syntax.
(*BSR_ANYCRLF) CR, LF, or CRLF
(*BSR_UNICODE) any Unicode newline sequence
LOOKAHEAD AND LOOKBEHIND ASSERTIONS LOOKAHEAD AND LOOKBEHIND ASSERTIONS
(?=...) positive look ahead (?=...) positive look ahead
@ -7975,27 +8031,6 @@ BACKTRACKING CONTROL
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
NEWLINE CONVENTIONS
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
WHAT \R MATCHES
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
(*BSR_ANYCRLF) CR, LF, or CRLF
(*BSR_UNICODE) any Unicode newline sequence
CALLOUTS CALLOUTS
(?C) callout (?C) callout
@ -8016,8 +8051,8 @@ AUTHOR
REVISION REVISION
Last updated: 12 November 2013 Last updated: 08 January 2014
Copyright (c) 1997-2013 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------

View file

@ -5,7 +5,7 @@
/* This is the public header file for the PCRE library, to be #included by /* This is the public header file for the PCRE library, to be #included by
applications that call the PCRE functions. applications that call the PCRE functions.
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
/* The current PCRE version information. */ /* The current PCRE version information. */
#define PCRE_MAJOR 8 #define PCRE_MAJOR 8
#define PCRE_MINOR 34 #define PCRE_MINOR 36
#define PCRE_PRERELEASE #define PCRE_PRERELEASE
#define PCRE_DATE 2013-12-15 #define PCRE_DATE 2014-09-26
/* When an application links to a PCRE DLL in Windows, the symbols that are /* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate imported have to be identified as such. When building PCRE, the appropriate
@ -491,36 +491,42 @@ PCRE_EXP_DECL void (*pcre_free)(void *);
PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t); PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre_stack_free)(void *); PCRE_EXP_DECL void (*pcre_stack_free)(void *);
PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *); PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *);
PCRE_EXP_DECL int (*pcre_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre16_malloc)(size_t); PCRE_EXP_DECL void *(*pcre16_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_free)(void *); PCRE_EXP_DECL void (*pcre16_free)(void *);
PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t); PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre16_stack_free)(void *); PCRE_EXP_DECL void (*pcre16_stack_free)(void *);
PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *); PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *);
PCRE_EXP_DECL int (*pcre16_stack_guard)(void);
PCRE_EXP_DECL void *(*pcre32_malloc)(size_t); PCRE_EXP_DECL void *(*pcre32_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_free)(void *); PCRE_EXP_DECL void (*pcre32_free)(void *);
PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t); PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t);
PCRE_EXP_DECL void (*pcre32_stack_free)(void *); PCRE_EXP_DECL void (*pcre32_stack_free)(void *);
PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *); PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *);
PCRE_EXP_DECL int (*pcre32_stack_guard)(void);
#else /* VPCOMPAT */ #else /* VPCOMPAT */
PCRE_EXP_DECL void *pcre_malloc(size_t); PCRE_EXP_DECL void *pcre_malloc(size_t);
PCRE_EXP_DECL void pcre_free(void *); PCRE_EXP_DECL void pcre_free(void *);
PCRE_EXP_DECL void *pcre_stack_malloc(size_t); PCRE_EXP_DECL void *pcre_stack_malloc(size_t);
PCRE_EXP_DECL void pcre_stack_free(void *); PCRE_EXP_DECL void pcre_stack_free(void *);
PCRE_EXP_DECL int pcre_callout(pcre_callout_block *); PCRE_EXP_DECL int pcre_callout(pcre_callout_block *);
PCRE_EXP_DECL int pcre_stack_guard(void);
PCRE_EXP_DECL void *pcre16_malloc(size_t); PCRE_EXP_DECL void *pcre16_malloc(size_t);
PCRE_EXP_DECL void pcre16_free(void *); PCRE_EXP_DECL void pcre16_free(void *);
PCRE_EXP_DECL void *pcre16_stack_malloc(size_t); PCRE_EXP_DECL void *pcre16_stack_malloc(size_t);
PCRE_EXP_DECL void pcre16_stack_free(void *); PCRE_EXP_DECL void pcre16_stack_free(void *);
PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *); PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *);
PCRE_EXP_DECL int pcre16_stack_guard(void);
PCRE_EXP_DECL void *pcre32_malloc(size_t); PCRE_EXP_DECL void *pcre32_malloc(size_t);
PCRE_EXP_DECL void pcre32_free(void *); PCRE_EXP_DECL void pcre32_free(void *);
PCRE_EXP_DECL void *pcre32_stack_malloc(size_t); PCRE_EXP_DECL void *pcre32_stack_malloc(size_t);
PCRE_EXP_DECL void pcre32_stack_free(void *); PCRE_EXP_DECL void pcre32_stack_free(void *);
PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *); PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *);
PCRE_EXP_DECL int pcre32_stack_guard(void);
#endif /* VPCOMPAT */ #endif /* VPCOMPAT */
/* User defined callback which provides a stack just before the match starts. */ /* User defined callback which provides a stack just before the match starts. */

View file

@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -47,8 +47,8 @@ supporting internal functions that are not used by other modules. */
#endif #endif
#define NLBLOCK cd /* Block containing newline information */ #define NLBLOCK cd /* Block containing newline information */
#define PSSTART start_pattern /* Field containing processed string start */ #define PSSTART start_pattern /* Field containing pattern start */
#define PSEND end_pattern /* Field containing processed string end */ #define PSEND end_pattern /* Field containing pattern end */
#include "pcre_internal.h" #include "pcre_internal.h"
@ -547,6 +547,9 @@ static const char error_texts[] =
"parentheses are too deeply nested\0" "parentheses are too deeply nested\0"
"invalid range in character class\0" "invalid range in character class\0"
"group name must start with a non-digit\0" "group name must start with a non-digit\0"
/* 85 */
"parentheses are too deeply nested (stack check)\0"
"digits missing in \\x{} or \\o{}\0"
; ;
/* Table to identify digits and hex digits. This is used when compiling /* Table to identify digits and hex digits. This is used when compiling
@ -1257,6 +1260,7 @@ else
case CHAR_o: case CHAR_o:
if (ptr[1] != CHAR_LEFT_CURLY_BRACKET) *errorcodeptr = ERR81; else if (ptr[1] != CHAR_LEFT_CURLY_BRACKET) *errorcodeptr = ERR81; else
if (ptr[2] == CHAR_RIGHT_CURLY_BRACKET) *errorcodeptr = ERR86; else
{ {
ptr += 2; ptr += 2;
c = 0; c = 0;
@ -1326,6 +1330,11 @@ else
if (ptr[1] == CHAR_LEFT_CURLY_BRACKET) if (ptr[1] == CHAR_LEFT_CURLY_BRACKET)
{ {
ptr += 2; ptr += 2;
if (*ptr == CHAR_RIGHT_CURLY_BRACKET)
{
*errorcodeptr = ERR86;
break;
}
c = 0; c = 0;
overflow = FALSE; overflow = FALSE;
while (MAX_255(*ptr) && (digitab[*ptr] & ctype_xdigit) != 0) while (MAX_255(*ptr) && (digitab[*ptr] & ctype_xdigit) != 0)
@ -1581,30 +1590,30 @@ read_repeat_counts(const pcre_uchar *p, int *minp, int *maxp, int *errorcodeptr)
int min = 0; int min = 0;
int max = -1; int max = -1;
/* Read the minimum value and do a paranoid check: a negative value indicates while (IS_DIGIT(*p))
an integer overflow. */ {
min = min * 10 + (int)(*p++ - CHAR_0);
while (IS_DIGIT(*p)) min = min * 10 + (int)(*p++ - CHAR_0); if (min > 65535)
if (min < 0 || min > 65535)
{ {
*errorcodeptr = ERR5; *errorcodeptr = ERR5;
return p; return p;
} }
}
/* Read the maximum value if there is one, and again do a paranoid on its size.
Also, max must not be less than min. */
if (*p == CHAR_RIGHT_CURLY_BRACKET) max = min; else if (*p == CHAR_RIGHT_CURLY_BRACKET) max = min; else
{ {
if (*(++p) != CHAR_RIGHT_CURLY_BRACKET) if (*(++p) != CHAR_RIGHT_CURLY_BRACKET)
{ {
max = 0; max = 0;
while(IS_DIGIT(*p)) max = max * 10 + (int)(*p++ - CHAR_0); while(IS_DIGIT(*p))
if (max < 0 || max > 65535) {
max = max * 10 + (int)(*p++ - CHAR_0);
if (max > 65535)
{ {
*errorcodeptr = ERR5; *errorcodeptr = ERR5;
return p; return p;
} }
}
if (max < min) if (max < min)
{ {
*errorcodeptr = ERR4; *errorcodeptr = ERR4;
@ -1613,9 +1622,6 @@ if (*p == CHAR_RIGHT_CURLY_BRACKET) max = min; else
} }
} }
/* Fill in the required variables, and pass back the pointer to the terminating
'}'. */
*minp = min; *minp = min;
*maxp = max; *maxp = max;
return p; return p;
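
The hunk above moves the size check inside the digit loop, so the accumulator is tested after every digit instead of once at the end. A standalone sketch of that idea in plain C follows. The names read_bounded_count and MAX_REPEAT are illustrative and are not PCRE identifiers, and the sketch assumes an int of at least 32 bits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_REPEAT 65535   /* same limit as in the hunk above */

/* Illustrative only. Reads decimal digits from *pp into *value, checking the
   limit after every digit. Because the accumulator never exceeds MAX_REPEAT
   before the next multiplication, it stays far below INT_MAX when int has at
   least 32 bits. Returns 0 on success and -1 if the number is too big. On
   return, *pp points at the first unread character. */
int read_bounded_count(const char **pp, int *value)
{
    const char *p;
    int n = 0;

    assert(pp != NULL && *pp != NULL && value != NULL);
    p = *pp;
    while (isdigit((unsigned char)*p))
    {
        n = n * 10 + (*p++ - '0');
        if (n > MAX_REPEAT)
        {
            *pp = p;
            return -1;      /* stop at once; do not keep accumulating */
        }
    }
    *pp = p;
    *value = n;
    return 0;
}
```

With the check applied only after the loop, a long run of digits could overflow the accumulator before it was ever compared with the limit.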
@ -2368,6 +2374,7 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
if (c == OP_RECURSE) if (c == OP_RECURSE)
{ {
const pcre_uchar *scode = cd->start_code + GET(code, 1); const pcre_uchar *scode = cd->start_code + GET(code, 1);
const pcre_uchar *endgroup = scode;
BOOL empty_branch; BOOL empty_branch;
/* Test for forward reference or uncompleted reference. This is disabled /* Test for forward reference or uncompleted reference. This is disabled
@ -2382,20 +2389,16 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
if (GET(scode, 1) == 0) return TRUE; /* Unclosed */ if (GET(scode, 1) == 0) return TRUE; /* Unclosed */
} }
/* If we are scanning a completed pattern, there are no forward references /* If the reference is to a completed group, we need to detect whether this
and all groups are complete. We need to detect whether this is a recursive is a recursive call, as otherwise there will be an infinite loop. If it is
call, as otherwise there will be an infinite loop. If it is a recursion, a recursion, just skip over it. Simple recursions are easily detected. For
just skip over it. Simple recursions are easily detected. For mutual mutual recursions we keep a chain on the stack. */
recursions we keep a chain on the stack. */
else
{
recurse_check *r = recurses;
const pcre_uchar *endgroup = scode;
do endgroup += GET(endgroup, 1); while (*endgroup == OP_ALT); do endgroup += GET(endgroup, 1); while (*endgroup == OP_ALT);
if (code >= scode && code <= endgroup) continue; /* Simple recursion */ if (code >= scode && code <= endgroup) continue; /* Simple recursion */
else
{
recurse_check *r = recurses;
for (r = recurses; r != NULL; r = r->prev) for (r = recurses; r != NULL; r = r->prev)
if (r->group == scode) break; if (r->group == scode) break;
if (r != NULL) continue; /* Mutual recursion */ if (r != NULL) continue; /* Mutual recursion */
@ -3036,7 +3039,7 @@ switch(c)
end += 1 + 2 * IMM2_SIZE; end += 1 + 2 * IMM2_SIZE;
break; break;
} }
list[2] = end - code; list[2] = (pcre_uint32)(end - code);
return end; return end;
} }
return NULL; /* Opcode not accepted */ return NULL; /* Opcode not accepted */
@ -3070,10 +3073,14 @@ const pcre_uint32 *chr_ptr;
const pcre_uint32 *ochr_ptr; const pcre_uint32 *ochr_ptr;
const pcre_uint32 *list_ptr; const pcre_uint32 *list_ptr;
const pcre_uchar *next_code; const pcre_uchar *next_code;
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
const pcre_uchar *xclass_flags;
#endif
const pcre_uint8 *class_bitset; const pcre_uint8 *class_bitset;
const pcre_uint8 *set1, *set2, *set_end; const pcre_uint8 *set1, *set2, *set_end;
pcre_uint32 chr; pcre_uint32 chr;
BOOL accepted, invert_bits; BOOL accepted, invert_bits;
BOOL entered_a_group = FALSE;
/* Note: the base_list[1] contains whether the current opcode has greedy /* Note: the base_list[1] contains whether the current opcode has greedy
(represented by a non-zero value) quantifier. This is different from (represented by a non-zero value) quantifier. This is different from
@ -3127,8 +3134,10 @@ for(;;)
case OP_ONCE: case OP_ONCE:
case OP_ONCE_NC: case OP_ONCE_NC:
/* Atomic sub-patterns and assertions can always auto-possessify their /* Atomic sub-patterns and assertions can always auto-possessify their
last iterator. */ last iterator. However, if the group was entered as a result of checking
return TRUE; a previous iterator, this is not possible. */
return !entered_a_group;
} }
code += PRIV(OP_lengths)[c]; code += PRIV(OP_lengths)[c];
@ -3147,6 +3156,8 @@ for(;;)
code = next_code + 1 + LINK_SIZE; code = next_code + 1 + LINK_SIZE;
next_code += GET(next_code, 1); next_code += GET(next_code, 1);
} }
entered_a_group = TRUE;
continue; continue;
case OP_BRAZERO: case OP_BRAZERO:
@ -3166,6 +3177,9 @@ for(;;)
code += PRIV(OP_lengths)[c]; code += PRIV(OP_lengths)[c];
continue; continue;
default:
break;
} }
/* Check for a supported opcode, and load its properties. */ /* Check for a supported opcode, and load its properties. */
@ -3220,6 +3234,21 @@ for(;;)
((list_ptr == list ? code : base_end) - list_ptr[2]); ((list_ptr == list ? code : base_end) - list_ptr[2]);
break; break;
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
case OP_XCLASS:
xclass_flags = (list_ptr == list ? code : base_end) - list_ptr[2] + LINK_SIZE;
if ((*xclass_flags & XCL_HASPROP) != 0) return FALSE;
if ((*xclass_flags & XCL_MAP) == 0)
{
/* No bits are set for characters < 256. */
if (list[1] == 0) return TRUE;
/* Might be an empty repeat. */
continue;
}
set2 = (pcre_uint8 *)(xclass_flags + 1);
break;
#endif
case OP_NOT_DIGIT: case OP_NOT_DIGIT:
invert_bits = TRUE; invert_bits = TRUE;
/* Fall through */ /* Fall through */
@ -3389,8 +3418,7 @@ for(;;)
rightop >= FIRST_AUTOTAB_OP && rightop <= LAST_AUTOTAB_RIGHT_OP && rightop >= FIRST_AUTOTAB_OP && rightop <= LAST_AUTOTAB_RIGHT_OP &&
autoposstab[leftop - FIRST_AUTOTAB_OP][rightop - FIRST_AUTOTAB_OP]; autoposstab[leftop - FIRST_AUTOTAB_OP][rightop - FIRST_AUTOTAB_OP];
if (!accepted) if (!accepted) return FALSE;
return FALSE;
if (list[1] == 0) return TRUE; if (list[1] == 0) return TRUE;
/* Might be an empty repeat. */ /* Might be an empty repeat. */
@ -3548,7 +3576,9 @@ for(;;)
if (list[1] == 0) return TRUE; if (list[1] == 0) return TRUE;
} }
return FALSE; /* Control never reaches here. There used to be a fail-safe return FALSE; here,
but some compilers complain about an unreachable statement. */
} }
@ -4059,12 +4089,16 @@ for (c = *cptr; c <= d; c++)
if (c > d) return -1; /* Reached end of range */ if (c > d) return -1; /* Reached end of range */
/* Found a character that has a single other case. Search for the end of the
range, which is either the end of the input range, or a character that has zero
or more than one other cases. */
*ocptr = othercase; *ocptr = othercase;
next = othercase + 1; next = othercase + 1;
for (++c; c <= d; c++) for (++c; c <= d; c++)
{ {
if (UCD_OTHERCASE(c) != next) break; if ((co = UCD_CASESET(c)) != 0 || UCD_OTHERCASE(c) != next) break;
next++; next++;
} }
@ -4102,6 +4136,7 @@ add_to_class(pcre_uint8 *classbits, pcre_uchar **uchardptr, int options,
compile_data *cd, pcre_uint32 start, pcre_uint32 end) compile_data *cd, pcre_uint32 start, pcre_uint32 end)
{ {
pcre_uint32 c; pcre_uint32 c;
pcre_uint32 classbits_end = (end <= 0xff ? end : 0xff);
int n8 = 0; int n8 = 0;
/* If caseless matching is required, scan the range and process alternate /* If caseless matching is required, scan the range and process alternate
@ -4145,7 +4180,7 @@ if ((options & PCRE_CASELESS) != 0)
/* Not UTF-mode, or no UCP */ /* Not UTF-mode, or no UCP */
for (c = start; c <= end && c < 256; c++) for (c = start; c <= classbits_end; c++)
{ {
SETBIT(classbits, cd->fcc[c]); SETBIT(classbits, cd->fcc[c]);
n8++; n8++;
@ -4170,22 +4205,21 @@ in all cases. */
#endif /* COMPILE_PCRE[8|16] */ #endif /* COMPILE_PCRE[8|16] */
/* If all characters are less than 256, use the bit map. Otherwise use extra /* Use the bitmap for characters < 256. Otherwise use extra data.*/
data. */
if (end < 0x100) for (c = start; c <= classbits_end; c++)
{ {
for (c = start; c <= end; c++) /* Regardless of start, c will always be <= 255. */
{
n8++;
SETBIT(classbits, c); SETBIT(classbits, c);
} n8++;
} }
else #if defined SUPPORT_UTF || !defined COMPILE_PCRE8
if (start <= 0xff) start = 0xff + 1;
if (end >= start)
{ {
pcre_uchar *uchardata = *uchardptr; pcre_uchar *uchardata = *uchardptr;
#ifdef SUPPORT_UTF #ifdef SUPPORT_UTF
if ((options & PCRE_UTF8) != 0) /* All UTFs use the same flag bit */ if ((options & PCRE_UTF8) != 0) /* All UTFs use the same flag bit */
{ {
@ -4225,6 +4259,7 @@ else
*uchardptr = uchardata; /* Update extra data pointer */ *uchardptr = uchardata; /* Update extra data pointer */
} }
#endif /* SUPPORT_UTF || !COMPILE_PCRE8 */
return n8; /* Number of 8-bit characters */ return n8; /* Number of 8-bit characters */
} }
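
The rewritten tail of add_to_class() handles a range that straddles 255 by setting bitmap bits for the low part and handing only the remainder to the extra data. A sketch of that split with simplified types is shown below. The function split_class_range and its out-parameters are assumptions made for illustration, and the caseless and UTF handling of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: simplified from the hunk above. Sets bits in a 32-byte
   bitmap for the part of [start, end] that is below 256 and reports the
   remaining part, if any, through *extra_start and *extra_end. Returns the
   number of characters below 256 that were processed; *has_extra is nonzero
   when characters above 255 remain. */
int split_class_range(uint8_t classbits[32], uint32_t start, uint32_t end,
                      uint32_t *extra_start, uint32_t *extra_end,
                      int *has_extra)
{
    uint32_t c;
    uint32_t classbits_end = (end <= 0xff ? end : 0xff);
    int n8 = 0;

    assert(classbits != NULL && extra_start != NULL &&
           extra_end != NULL && has_extra != NULL);

    /* If start is above 255 this loop does not run at all. */
    for (c = start; c <= classbits_end; c++)
    {
        classbits[c / 8] |= (uint8_t)(1u << (c & 7));
        n8++;
    }

    /* Whatever is left of the range lies above 255. */
    if (start <= 0xff) start = 0xff + 1;
    *has_extra = (end >= start);
    if (*has_extra)
    {
        *extra_start = start;
        *extra_end = end;
    }
    return n8;
}
```

In the real code the second half is compiled only when SUPPORT_UTF is defined or the library is not the 8-bit one, since otherwise no character above 255 can occur.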
@ -4446,6 +4481,9 @@ for (;; ptr++)
BOOL reset_bracount; BOOL reset_bracount;
int class_has_8bitchar; int class_has_8bitchar;
int class_one_char; int class_one_char;
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
BOOL xclass_has_prop;
#endif
int newoptions; int newoptions;
int recno; int recno;
int refsign; int refsign;
@ -4653,7 +4691,8 @@ for (;; ptr++)
previous = NULL; previous = NULL;
if ((options & PCRE_MULTILINE) != 0) if ((options & PCRE_MULTILINE) != 0)
{ {
if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; if (firstcharflags == REQ_UNSET)
zerofirstcharflags = firstcharflags = REQ_NONE;
*code++ = OP_CIRCM; *code++ = OP_CIRCM;
} }
else *code++ = OP_CIRC; else *code++ = OP_CIRC;
@ -4780,13 +4819,26 @@ for (;; ptr++)
should_flip_negation = FALSE; should_flip_negation = FALSE;
/* Extended class (xclass) will be used when characters > 255
might match. */
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
xclass = FALSE;
class_uchardata = code + LINK_SIZE + 2; /* For XCLASS items */
class_uchardata_base = class_uchardata; /* Save the start */
#endif
/* For optimization purposes, we track some properties of the class: /* For optimization purposes, we track some properties of the class:
class_has_8bitchar will be non-zero if the class contains at least one < class_has_8bitchar will be non-zero if the class contains at least one <
256 character; class_one_char will be 1 if the class contains just one 256 character; class_one_char will be 1 if the class contains just one
character. */ character; xclass_has_prop will be TRUE if unicode property checks
are present in the class. */
class_has_8bitchar = 0; class_has_8bitchar = 0;
class_one_char = 0; class_one_char = 0;
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
xclass_has_prop = FALSE;
#endif
/* Initialize the 32-char bit map to all zeros. We build the map in a /* Initialize the 32-char bit map to all zeros. We build the map in a
temporary bit of memory, in case the class contains fewer than two temporary bit of memory, in case the class contains fewer than two
@ -4795,12 +4847,6 @@ for (;; ptr++)
memset(classbits, 0, 32 * sizeof(pcre_uint8)); memset(classbits, 0, 32 * sizeof(pcre_uint8));
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
xclass = FALSE;
class_uchardata = code + LINK_SIZE + 2; /* For XCLASS items */
class_uchardata_base = class_uchardata; /* Save the start */
#endif
/* Process characters until ] is reached. By writing this as a "do" it /* Process characters until ] is reached. By writing this as a "do" it
means that an initial ] is taken as a data character. At the start of the means that an initial ] is taken as a data character. At the start of the
loop, c contains the first byte of the character. */ loop, c contains the first byte of the character. */
@ -4826,7 +4872,7 @@ for (;; ptr++)
if (lengthptr != NULL && class_uchardata > class_uchardata_base) if (lengthptr != NULL && class_uchardata > class_uchardata_base)
{ {
xclass = TRUE; xclass = TRUE;
*lengthptr += class_uchardata - class_uchardata_base; *lengthptr += (int)(class_uchardata - class_uchardata_base);
class_uchardata = class_uchardata_base; class_uchardata = class_uchardata_base;
} }
#endif #endif
@ -4924,6 +4970,7 @@ for (;; ptr++)
*class_uchardata++ = local_negate? XCL_NOTPROP : XCL_PROP; *class_uchardata++ = local_negate? XCL_NOTPROP : XCL_PROP;
*class_uchardata++ = ptype; *class_uchardata++ = ptype;
*class_uchardata++ = 0; *class_uchardata++ = 0;
xclass_has_prop = TRUE;
ptr = tempptr + 1; ptr = tempptr + 1;
continue; continue;
@ -5106,6 +5153,7 @@ for (;; ptr++)
XCL_PROP : XCL_NOTPROP; XCL_PROP : XCL_NOTPROP;
*class_uchardata++ = ptype; *class_uchardata++ = ptype;
*class_uchardata++ = pdata; *class_uchardata++ = pdata;
xclass_has_prop = TRUE;
class_has_8bitchar--; /* Undo! */ class_has_8bitchar--; /* Undo! */
continue; continue;
} }
@ -5274,7 +5322,7 @@ for (;; ptr++)
whatever repeat count may follow. In the case of reqchar, save the whatever repeat count may follow. In the case of reqchar, save the
previous value for reinstating. */ previous value for reinstating. */
if (class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET) if (!inescq && class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
{ {
ptr++; ptr++;
zeroreqchar = reqchar; zeroreqchar = reqchar;
@ -5400,6 +5448,7 @@ for (;; ptr++)
*code++ = OP_XCLASS; *code++ = OP_XCLASS;
code += LINK_SIZE; code += LINK_SIZE;
*code = negate_class? XCL_NOT:0; *code = negate_class? XCL_NOT:0;
if (xclass_has_prop) *code |= XCL_HASPROP;
/* If the map is required, move up the extra data to make room for it; /* If the map is required, move up the extra data to make room for it;
otherwise just move the code pointer to the end of the extra data. */ otherwise just move the code pointer to the end of the extra data. */
@ -5409,6 +5458,8 @@ for (;; ptr++)
*code++ |= XCL_MAP; *code++ |= XCL_MAP;
memmove(code + (32 / sizeof(pcre_uchar)), code, memmove(code + (32 / sizeof(pcre_uchar)), code,
IN_UCHARS(class_uchardata - code)); IN_UCHARS(class_uchardata - code));
if (negate_class && !xclass_has_prop)
for (c = 0; c < 32; c++) classbits[c] = ~classbits[c];
memcpy(code, classbits, 32); memcpy(code, classbits, 32);
code = class_uchardata + (32 / sizeof(pcre_uchar)); code = class_uchardata + (32 / sizeof(pcre_uchar));
} }
@ -5966,8 +6017,8 @@ for (;; ptr++)
while (cd->hwm > cd->start_workspace + cd->workspace_size - while (cd->hwm > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm)) WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm))
{ {
int save_offset = save_hwm - cd->start_workspace; size_t save_offset = save_hwm - cd->start_workspace;
int this_offset = this_hwm - cd->start_workspace; size_t this_offset = this_hwm - cd->start_workspace;
*errorcodeptr = expand_workspace(cd); *errorcodeptr = expand_workspace(cd);
if (*errorcodeptr != 0) goto FAILED; if (*errorcodeptr != 0) goto FAILED;
save_hwm = (pcre_uchar *)cd->start_workspace + save_offset; save_hwm = (pcre_uchar *)cd->start_workspace + save_offset;
@ -6048,8 +6099,8 @@ for (;; ptr++)
while (cd->hwm > cd->start_workspace + cd->workspace_size - while (cd->hwm > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm)) WORK_SIZE_SAFETY_MARGIN - (this_hwm - save_hwm))
{ {
int save_offset = save_hwm - cd->start_workspace; size_t save_offset = save_hwm - cd->start_workspace;
int this_offset = this_hwm - cd->start_workspace; size_t this_offset = this_hwm - cd->start_workspace;
*errorcodeptr = expand_workspace(cd); *errorcodeptr = expand_workspace(cd);
if (*errorcodeptr != 0) goto FAILED; if (*errorcodeptr != 0) goto FAILED;
save_hwm = (pcre_uchar *)cd->start_workspace + save_offset; save_hwm = (pcre_uchar *)cd->start_workspace + save_offset;
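Note on the two hunks above: the code records `save_hwm` and `this_hwm` as offsets from the workspace base before calling `expand_workspace`, then rebuilds the pointers afterwards, and the diff widens those offsets from `int` to `size_t`. The sketch below shows the same save-offset-then-re-base pattern under the assumption of a `realloc`-style buffer; `workspace`, `workspace_grow` and `grow_keeping_cursor` are made-up names, and the real `expand_workspace` is not shown in this diff.

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct {
  unsigned char *start;   /* base of the growable buffer */
  size_t size;            /* current capacity in bytes */
} workspace;

/* Hypothetical helper: double the buffer. Returns 0 on success, -1 on
   allocation failure (the old buffer is left untouched in that case). */
static int workspace_grow(workspace *w)
{
  unsigned char *p = (unsigned char *)realloc(w->start, w->size * 2);
  if (p == NULL) return -1;
  w->start = p;
  w->size *= 2;
  return 0;
}

/* Grow the buffer while keeping *cursor pointing at the same element.
   The position is saved as a size_t offset because the old pointer may be
   invalid once the block has moved. */
static int grow_keeping_cursor(workspace *w, unsigned char **cursor)
{
  size_t offset = (size_t)(*cursor - w->start);
  if (workspace_grow(w) != 0) return -1;
  *cursor = w->start + offset;   /* re-base on the (possibly moved) block */
  return 0;
}
```

Using `size_t` rather than `int` for the saved offset avoids truncation or sign problems if the workspace ever grows beyond what an `int` can index.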
@ -6577,7 +6628,10 @@ for (;; ptr++)
code[1+LINK_SIZE] = OP_CREF; code[1+LINK_SIZE] = OP_CREF;
skipbytes = 1+IMM2_SIZE; skipbytes = 1+IMM2_SIZE;
refsign = -1; refsign = -1; /* => not a number */
namelen = -1; /* => not a name; must set to avoid warning */
name = NULL; /* Always set to avoid warning */
recno = 0; /* Always set to avoid warning */
/* Check for a test for recursion in a named group. */ /* Check for a test for recursion in a named group. */
@ -6614,7 +6668,6 @@ for (;; ptr++)
if (refsign >= 0) if (refsign >= 0)
{ {
recno = 0;
while (IS_DIGIT(*ptr)) while (IS_DIGIT(*ptr))
{ {
recno = recno * 10 + (int)(*ptr - CHAR_0); recno = recno * 10 + (int)(*ptr - CHAR_0);
@ -6645,7 +6698,8 @@ for (;; ptr++)
ptr++; ptr++;
} }
namelen = (int)(ptr - name); namelen = (int)(ptr - name);
if (lengthptr != NULL) *lengthptr += IMM2_SIZE; if (lengthptr != NULL && (options & PCRE_DUPNAMES) != 0)
*lengthptr += IMM2_SIZE;
} }
/* Check the terminator */ /* Check the terminator */
@ -6706,9 +6760,11 @@ for (;; ptr++)
for (; i < cd->names_found; i++) for (; i < cd->names_found; i++)
{ {
slot += cd->name_entry_size; slot += cd->name_entry_size;
if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) != 0) break; if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) != 0 ||
(slot+IMM2_SIZE)[namelen] != 0) break;
count++; count++;
} }
if (count > 1) if (count > 1)
{ {
PUT2(code, 2+LINK_SIZE, offset); PUT2(code, 2+LINK_SIZE, offset);
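Note on the hunk above: a length-limited comparison alone accepts any table entry that merely starts with the name, so the new code also requires the entry to end exactly at `namelen`. A small sketch of that check using plain `char` strings; `entry_matches` is a hypothetical helper, and PCRE itself uses `STRNCMP_UC_UC` on `pcre_uchar` data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: non-zero only if the NUL-terminated table entry is
   exactly the namelen characters at name, not a longer string that happens
   to begin with them. */
static int entry_matches(const char *name, size_t namelen, const char *entry)
{
  return strncmp(name, entry, namelen) == 0 && entry[namelen] == '\0';
}
```

Without the terminator test, a reference to a group named `ab` would also count a neighbouring table entry `abc` as a duplicate of the same name.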
@ -7057,6 +7113,12 @@ for (;; ptr++)
/* Count named back references. */ /* Count named back references. */
if (!is_recurse) cd->namedrefcount++; if (!is_recurse) cd->namedrefcount++;
/* If duplicate names are permitted, we have to allow for a named
reference to a duplicated name (this cannot be determined until the
second pass). This needs an extra 16-bit data item. */
if ((options & PCRE_DUPNAMES) != 0) *lengthptr += IMM2_SIZE;
} }
/* In the real compile, search the name table. We check the name /* In the real compile, search the name table. We check the name
@ -7103,6 +7165,8 @@ for (;; ptr++)
for (i++; i < cd->names_found; i++) for (i++; i < cd->names_found; i++)
{ {
if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break; if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break;
count++; count++;
cslot += cd->name_entry_size; cslot += cd->name_entry_size;
} }
@ -7991,6 +8055,16 @@ unsigned int orig_bracount;
unsigned int max_bracount; unsigned int max_bracount;
branch_chain bc; branch_chain bc;
/* If set, call the external function that checks for stack availability. */
if (PUBL(stack_guard) != NULL && PUBL(stack_guard)())
{
*errorcodeptr= ERR85;
return FALSE;
}
/* Miscellaneous initialization */
bc.outer = bcptr; bc.outer = bcptr;
bc.current_branch = code; bc.current_branch = code;
@ -8190,12 +8264,16 @@ for (;;)
/* If it was a capturing subpattern, check to see if it contained any /* If it was a capturing subpattern, check to see if it contained any
recursive back references. If so, we must wrap it in atomic brackets. recursive back references. If so, we must wrap it in atomic brackets.
In any event, remove the block from the chain. */ Because we are moving code along, we must ensure that any pending recursive
references are updated. In any event, remove the block from the chain. */
if (capnumber > 0) if (capnumber > 0)
{ {
if (cd->open_caps->flag) if (cd->open_caps->flag)
{ {
*code = OP_END;
adjust_recurse(start_bracket, 1 + LINK_SIZE,
(options & PCRE_UTF8) != 0, cd, cd->hwm);
memmove(start_bracket + 1 + LINK_SIZE, start_bracket, memmove(start_bracket + 1 + LINK_SIZE, start_bracket,
IN_UCHARS(code - start_bracket)); IN_UCHARS(code - start_bracket));
*start_bracket = OP_ONCE; *start_bracket = OP_ONCE;
@ -9200,11 +9278,18 @@ subpattern. */
if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15; if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15;
/* Unless disabled, check whether single character iterators can be /* Unless disabled, check whether any single character iterators can be
auto-possessified. The function overwrites the appropriate opcode values. */ auto-possessified. The function overwrites the appropriate opcode values, so
the type of the pointer must be cast. NOTE: the intermediate variable "temp" is
used in this code because at least one compiler gives a warning about loss of
"const" attribute if the cast (pcre_uchar *)codestart is used directly in the
function call. */
if ((options & PCRE_NO_AUTO_POSSESS) == 0) if ((options & PCRE_NO_AUTO_POSSESS) == 0)
auto_possessify((pcre_uchar *)codestart, utf, cd); {
pcre_uchar *temp = (pcre_uchar *)codestart;
auto_possessify(temp, utf, cd);
}
/* If there were any lookbehind assertions that contained OP_RECURSE /* If there were any lookbehind assertions that contained OP_RECURSE
(recursions or subroutine calls), a flag is set for them to be checked here, (recursions or subroutine calls), a flag is set for them to be checked here,


@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -134,7 +134,7 @@ pcre_uint32 c;
BOOL utf = md->utf; BOOL utf = md->utf;
if (is_subject && length > md->end_subject - p) length = md->end_subject - p; if (is_subject && length > md->end_subject - p) length = md->end_subject - p;
while (length-- > 0) while (length-- > 0)
if (isprint(c = RAWUCHARINCTEST(p))) printf("%c", (char)c); else printf("\\x{%02x}", c); if (isprint(c = UCHAR21INCTEST(p))) printf("%c", (char)c); else printf("\\x{%02x}", c);
} }
#endif #endif
@ -237,8 +237,8 @@ if (caseless)
{ {
pcre_uint32 cc, cp; pcre_uint32 cc, cp;
if (eptr >= md->end_subject) return -2; /* Partial match */ if (eptr >= md->end_subject) return -2; /* Partial match */
cc = RAWUCHARTEST(eptr); cc = UCHAR21TEST(eptr);
cp = RAWUCHARTEST(p); cp = UCHAR21TEST(p);
if (TABLE_GET(cp, md->lcc, cp) != TABLE_GET(cc, md->lcc, cc)) return -1; if (TABLE_GET(cp, md->lcc, cp) != TABLE_GET(cc, md->lcc, cc)) return -1;
p++; p++;
eptr++; eptr++;
@ -254,7 +254,7 @@ else
while (length-- > 0) while (length-- > 0)
{ {
if (eptr >= md->end_subject) return -2; /* Partial match */ if (eptr >= md->end_subject) return -2; /* Partial match */
if (RAWUCHARINCTEST(p) != RAWUCHARINCTEST(eptr)) return -1; if (UCHAR21INCTEST(p) != UCHAR21INCTEST(eptr)) return -1;
} }
} }
@ -1167,11 +1167,16 @@ for (;;)
if (rrc == MATCH_KETRPOS) if (rrc == MATCH_KETRPOS)
{ {
offset_top = md->end_offset_top; offset_top = md->end_offset_top;
eptr = md->end_match_ptr;
ecode = md->start_code + code_offset; ecode = md->start_code + code_offset;
save_capture_last = md->capture_last; save_capture_last = md->capture_last;
matched_once = TRUE; matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K changed it */ mstart = md->start_match_ptr; /* In case \K changed it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
eptr = md->end_match_ptr;
continue; continue;
} }
@ -1241,10 +1246,15 @@ for (;;)
if (rrc == MATCH_KETRPOS) if (rrc == MATCH_KETRPOS)
{ {
offset_top = md->end_offset_top; offset_top = md->end_offset_top;
eptr = md->end_match_ptr;
ecode = md->start_code + code_offset; ecode = md->start_code + code_offset;
matched_once = TRUE; matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K reset it */ mstart = md->start_match_ptr; /* In case \K reset it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
eptr = md->end_match_ptr;
continue; continue;
} }
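Note on the two hunks above: after a possessive group iteration returns `MATCH_KETRPOS`, the new code compares `eptr` with `md->end_match_ptr` and leaves the repeat loop if the iteration consumed nothing. The sketch below illustrates why that test is needed, using a deliberately simplified model; `group_fn`, `repeat_possessive` and `a_or_empty` are invented for this note and do not correspond to PCRE internals.

```c
#include <stddef.h>

/* Hypothetical matcher type: returns the number of characters the group
   consumes at s, or -1 if the group does not match there. */
typedef int (*group_fn)(const char *s);

/* Repeat a group possessively (no backtracking into earlier iterations)
   and return the total number of characters consumed. */
static size_t repeat_possessive(group_fn group, const char *s)
{
  size_t pos = 0;
  for (;;)
  {
    int n = group(s + pos);
    if (n < 0) break;    /* group failed: stop repeating */
    if (n == 0) break;   /* matched an empty string: no progress, so stop */
    pos += (size_t)n;
  }
  return pos;
}

/* Example group, roughly (a|): consumes one 'a' if present, else nothing. */
static int a_or_empty(const char *s)
{
  return (*s == 'a') ? 1 : 0;
}
```

If the `n == 0` case were treated like any other successful iteration, the loop would keep matching the empty string at the same position indefinitely.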
@ -1979,6 +1989,19 @@ for (;;)
} }
} }
/* OP_KETRPOS is a possessive repeating ket. Remember the current position,
and return the MATCH_KETRPOS. This makes it possible to do the repeats one
at a time from the outer level, thus saving stack. This must precede the
empty string test - in this case that test is done at the outer level. */
if (*ecode == OP_KETRPOS)
{
md->start_match_ptr = mstart; /* In case \K reset it */
md->end_match_ptr = eptr;
md->end_offset_top = offset_top;
RRETURN(MATCH_KETRPOS);
}
/* For an ordinary non-repeating ket, just continue at this level. This /* For an ordinary non-repeating ket, just continue at this level. This
also happens for a repeating ket if no characters were matched in the also happens for a repeating ket if no characters were matched in the
group. This is the forcible breaking of infinite loops as implemented in group. This is the forcible breaking of infinite loops as implemented in
@ -2001,18 +2024,6 @@ for (;;)
break; break;
} }
/* OP_KETRPOS is a possessive repeating ket. Remember the current position,
and return the MATCH_KETRPOS. This makes it possible to do the repeats one
at a time from the outer level, thus saving stack. */
if (*ecode == OP_KETRPOS)
{
md->start_match_ptr = mstart; /* In case \K reset it */
md->end_match_ptr = eptr;
md->end_offset_top = offset_top;
RRETURN(MATCH_KETRPOS);
}
/* The normal repeating kets try the rest of the pattern or restart from /* The normal repeating kets try the rest of the pattern or restart from
the preceding bracket, in the appropriate order. In the second case, we can the preceding bracket, in the appropriate order. In the second case, we can
use tail recursion to avoid using another stack frame, unless we have an use tail recursion to avoid using another stack frame, unless we have an
@ -2103,7 +2114,7 @@ for (;;)
eptr + 1 >= md->end_subject && eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 && NLBLOCK->nllen == 2 &&
RAWUCHARTEST(eptr) == NLBLOCK->nl[0]) UCHAR21TEST(eptr) == NLBLOCK->nl[0])
{ {
md->hitend = TRUE; md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
@ -2147,7 +2158,7 @@ for (;;)
eptr + 1 >= md->end_subject && eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 && NLBLOCK->nllen == 2 &&
RAWUCHARTEST(eptr) == NLBLOCK->nl[0]) UCHAR21TEST(eptr) == NLBLOCK->nl[0])
{ {
md->hitend = TRUE; md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
@ -2290,7 +2301,7 @@ for (;;)
eptr + 1 >= md->end_subject && eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 && NLBLOCK->nllen == 2 &&
RAWUCHARTEST(eptr) == NLBLOCK->nl[0]) UCHAR21TEST(eptr) == NLBLOCK->nl[0])
{ {
md->hitend = TRUE; md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
@ -2444,7 +2455,7 @@ for (;;)
{ {
SCHECK_PARTIAL(); SCHECK_PARTIAL();
} }
else if (RAWUCHARTEST(eptr) == CHAR_LF) eptr++; else if (UCHAR21TEST(eptr) == CHAR_LF) eptr++;
break; break;
case CHAR_LF: case CHAR_LF:
@ -2691,16 +2702,22 @@ for (;;)
pcre_uchar *slot = md->name_table + GET2(ecode, 1) * md->name_entry_size; pcre_uchar *slot = md->name_table + GET2(ecode, 1) * md->name_entry_size;
ecode += 1 + 2*IMM2_SIZE; ecode += 1 + 2*IMM2_SIZE;
/* Setting the default length first and initializing 'offset' avoids
compiler warnings in the REF_REPEAT code. */
length = (md->jscript_compat)? 0 : -1;
offset = 0;
while (count-- > 0) while (count-- > 0)
{ {
offset = GET2(slot, 0) << 1; offset = GET2(slot, 0) << 1;
if (offset < offset_top && md->offset_vector[offset] >= 0) break; if (offset < offset_top && md->offset_vector[offset] >= 0)
{
length = md->offset_vector[offset+1] - md->offset_vector[offset];
break;
}
slot += md->name_entry_size; slot += md->name_entry_size;
} }
if (count < 0)
length = (md->jscript_compat)? 0 : -1;
else
length = md->offset_vector[offset+1] - md->offset_vector[offset];
} }
goto REF_REPEAT; goto REF_REPEAT;
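Note on the hunk above: the rewritten loop sets the "unset" length first and overwrites it only when one of the groups sharing the name has actually captured something. A sketch of that control flow with ordinary arrays; `first_set_length` and its parameters are hypothetical, chosen to mirror `offset_top` and `offset_vector` in the diff.

```c
/* Hypothetical sketch. ovector holds start/end pairs per group, with -1
   meaning "unset"; offset_top is the number of ovector slots filled so far;
   groups lists the group numbers that share one name. Returns the length of
   the first set group, or unset_length if none of them is set. */
static int first_set_length(const int *ovector, int offset_top,
                            const int *groups, int count, int unset_length)
{
  int length = unset_length;   /* default first, as in the new code */
  int i;
  for (i = 0; i < count; i++)
  {
    int offset = groups[i] * 2;
    if (offset < offset_top && ovector[offset] >= 0)
    {
      length = ovector[offset + 1] - ovector[offset];
      break;
    }
  }
  return length;
}
```

Assigning the default before the loop means every path leaves `length` initialized, which is the compiler warning the added comment in the diff refers to.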
@ -3212,7 +3229,7 @@ for (;;)
CHECK_PARTIAL(); /* Not SCHECK_PARTIAL() */ CHECK_PARTIAL(); /* Not SCHECK_PARTIAL() */
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
while (length-- > 0) if (*ecode++ != RAWUCHARINC(eptr)) RRETURN(MATCH_NOMATCH); while (length-- > 0) if (*ecode++ != UCHAR21INC(eptr)) RRETURN(MATCH_NOMATCH);
} }
else else
#endif #endif
@ -3252,7 +3269,7 @@ for (;;)
if (fc < 128) if (fc < 128)
{ {
pcre_uint32 cc = RAWUCHAR(eptr); pcre_uint32 cc = UCHAR21(eptr);
if (md->lcc[fc] != TABLE_GET(cc, md->lcc, cc)) RRETURN(MATCH_NOMATCH); if (md->lcc[fc] != TABLE_GET(cc, md->lcc, cc)) RRETURN(MATCH_NOMATCH);
ecode++; ecode++;
eptr++; eptr++;
@ -3521,7 +3538,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
cc = RAWUCHARTEST(eptr); cc = UCHAR21TEST(eptr);
if (fc != cc && foc != cc) RRETURN(MATCH_NOMATCH); if (fc != cc && foc != cc) RRETURN(MATCH_NOMATCH);
eptr++; eptr++;
} }
@ -3539,7 +3556,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
cc = RAWUCHARTEST(eptr); cc = UCHAR21TEST(eptr);
if (fc != cc && foc != cc) RRETURN(MATCH_NOMATCH); if (fc != cc && foc != cc) RRETURN(MATCH_NOMATCH);
eptr++; eptr++;
} }
@ -3556,7 +3573,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
break; break;
} }
cc = RAWUCHARTEST(eptr); cc = UCHAR21TEST(eptr);
if (fc != cc && foc != cc) break; if (fc != cc && foc != cc) break;
eptr++; eptr++;
} }
@ -3583,7 +3600,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
if (fc != RAWUCHARINCTEST(eptr)) RRETURN(MATCH_NOMATCH); if (fc != UCHAR21INCTEST(eptr)) RRETURN(MATCH_NOMATCH);
} }
if (min == max) continue; if (min == max) continue;
@ -3600,7 +3617,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
if (fc != RAWUCHARINCTEST(eptr)) RRETURN(MATCH_NOMATCH); if (fc != UCHAR21INCTEST(eptr)) RRETURN(MATCH_NOMATCH);
} }
/* Control never gets here */ /* Control never gets here */
} }
@ -3614,7 +3631,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
break; break;
} }
if (fc != RAWUCHARTEST(eptr)) break; if (fc != UCHAR21TEST(eptr)) break;
eptr++; eptr++;
} }
if (possessive) continue; /* No backtracking */ if (possessive) continue; /* No backtracking */
@ -4369,7 +4386,7 @@ for (;;)
eptr + 1 >= md->end_subject && eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 && NLBLOCK->nllen == 2 &&
RAWUCHAR(eptr) == NLBLOCK->nl[0]) UCHAR21(eptr) == NLBLOCK->nl[0])
{ {
md->hitend = TRUE; md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
@ -4411,7 +4428,7 @@ for (;;)
default: RRETURN(MATCH_NOMATCH); default: RRETURN(MATCH_NOMATCH);
case CHAR_CR: case CHAR_CR:
if (eptr < md->end_subject && RAWUCHAR(eptr) == CHAR_LF) eptr++; if (eptr < md->end_subject && UCHAR21(eptr) == CHAR_LF) eptr++;
break; break;
case CHAR_LF: case CHAR_LF:
@ -4521,7 +4538,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
cc = RAWUCHAR(eptr); cc = UCHAR21(eptr);
if (cc >= 128 || (md->ctypes[cc] & ctype_digit) == 0) if (cc >= 128 || (md->ctypes[cc] & ctype_digit) == 0)
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
eptr++; eptr++;
@ -4538,7 +4555,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
cc = RAWUCHAR(eptr); cc = UCHAR21(eptr);
if (cc < 128 && (md->ctypes[cc] & ctype_space) != 0) if (cc < 128 && (md->ctypes[cc] & ctype_space) != 0)
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
eptr++; eptr++;
@ -4555,7 +4572,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
cc = RAWUCHAR(eptr); cc = UCHAR21(eptr);
if (cc >= 128 || (md->ctypes[cc] & ctype_space) == 0) if (cc >= 128 || (md->ctypes[cc] & ctype_space) == 0)
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
eptr++; eptr++;
@ -4572,7 +4589,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
cc = RAWUCHAR(eptr); cc = UCHAR21(eptr);
if (cc < 128 && (md->ctypes[cc] & ctype_word) != 0) if (cc < 128 && (md->ctypes[cc] & ctype_word) != 0)
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
eptr++; eptr++;
@ -4589,7 +4606,7 @@ for (;;)
SCHECK_PARTIAL(); SCHECK_PARTIAL();
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
} }
cc = RAWUCHAR(eptr); cc = UCHAR21(eptr);
if (cc >= 128 || (md->ctypes[cc] & ctype_word) == 0) if (cc >= 128 || (md->ctypes[cc] & ctype_word) == 0)
RRETURN(MATCH_NOMATCH); RRETURN(MATCH_NOMATCH);
eptr++; eptr++;
@ -5150,7 +5167,7 @@ for (;;)
{ {
default: RRETURN(MATCH_NOMATCH); default: RRETURN(MATCH_NOMATCH);
case CHAR_CR: case CHAR_CR:
if (eptr < md->end_subject && RAWUCHAR(eptr) == CHAR_LF) eptr++; if (eptr < md->end_subject && UCHAR21(eptr) == CHAR_LF) eptr++;
break; break;
case CHAR_LF: case CHAR_LF:
@ -5675,8 +5692,6 @@ for (;;)
switch(ctype) switch(ctype)
{ {
case OP_ANY: case OP_ANY:
if (max < INT_MAX)
{
for (i = min; i < max; i++) for (i = min; i < max; i++)
{ {
if (eptr >= md->end_subject) if (eptr >= md->end_subject)
@ -5689,7 +5704,7 @@ for (;;)
eptr + 1 >= md->end_subject && eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 && NLBLOCK->nllen == 2 &&
RAWUCHAR(eptr) == NLBLOCK->nl[0]) UCHAR21(eptr) == NLBLOCK->nl[0])
{ {
md->hitend = TRUE; md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
@ -5697,33 +5712,6 @@ for (;;)
eptr++; eptr++;
ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++); ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++);
} }
}
/* Handle unlimited UTF-8 repeat */
else
{
for (i = min; i < max; i++)
{
if (eptr >= md->end_subject)
{
SCHECK_PARTIAL();
break;
}
if (IS_NEWLINE(eptr)) break;
if (md->partial != 0 && /* Take care with CRLF partial */
eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 &&
RAWUCHAR(eptr) == NLBLOCK->nl[0])
{
md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
}
eptr++;
ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++);
}
}
break; break;
case OP_ALLANY: case OP_ALLANY:
@ -5772,7 +5760,7 @@ for (;;)
if (c == CHAR_CR) if (c == CHAR_CR)
{ {
if (++eptr >= md->end_subject) break; if (++eptr >= md->end_subject) break;
if (RAWUCHAR(eptr) == CHAR_LF) eptr++; if (UCHAR21(eptr) == CHAR_LF) eptr++;
} }
else else
{ {
@ -5935,8 +5923,8 @@ for (;;)
if (rrc != MATCH_NOMATCH) RRETURN(rrc); if (rrc != MATCH_NOMATCH) RRETURN(rrc);
eptr--; eptr--;
BACKCHAR(eptr); BACKCHAR(eptr);
if (ctype == OP_ANYNL && eptr > pp && RAWUCHAR(eptr) == CHAR_NL && if (ctype == OP_ANYNL && eptr > pp && UCHAR21(eptr) == CHAR_NL &&
RAWUCHAR(eptr - 1) == CHAR_CR) eptr--; UCHAR21(eptr - 1) == CHAR_CR) eptr--;
} }
} }
else else
@ -6513,7 +6501,7 @@ tables = re->tables;
if (extra_data != NULL) if (extra_data != NULL)
{ {
register unsigned int flags = extra_data->flags; unsigned long int flags = extra_data->flags;
if ((flags & PCRE_EXTRA_STUDY_DATA) != 0) if ((flags & PCRE_EXTRA_STUDY_DATA) != 0)
study = (const pcre_study_data *)extra_data->study_data; study = (const pcre_study_data *)extra_data->study_data;
if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0) if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0)
@ -6783,10 +6771,10 @@ for(;;)
if (first_char != first_char2) if (first_char != first_char2)
while (start_match < end_subject && while (start_match < end_subject &&
(smc = RAWUCHARTEST(start_match)) != first_char && smc != first_char2) (smc = UCHAR21TEST(start_match)) != first_char && smc != first_char2)
start_match++; start_match++;
else else
while (start_match < end_subject && RAWUCHARTEST(start_match) != first_char) while (start_match < end_subject && UCHAR21TEST(start_match) != first_char)
start_match++; start_match++;
} }
@ -6818,7 +6806,7 @@ for(;;)
if (start_match[-1] == CHAR_CR && if (start_match[-1] == CHAR_CR &&
(md->nltype == NLTYPE_ANY || md->nltype == NLTYPE_ANYCRLF) && (md->nltype == NLTYPE_ANY || md->nltype == NLTYPE_ANYCRLF) &&
start_match < end_subject && start_match < end_subject &&
RAWUCHARTEST(start_match) == CHAR_NL) UCHAR21TEST(start_match) == CHAR_NL)
start_match++; start_match++;
} }
} }
@ -6829,22 +6817,12 @@ for(;;)
{ {
while (start_match < end_subject) while (start_match < end_subject)
{ {
register pcre_uint32 c = RAWUCHARTEST(start_match); register pcre_uint32 c = UCHAR21TEST(start_match);
#ifndef COMPILE_PCRE8 #ifndef COMPILE_PCRE8
if (c > 255) c = 255; if (c > 255) c = 255;
#endif #endif
if ((start_bits[c/8] & (1 << (c&7))) == 0) if ((start_bits[c/8] & (1 << (c&7))) != 0) break;
{
start_match++; start_match++;
#if defined SUPPORT_UTF && defined COMPILE_PCRE8
/* In non 8-bit mode, the iteration will stop for
characters > 255 at the beginning or not stop at all. */
if (utf)
ACROSSCHAR(start_match < end_subject, *start_match,
start_match++);
#endif
}
else break;
} }
} }
} /* Starting optimizations */ } /* Starting optimizations */
@ -6897,7 +6875,7 @@ for(;;)
{ {
while (p < end_subject) while (p < end_subject)
{ {
register pcre_uint32 pp = RAWUCHARINCTEST(p); register pcre_uint32 pp = UCHAR21INCTEST(p);
if (pp == req_char || pp == req_char2) { p--; break; } if (pp == req_char || pp == req_char2) { p--; break; }
} }
} }
@ -6905,7 +6883,7 @@ for(;;)
{ {
while (p < end_subject) while (p < end_subject)
{ {
if (RAWUCHARINCTEST(p) == req_char) { p--; break; } if (UCHAR21INCTEST(p) == req_char) { p--; break; }
} }
} }


@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -72,6 +72,7 @@ PCRE_EXP_DATA_DEFN void (*PUBL(free))(void *) = LocalPcreFree;
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = LocalPcreMalloc; PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = LocalPcreMalloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = LocalPcreFree; PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = LocalPcreFree;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL; PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#elif !defined VPCOMPAT #elif !defined VPCOMPAT
PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = malloc; PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = malloc;
@ -79,6 +80,7 @@ PCRE_EXP_DATA_DEFN void (*PUBL(free))(void *) = free;
PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = malloc; PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = malloc;
PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = free; PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = free;
PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL; PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL;
PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL;
#endif #endif
/* End of pcre_globals.c */ /* End of pcre_globals.c */
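Note on the `stack_guard` pointer added above: it defaults to NULL, and the pcre_compile.c hunk earlier in this diff calls it on entry to the group compiler, failing with `ERR85` when it returns non-zero. The sketch below reproduces that calling convention in a self-contained form. Everything in it is a stand-in: `stack_guard`, `compile_nested`, `counting_guard` and `ERR_STACK` are local to this example, and a real guard would more plausibly compare the current stack depth against a limit than count calls.

```c
#include <stddef.h>

#define ERR_STACK 85   /* mirrors ERR85 in the diff; local to this sketch */

/* Stand-in for the library's global hook. NULL means "no check". */
static int (*stack_guard)(void) = NULL;

/* Example guard state: report trouble after guard_limit calls. */
static int guard_calls = 0;
static int guard_limit = 3;

/* Example guard: returns non-zero once it has been called more than
   guard_limit times, as a crude proxy for recursion depth. */
static int counting_guard(void)
{
  return ++guard_calls > guard_limit;
}

/* Stand-in for a recursive compile step. Returns 1 on success, or 0 with
   *errorcode set if the guard reports that it is unsafe to go deeper. */
static int compile_nested(int levels, int *errorcode)
{
  if (stack_guard != NULL && stack_guard())
  {
    *errorcode = ERR_STACK;
    return 0;
  }
  if (levels == 0) return 1;
  return compile_nested(levels - 1, errorcode);
}
```

With the pointer left at NULL the check costs a single comparison per recursion; installing a guard turns unbounded nesting into an ordinary compile error instead of a stack overflow.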


@ -7,7 +7,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2013 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -316,8 +316,8 @@ start/end of string field names are. */
&(NLBLOCK->nllen), utf)) \ &(NLBLOCK->nllen), utf)) \
: \ : \
((p) <= NLBLOCK->PSEND - NLBLOCK->nllen && \ ((p) <= NLBLOCK->PSEND - NLBLOCK->nllen && \
RAWUCHARTEST(p) == NLBLOCK->nl[0] && \ UCHAR21TEST(p) == NLBLOCK->nl[0] && \
(NLBLOCK->nllen == 1 || RAWUCHARTEST(p+1) == NLBLOCK->nl[1]) \ (NLBLOCK->nllen == 1 || UCHAR21TEST(p+1) == NLBLOCK->nl[1]) \
) \ ) \
) )
@ -330,8 +330,8 @@ start/end of string field names are. */
&(NLBLOCK->nllen), utf)) \ &(NLBLOCK->nllen), utf)) \
: \ : \
((p) >= NLBLOCK->PSSTART + NLBLOCK->nllen && \ ((p) >= NLBLOCK->PSSTART + NLBLOCK->nllen && \
RAWUCHARTEST(p - NLBLOCK->nllen) == NLBLOCK->nl[0] && \ UCHAR21TEST(p - NLBLOCK->nllen) == NLBLOCK->nl[0] && \
(NLBLOCK->nllen == 1 || RAWUCHARTEST(p - NLBLOCK->nllen + 1) == NLBLOCK->nl[1]) \ (NLBLOCK->nllen == 1 || UCHAR21TEST(p - NLBLOCK->nllen + 1) == NLBLOCK->nl[1]) \
) \ ) \
) )
@ -582,12 +582,27 @@ changed in future to be a fixed number of bytes or to depend on LINK_SIZE. */
#define MAX_MARK ((1u << 8) - 1) #define MAX_MARK ((1u << 8) - 1)
#endif #endif
/* There is a proposed future special "UTF-21" mode, in which only the lowest
21 bits of a 32-bit character are interpreted as UTF, with the remaining 11
high-order bits available to the application for other uses. In preparation for
the future implementation of this mode, there are macros that load a data item
and, if in this special mode, mask it to 21 bits. These macros all have names
starting with UCHAR21. In all other modes, including the normal 32-bit
library, the macros all have the same simple definitions. When the new mode is
implemented, it is expected that these definitions will be varied appropriately
using #ifdef when compiling the library that supports the special mode. */
#define UCHAR21(eptr) (*(eptr))
#define UCHAR21TEST(eptr) (*(eptr))
#define UCHAR21INC(eptr) (*(eptr)++)
#define UCHAR21INCTEST(eptr) (*(eptr)++)
/* When UTF encoding is being used, a character is no longer just a single /* When UTF encoding is being used, a character is no longer just a single
byte. The macros for character handling generate simple sequences when used in byte in 8-bit mode or a single short in 16-bit mode. The macros for character
character-mode, and more complicated ones for UTF characters. GETCHARLENTEST handling generate simple sequences when used in the basic mode, and more
and other macros are not used when UTF is not supported, so they are not complicated ones for UTF characters. GETCHARLENTEST and other macros are not
defined. To make sure they can never even appear when UTF support is omitted, used when UTF is not supported. To make sure they can never even appear when
we don't even define them. */ UTF support is omitted, we don't even define them. */
#ifndef SUPPORT_UTF #ifndef SUPPORT_UTF
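Note on the UCHAR21 macros introduced in the hunk above: in this commit they are plain loads, and the comment only anticipates a future mode that masks each 32-bit item to its low 21 bits. The sketch below contrasts the definition from the diff with one possible masked form. `UCHAR21_MASKED` and the two helper functions are assumptions made for illustration; the diff does not define a masked variant.

```c
#include <stdint.h>

/* Definition as introduced by the diff: a plain load. */
#define UCHAR21(eptr) (*(eptr))

/* Hypothetical masked variant (not in PCRE): keep only the low 21 bits,
   leaving the 11 high-order bits free for application use. */
#define UCHAR21_MASKED(eptr) (*(eptr) & 0x1fffffu)

/* Compare the item at p with 'A' (0x41) using the plain load. */
static int is_letter_a_plain(const uint32_t *p)
{
  return UCHAR21(p) == 0x41u;
}

/* Same comparison, ignoring any application bits above bit 20. */
static int is_letter_a_masked(const uint32_t *p)
{
  return UCHAR21_MASKED(p) == 0x41u;
}
```

Routing every raw load through one macro family is what lets such a mode be added later by changing the macro definitions rather than each call site.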
@ -600,10 +615,6 @@ we don't even define them. */
#define GETCHARINC(c, eptr) c = *eptr++; #define GETCHARINC(c, eptr) c = *eptr++;
#define GETCHARINCTEST(c, eptr) c = *eptr++; #define GETCHARINCTEST(c, eptr) c = *eptr++;
#define GETCHARLEN(c, eptr, len) c = *eptr; #define GETCHARLEN(c, eptr, len) c = *eptr;
#define RAWUCHAR(eptr) (*(eptr))
#define RAWUCHARINC(eptr) (*(eptr)++)
#define RAWUCHARTEST(eptr) (*(eptr))
#define RAWUCHARINCTEST(eptr) (*(eptr)++)
/* #define GETCHARLENTEST(c, eptr, len) */ /* #define GETCHARLENTEST(c, eptr, len) */
/* #define BACKCHAR(eptr) */ /* #define BACKCHAR(eptr) */
/* #define FORWARDCHAR(eptr) */ /* #define FORWARDCHAR(eptr) */
@ -776,30 +787,6 @@ do not know if we are in UTF-8 mode. */
c = *eptr; \ c = *eptr; \
if (utf && c >= 0xc0) GETUTF8LEN(c, eptr, len); if (utf && c >= 0xc0) GETUTF8LEN(c, eptr, len);
/* Returns the next uchar, not advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHAR(eptr) \
(*(eptr))
/* Returns the next uchar, advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHARINC(eptr) \
(*((eptr)++))
/* Returns the next uchar, testing for UTF mode, and not advancing the
pointer. */
#define RAWUCHARTEST(eptr) \
(*(eptr))
/* Returns the next uchar, testing for UTF mode, advancing the
pointer. */
#define RAWUCHARINCTEST(eptr) \
(*((eptr)++))
/* If the pointer is not at the start of a character, move it back until /* If the pointer is not at the start of a character, move it back until
it is. This is called only in UTF-8 mode - we don't put a test within the macro it is. This is called only in UTF-8 mode - we don't put a test within the macro
because almost all calls are already within a block of UTF-8 only code. */ because almost all calls are already within a block of UTF-8 only code. */
@ -895,30 +882,6 @@ we do not know if we are in UTF-16 mode. */
c = *eptr; \ c = *eptr; \
if (utf && (c & 0xfc00) == 0xd800) GETUTF16LEN(c, eptr, len); if (utf && (c & 0xfc00) == 0xd800) GETUTF16LEN(c, eptr, len);
/* Returns the next uchar, not advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHAR(eptr) \
(*(eptr))
/* Returns the next uchar, advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHARINC(eptr) \
(*((eptr)++))
/* Returns the next uchar, testing for UTF mode, and not advancing the
pointer. */
#define RAWUCHARTEST(eptr) \
(*(eptr))
/* Returns the next uchar, testing for UTF mode, advancing the
pointer. */
#define RAWUCHARINCTEST(eptr) \
(*((eptr)++))
/* If the pointer is not at the start of a character, move it back until /* If the pointer is not at the start of a character, move it back until
it is. This is called only in UTF-16 mode - we don't put a test within the it is. This is called only in UTF-16 mode - we don't put a test within the
macro because almost all calls are already within a block of UTF-16 only macro because almost all calls are already within a block of UTF-16 only
@ -980,30 +943,6 @@ This is called when we do not know if we are in UTF-32 mode. */
#define GETCHARLENTEST(c, eptr, len) \ #define GETCHARLENTEST(c, eptr, len) \
GETCHARTEST(c, eptr) GETCHARTEST(c, eptr)
/* Returns the next uchar, not advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHAR(eptr) \
(*(eptr))
/* Returns the next uchar, advancing the pointer. This is called when
we know we are in UTF mode. */
#define RAWUCHARINC(eptr) \
(*((eptr)++))
/* Returns the next uchar, testing for UTF mode, and not advancing the
pointer. */
#define RAWUCHARTEST(eptr) \
(*(eptr))
/* Returns the next uchar, testing for UTF mode, advancing the
pointer. */
#define RAWUCHARINCTEST(eptr) \
(*((eptr)++))
/* If the pointer is not at the start of a character, move it back until /* If the pointer is not at the start of a character, move it back until
it is. This is called only in UTF-32 mode - we don't put a test within the it is. This is called only in UTF-32 mode - we don't put a test within the
macro because almost all calls are already within a block of UTF-32 only macro because almost all calls are already within a block of UTF-32 only
@ -1876,6 +1815,7 @@ contain characters with values greater than 255. */
#define XCL_NOT 0x01 /* Flag: this is a negative class */ #define XCL_NOT 0x01 /* Flag: this is a negative class */
#define XCL_MAP 0x02 /* Flag: a 32-byte map is present */ #define XCL_MAP 0x02 /* Flag: a 32-byte map is present */
#define XCL_HASPROP 0x04 /* Flag: property checks are present. */
#define XCL_END 0 /* Marks end of individual items */ #define XCL_END 0 /* Marks end of individual items */
#define XCL_SINGLE 1 /* Single item (one multibyte char) follows */ #define XCL_SINGLE 1 /* Single item (one multibyte char) follows */
@ -2341,7 +2281,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59, ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69,
ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79,
ERR80, ERR81, ERR82, ERR83, ERR84, ERRCOUNT }; ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERRCOUNT };
/* JIT compiling modes. The function list is indexed by them. */ /* JIT compiling modes. The function list is indexed by them. */
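The XCL_* values in the hunk above are single-bit flags stored in the first code unit of an extended class, so they are tested with a bitwise AND. The sketch below is a minimal illustration of the check that the pcre_study.c hunk in this commit applies to OP_XCLASS. The flag values are copied from the diff; the function name xclass_can_supply_start_bits is hypothetical and not part of PCRE.

```c
/* Flag values as they appear in the pcre_internal.h hunk. */
#define XCL_NOT      0x01  /* Flag: this is a negative class */
#define XCL_MAP      0x02  /* Flag: a 32-byte map is present */
#define XCL_HASPROP  0x04  /* Flag: property checks are present */

/* Hypothetical helper (not a PCRE function). Returns 0 when the study
   code would give up on the class, 1 when it would go on to look at
   the class contents. */
static int
xclass_can_supply_start_bits(unsigned int flags)
{
  /* A class with property tests cannot be summarised by a byte bitmap. */
  if ((flags & XCL_HASPROP) != 0) return 0;

  /* A negated class without a map matches every low-valued character,
     so all bits would be set and nothing is gained. */
  if ((flags & XCL_MAP) == 0 && (flags & XCL_NOT) != 0) return 0;

  return 1;
}
```

Keeping each property in its own bit means XCL_HASPROP could be added as 0x04 without disturbing the existing XCL_NOT and XCL_MAP tests.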


@ -1,572 +0,0 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2010 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This module contains a PCRE private debugging function for printing out the
internal form of a compiled regular expression, along with some supporting
local functions. This source file is used in two places:
(1) It is #included by pcre_compile.c when it is compiled in debugging mode
(PCRE_DEBUG defined in pcre_internal.h). It is not included in production
compiles.
(2) It is always #included by pcretest.c, which can be asked to print out a
compiled regex for debugging purposes. */
/* Macro that decides whether a character should be output as a literal or in
hexadecimal. We don't use isprint() because that can vary from system to system
(even without the use of locales) and we want the output always to be the same,
for testing purposes. This macro is used in pcretest as well as in this file. */
#ifdef EBCDIC
#define PRINTABLE(c) ((c) >= 64 && (c) < 255)
#else
#define PRINTABLE(c) ((c) >= 32 && (c) < 127)
#endif
/* The table of operator names. */
static const char *OP_names[] = { OP_NAME_LIST };
/*************************************************
* Print single- or multi-byte character *
*************************************************/
static int
print_char(FILE *f, uschar *ptr, BOOL utf8)
{
int c = *ptr;
#ifndef SUPPORT_UTF8
utf8 = utf8; /* Avoid compiler warning */
if (PRINTABLE(c)) fprintf(f, "%c", c); else fprintf(f, "\\x%02x", c);
return 0;
#else
if (!utf8 || (c & 0xc0) != 0xc0)
{
if (PRINTABLE(c)) fprintf(f, "%c", c); else fprintf(f, "\\x%02x", c);
return 0;
}
else
{
int i;
int a = _pcre_utf8_table4[c & 0x3f]; /* Number of additional bytes */
int s = 6*a;
c = (c & _pcre_utf8_table3[a]) << s;
for (i = 1; i <= a; i++)
{
/* This is a check for malformed UTF-8; it should only occur if the sanity
check has been turned off. Rather than swallow random bytes, just stop if
we hit a bad one. Print it with \X instead of \x as an indication. */
if ((ptr[i] & 0xc0) != 0x80)
{
fprintf(f, "\\X{%x}", c);
return i - 1;
}
/* The byte is OK */
s -= 6;
c |= (ptr[i] & 0x3f) << s;
}
if (c < 128) fprintf(f, "\\x%02x", c); else fprintf(f, "\\x{%x}", c);
return a;
}
#endif
}
/*************************************************
* Find Unicode property name *
*************************************************/
static const char *
get_ucpname(int ptype, int pvalue)
{
#ifdef SUPPORT_UCP
int i;
for (i = _pcre_utt_size - 1; i >= 0; i--)
{
if (ptype == _pcre_utt[i].type && pvalue == _pcre_utt[i].value) break;
}
return (i >= 0)? _pcre_utt_names + _pcre_utt[i].name_offset : "??";
#else
/* It gets harder and harder to shut off unwanted compiler warnings. */
ptype = ptype * pvalue;
return (ptype == pvalue)? "??" : "??";
#endif
}
/*************************************************
* Print compiled regex *
*************************************************/
/* Make this function work for a regex with integers in either byte order.
However, we assume that what we are passed is a compiled regex. The
print_lengths flag controls whether offsets and lengths of items are printed.
They can be turned off from pcretest so that automatic tests on bytecode can be
written that do not depend on the value of LINK_SIZE. */
static void
pcre_printint(pcre *external_re, FILE *f, BOOL print_lengths)
{
real_pcre *re = (real_pcre *)external_re;
uschar *codestart, *code;
BOOL utf8;
unsigned int options = re->options;
int offset = re->name_table_offset;
int count = re->name_count;
int size = re->name_entry_size;
if (re->magic_number != MAGIC_NUMBER)
{
offset = ((offset << 8) & 0xff00) | ((offset >> 8) & 0xff);
count = ((count << 8) & 0xff00) | ((count >> 8) & 0xff);
size = ((size << 8) & 0xff00) | ((size >> 8) & 0xff);
options = ((options << 24) & 0xff000000) |
((options << 8) & 0x00ff0000) |
((options >> 8) & 0x0000ff00) |
((options >> 24) & 0x000000ff);
}
code = codestart = (uschar *)re + offset + count * size;
utf8 = (options & PCRE_UTF8) != 0;
for(;;)
{
uschar *ccode;
int c;
int extra = 0;
if (print_lengths)
fprintf(f, "%3d ", (int)(code - codestart));
else
fprintf(f, " ");
switch(*code)
{
/* ========================================================================== */
/* These cases are never obeyed. This is a fudge that causes a compile-
time error if the vectors OP_names or _pcre_OP_lengths, which are indexed
by opcode, are not the correct length. It seems to be the only way to do
such a check at compile time, as the sizeof() operator does not work in
the C preprocessor. We do this while compiling pcretest, because that
#includes pcre_tables.c, which holds _pcre_OP_lengths. We can't do this
when building pcre_compile.c with PCRE_DEBUG set, because it doesn't then
know the size of _pcre_OP_lengths. */
#ifdef COMPILING_PCRETEST
case OP_TABLE_LENGTH:
case OP_TABLE_LENGTH +
((sizeof(OP_names)/sizeof(const char *) == OP_TABLE_LENGTH) &&
(sizeof(_pcre_OP_lengths) == OP_TABLE_LENGTH)):
break;
#endif
/* ========================================================================== */
case OP_END:
fprintf(f, " %s\n", OP_names[*code]);
fprintf(f, "------------------------------------------------------------------\n");
return;
case OP_OPT:
fprintf(f, " %.2x %s", code[1], OP_names[*code]);
break;
case OP_CHAR:
fprintf(f, " ");
do
{
code++;
code += 1 + print_char(f, code, utf8);
}
while (*code == OP_CHAR);
fprintf(f, "\n");
continue;
case OP_CHARNC:
fprintf(f, " NC ");
do
{
code++;
code += 1 + print_char(f, code, utf8);
}
while (*code == OP_CHARNC);
fprintf(f, "\n");
continue;
case OP_CBRA:
case OP_SCBRA:
if (print_lengths) fprintf(f, "%3d ", GET(code, 1));
else fprintf(f, " ");
fprintf(f, "%s %d", OP_names[*code], GET2(code, 1+LINK_SIZE));
break;
case OP_BRA:
case OP_SBRA:
case OP_KETRMAX:
case OP_KETRMIN:
case OP_ALT:
case OP_KET:
case OP_ASSERT:
case OP_ASSERT_NOT:
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
case OP_ONCE:
case OP_COND:
case OP_SCOND:
case OP_REVERSE:
if (print_lengths) fprintf(f, "%3d ", GET(code, 1));
else fprintf(f, " ");
fprintf(f, "%s", OP_names[*code]);
break;
case OP_CLOSE:
fprintf(f, " %s %d", OP_names[*code], GET2(code, 1));
break;
case OP_CREF:
case OP_NCREF:
fprintf(f, "%3d %s", GET2(code,1), OP_names[*code]);
break;
case OP_RREF:
c = GET2(code, 1);
if (c == RREF_ANY)
fprintf(f, " Cond recurse any");
else
fprintf(f, " Cond recurse %d", c);
break;
case OP_NRREF:
c = GET2(code, 1);
if (c == RREF_ANY)
fprintf(f, " Cond nrecurse any");
else
fprintf(f, " Cond nrecurse %d", c);
break;
case OP_DEF:
fprintf(f, " Cond def");
break;
case OP_STAR:
case OP_MINSTAR:
case OP_POSSTAR:
case OP_PLUS:
case OP_MINPLUS:
case OP_POSPLUS:
case OP_QUERY:
case OP_MINQUERY:
case OP_POSQUERY:
case OP_TYPESTAR:
case OP_TYPEMINSTAR:
case OP_TYPEPOSSTAR:
case OP_TYPEPLUS:
case OP_TYPEMINPLUS:
case OP_TYPEPOSPLUS:
case OP_TYPEQUERY:
case OP_TYPEMINQUERY:
case OP_TYPEPOSQUERY:
fprintf(f, " ");
if (*code >= OP_TYPESTAR)
{
fprintf(f, "%s", OP_names[code[1]]);
if (code[1] == OP_PROP || code[1] == OP_NOTPROP)
{
fprintf(f, " %s ", get_ucpname(code[2], code[3]));
extra = 2;
}
}
else extra = print_char(f, code+1, utf8);
fprintf(f, "%s", OP_names[*code]);
break;
case OP_EXACT:
case OP_UPTO:
case OP_MINUPTO:
case OP_POSUPTO:
fprintf(f, " ");
extra = print_char(f, code+3, utf8);
fprintf(f, "{");
if (*code != OP_EXACT) fprintf(f, "0,");
fprintf(f, "%d}", GET2(code,1));
if (*code == OP_MINUPTO) fprintf(f, "?");
else if (*code == OP_POSUPTO) fprintf(f, "+");
break;
case OP_TYPEEXACT:
case OP_TYPEUPTO:
case OP_TYPEMINUPTO:
case OP_TYPEPOSUPTO:
fprintf(f, " %s", OP_names[code[3]]);
if (code[3] == OP_PROP || code[3] == OP_NOTPROP)
{
fprintf(f, " %s ", get_ucpname(code[4], code[5]));
extra = 2;
}
fprintf(f, "{");
if (*code != OP_TYPEEXACT) fprintf(f, "0,");
fprintf(f, "%d}", GET2(code,1));
if (*code == OP_TYPEMINUPTO) fprintf(f, "?");
else if (*code == OP_TYPEPOSUPTO) fprintf(f, "+");
break;
case OP_NOT:
c = code[1];
if (PRINTABLE(c)) fprintf(f, " [^%c]", c);
else fprintf(f, " [^\\x%02x]", c);
break;
case OP_NOTSTAR:
case OP_NOTMINSTAR:
case OP_NOTPOSSTAR:
case OP_NOTPLUS:
case OP_NOTMINPLUS:
case OP_NOTPOSPLUS:
case OP_NOTQUERY:
case OP_NOTMINQUERY:
case OP_NOTPOSQUERY:
c = code[1];
if (PRINTABLE(c)) fprintf(f, " [^%c]", c);
else fprintf(f, " [^\\x%02x]", c);
fprintf(f, "%s", OP_names[*code]);
break;
case OP_NOTEXACT:
case OP_NOTUPTO:
case OP_NOTMINUPTO:
case OP_NOTPOSUPTO:
c = code[3];
if (PRINTABLE(c)) fprintf(f, " [^%c]{", c);
else fprintf(f, " [^\\x%02x]{", c);
if (*code != OP_NOTEXACT) fprintf(f, "0,");
fprintf(f, "%d}", GET2(code,1));
if (*code == OP_NOTMINUPTO) fprintf(f, "?");
else if (*code == OP_NOTPOSUPTO) fprintf(f, "+");
break;
case OP_RECURSE:
if (print_lengths) fprintf(f, "%3d ", GET(code, 1));
else fprintf(f, " ");
fprintf(f, "%s", OP_names[*code]);
break;
case OP_REF:
fprintf(f, " \\%d", GET2(code,1));
ccode = code + _pcre_OP_lengths[*code];
goto CLASS_REF_REPEAT;
case OP_CALLOUT:
fprintf(f, " %s %d %d %d", OP_names[*code], code[1], GET(code,2),
GET(code, 2 + LINK_SIZE));
break;
case OP_PROP:
case OP_NOTPROP:
fprintf(f, " %s %s", OP_names[*code], get_ucpname(code[1], code[2]));
break;
/* OP_XCLASS can only occur in UTF-8 mode. However, there's no harm in
having this code always here, and it makes it less messy without all those
#ifdefs. */
case OP_CLASS:
case OP_NCLASS:
case OP_XCLASS:
{
int i, min, max;
BOOL printmap;
fprintf(f, " [");
if (*code == OP_XCLASS)
{
extra = GET(code, 1);
ccode = code + LINK_SIZE + 1;
printmap = (*ccode & XCL_MAP) != 0;
if ((*ccode++ & XCL_NOT) != 0) fprintf(f, "^");
}
else
{
printmap = TRUE;
ccode = code + 1;
}
/* Print a bit map */
if (printmap)
{
for (i = 0; i < 256; i++)
{
if ((ccode[i/8] & (1 << (i&7))) != 0)
{
int j;
for (j = i+1; j < 256; j++)
if ((ccode[j/8] & (1 << (j&7))) == 0) break;
if (i == '-' || i == ']') fprintf(f, "\\");
if (PRINTABLE(i)) fprintf(f, "%c", i);
else fprintf(f, "\\x%02x", i);
if (--j > i)
{
if (j != i + 1) fprintf(f, "-");
if (j == '-' || j == ']') fprintf(f, "\\");
if (PRINTABLE(j)) fprintf(f, "%c", j);
else fprintf(f, "\\x%02x", j);
}
i = j;
}
}
ccode += 32;
}
/* For an XCLASS there is always some additional data */
if (*code == OP_XCLASS)
{
int ch;
while ((ch = *ccode++) != XCL_END)
{
if (ch == XCL_PROP)
{
int ptype = *ccode++;
int pvalue = *ccode++;
fprintf(f, "\\p{%s}", get_ucpname(ptype, pvalue));
}
else if (ch == XCL_NOTPROP)
{
int ptype = *ccode++;
int pvalue = *ccode++;
fprintf(f, "\\P{%s}", get_ucpname(ptype, pvalue));
}
else
{
ccode += 1 + print_char(f, ccode, TRUE);
if (ch == XCL_RANGE)
{
fprintf(f, "-");
ccode += 1 + print_char(f, ccode, TRUE);
}
}
}
}
/* Indicate a non-UTF8 class which was created by negation */
fprintf(f, "]%s", (*code == OP_NCLASS)? " (neg)" : "");
/* Handle repeats after a class or a back reference */
CLASS_REF_REPEAT:
switch(*ccode)
{
case OP_CRSTAR:
case OP_CRMINSTAR:
case OP_CRPLUS:
case OP_CRMINPLUS:
case OP_CRQUERY:
case OP_CRMINQUERY:
fprintf(f, "%s", OP_names[*ccode]);
extra += _pcre_OP_lengths[*ccode];
break;
case OP_CRRANGE:
case OP_CRMINRANGE:
min = GET2(ccode,1);
max = GET2(ccode,3);
if (max == 0) fprintf(f, "{%d,}", min);
else fprintf(f, "{%d,%d}", min, max);
if (*ccode == OP_CRMINRANGE) fprintf(f, "?");
extra += _pcre_OP_lengths[*ccode];
break;
/* Do nothing if it's not a repeat; this code stops picky compilers
warning about the lack of a default code path. */
default:
break;
}
}
break;
case OP_MARK:
case OP_PRUNE_ARG:
case OP_SKIP_ARG:
fprintf(f, " %s %s", OP_names[*code], code + 2);
extra += code[1];
break;
case OP_THEN:
if (print_lengths)
fprintf(f, " %s %d", OP_names[*code], GET(code, 1));
else
fprintf(f, " %s", OP_names[*code]);
break;
case OP_THEN_ARG:
if (print_lengths)
fprintf(f, " %s %d %s", OP_names[*code], GET(code, 1),
code + 2 + LINK_SIZE);
else
fprintf(f, " %s %s", OP_names[*code], code + 2 + LINK_SIZE);
extra += code[1+LINK_SIZE];
break;
/* Anything else is just an item with no data. */
default:
fprintf(f, " %s", OP_names[*code]);
break;
}
code += _pcre_OP_lengths[*code] + extra;
fprintf(f, "\n");
}
}
/* End of pcre_printint.src */
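The removed pcre_printint() corrects header fields when the magic number does not match, which indicates a regex compiled on a host of the opposite byte order. The sketch below isolates that byte swapping with the same masks and shifts as the removed code. The helper names swap16 and swap32 are hypothetical, and fixed-width types are an assumption made here for clarity; the original worked on plain int and unsigned int variables.

```c
#include <stdint.h>

/* Hypothetical helper: swap the two bytes of a 16-bit field, as was
   done for the name table offset, count and entry size. */
static uint16_t
swap16(uint16_t v)
{
  return (uint16_t)(((v << 8) & 0xff00u) | ((v >> 8) & 0x00ffu));
}

/* Hypothetical helper: reverse the four bytes of a 32-bit field, as
   was done for the options word. */
static uint32_t
swap32(uint32_t v)
{
  return ((v << 24) & 0xff000000u) |
         ((v <<  8) & 0x00ff0000u) |
         ((v >>  8) & 0x0000ff00u) |
         ((v >> 24) & 0x000000ffu);
}
```

Applying either helper twice returns the original value, so the same routine serves both directions of the conversion.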


@ -863,7 +863,6 @@ do
case OP_NOTUPTOI: case OP_NOTUPTOI:
case OP_NOT_HSPACE: case OP_NOT_HSPACE:
case OP_NOT_VSPACE: case OP_NOT_VSPACE:
case OP_PROP:
case OP_PRUNE: case OP_PRUNE:
case OP_PRUNE_ARG: case OP_PRUNE_ARG:
case OP_RECURSE: case OP_RECURSE:
@ -879,11 +878,33 @@ do
case OP_SOM: case OP_SOM:
case OP_THEN: case OP_THEN:
case OP_THEN_ARG: case OP_THEN_ARG:
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
case OP_XCLASS:
#endif
return SSB_FAIL; return SSB_FAIL;
/* A "real" property test implies no starting bits, but the fake property
PT_CLIST identifies a list of characters. These lists are short, as they
are used for characters with more than one "other case", so there is no
point in recognizing them for OP_NOTPROP. */
case OP_PROP:
if (tcode[1] != PT_CLIST) return SSB_FAIL;
{
const pcre_uint32 *p = PRIV(ucd_caseless_sets) + tcode[2];
while ((c = *p++) < NOTACHAR)
{
#if defined SUPPORT_UTF && defined COMPILE_PCRE8
if (utf)
{
pcre_uchar buff[6];
(void)PRIV(ord2utf)(c, buff);
c = buff[0];
}
#endif
if (c > 0xff) SET_BIT(0xff); else SET_BIT(c);
}
}
try_next = FALSE;
break;
/* We can ignore word boundary tests. */ /* We can ignore word boundary tests. */
case OP_WORD_BOUNDARY: case OP_WORD_BOUNDARY:
@ -1109,24 +1130,17 @@ do
try_next = FALSE; try_next = FALSE;
break; break;
/* The cbit_space table has vertical tab as whitespace; we have to /* The cbit_space table has vertical tab as whitespace; we no longer
ensure it is set as not whitespace. Luckily, the code value is the same have to play fancy tricks because Perl added VT to its whitespace at
(0x0b) in ASCII and EBCDIC, so we can just adjust the appropriate bit. */ release 5.18. PCRE added it at release 8.34. */
case OP_NOT_WHITESPACE: case OP_NOT_WHITESPACE:
set_nottype_bits(start_bits, cbit_space, table_limit, cd); set_nottype_bits(start_bits, cbit_space, table_limit, cd);
start_bits[1] |= 0x08;
try_next = FALSE; try_next = FALSE;
break; break;
/* The cbit_space table has vertical tab as whitespace; we have to not
set it from the table. Luckily, the code value is the same (0x0b) in
ASCII and EBCDIC, so we can just adjust the appropriate bit. */
case OP_WHITESPACE: case OP_WHITESPACE:
c = start_bits[1]; /* Save in case it was already set */
set_type_bits(start_bits, cbit_space, table_limit, cd); set_type_bits(start_bits, cbit_space, table_limit, cd);
start_bits[1] = (start_bits[1] & ~0x08) | c;
try_next = FALSE; try_next = FALSE;
break; break;
@ -1257,6 +1271,16 @@ do
with a value >= 0xc4 is a potentially valid starter because it starts a with a value >= 0xc4 is a potentially valid starter because it starts a
character with a value > 255. */ character with a value > 255. */
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
case OP_XCLASS:
if ((tcode[1 + LINK_SIZE] & XCL_HASPROP) != 0)
return SSB_FAIL;
/* All bits are set. */
if ((tcode[1 + LINK_SIZE] & XCL_MAP) == 0 && (tcode[1 + LINK_SIZE] & XCL_NOT) != 0)
return SSB_FAIL;
#endif
/* Fall through */
case OP_NCLASS: case OP_NCLASS:
#if defined SUPPORT_UTF && defined COMPILE_PCRE8 #if defined SUPPORT_UTF && defined COMPILE_PCRE8
if (utf) if (utf)
@ -1273,8 +1297,21 @@ do
case OP_CLASS: case OP_CLASS:
{ {
pcre_uint8 *map; pcre_uint8 *map;
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
map = NULL;
if (*tcode == OP_XCLASS)
{
if ((tcode[1 + LINK_SIZE] & XCL_MAP) != 0)
map = (pcre_uint8 *)(tcode + 1 + LINK_SIZE + 1);
tcode += GET(tcode, 1);
}
else
#endif
{
tcode++; tcode++;
map = (pcre_uint8 *)tcode; map = (pcre_uint8 *)tcode;
tcode += 32 / sizeof(pcre_uchar);
}
/* In UTF-8 mode, the bits in a bit map correspond to character /* In UTF-8 mode, the bits in a bit map correspond to character
values, not to byte values. However, the bit map we are constructing is values, not to byte values. However, the bit map we are constructing is
@ -1282,6 +1319,10 @@ do
value is > 127. In fact, there are only two possible starting bytes for value is > 127. In fact, there are only two possible starting bytes for
characters in the range 128 - 255. */ characters in the range 128 - 255. */
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
if (map != NULL)
#endif
{
#if defined SUPPORT_UTF && defined COMPILE_PCRE8 #if defined SUPPORT_UTF && defined COMPILE_PCRE8
if (utf) if (utf)
{ {
@ -1302,11 +1343,11 @@ do
/* In non-UTF-8 mode, the two bit maps are completely compatible. */ /* In non-UTF-8 mode, the two bit maps are completely compatible. */
for (c = 0; c < 32; c++) start_bits[c] |= map[c]; for (c = 0; c < 32; c++) start_bits[c] |= map[c];
} }
}
/* Advance past the bit map, and act on what follows. For a zero /* Advance past the bit map, and act on what follows. For a zero
minimum repeat, continue; otherwise stop processing. */ minimum repeat, continue; otherwise stop processing. */
tcode += 32 / sizeof(pcre_uchar);
switch (*tcode) switch (*tcode)
{ {
case OP_CRSTAR: case OP_CRSTAR:
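The pcre_study.c hunks above build a 256-bit table of possible first bytes in two ways: individual values are set one at a time (the new OP_PROP case with PT_CLIST), and a class bitmap is OR-ed in as a whole when one is present. The sketch below is a minimal illustration of both steps for the non-UTF path only; it does not reproduce the UTF-8 branch that first converts a character to its leading byte. The helper names are hypothetical and stand in for the SET_BIT macro and the merge loop.

```c
#include <stddef.h>
#include <stdint.h>

#define START_BITS_LEN 32   /* 32 bytes = 256 bits, one per first byte */

/* Hypothetical helper: mark c as a possible first character. Values
   above 0xff are clamped to 0xff, as in the OP_PROP case. */
static void
set_start_bit(uint8_t *start_bits, uint32_t c)
{
  if (c > 0xff) c = 0xff;
  start_bits[c / 8] |= (uint8_t)(1u << (c & 7));
}

/* Hypothetical helper: OR a 32-byte class map into the start bits.
   A NULL map models an XCLASS without XCL_MAP, which adds nothing. */
static void
merge_class_map(uint8_t *start_bits, const uint8_t *map)
{
  int i;
  if (map == NULL) return;
  for (i = 0; i < START_BITS_LEN; i++) start_bits[i] |= map[i];
}

/* Hypothetical helper: test whether byte value c is marked. */
static int
has_start_bit(const uint8_t *start_bits, unsigned int c)
{
  return (start_bits[c / 8] >> (c & 7)) & 1;
}
```

The NULL check corresponds to the "if (map != NULL)" guard added in the diff, which lets an extended class without a bitmap fall through to the repeat handling without touching the table.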


@ -213,6 +213,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Avestan0 STR_A STR_v STR_e STR_s STR_t STR_a STR_n "\0" #define STRING_Avestan0 STR_A STR_v STR_e STR_s STR_t STR_a STR_n "\0"
#define STRING_Balinese0 STR_B STR_a STR_l STR_i STR_n STR_e STR_s STR_e "\0" #define STRING_Balinese0 STR_B STR_a STR_l STR_i STR_n STR_e STR_s STR_e "\0"
#define STRING_Bamum0 STR_B STR_a STR_m STR_u STR_m "\0" #define STRING_Bamum0 STR_B STR_a STR_m STR_u STR_m "\0"
#define STRING_Bassa_Vah0 STR_B STR_a STR_s STR_s STR_a STR_UNDERSCORE STR_V STR_a STR_h "\0"
#define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0" #define STRING_Batak0 STR_B STR_a STR_t STR_a STR_k "\0"
#define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0" #define STRING_Bengali0 STR_B STR_e STR_n STR_g STR_a STR_l STR_i "\0"
#define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0" #define STRING_Bopomofo0 STR_B STR_o STR_p STR_o STR_m STR_o STR_f STR_o "\0"
@ -223,6 +224,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_C0 STR_C "\0" #define STRING_C0 STR_C "\0"
#define STRING_Canadian_Aboriginal0 STR_C STR_a STR_n STR_a STR_d STR_i STR_a STR_n STR_UNDERSCORE STR_A STR_b STR_o STR_r STR_i STR_g STR_i STR_n STR_a STR_l "\0" #define STRING_Canadian_Aboriginal0 STR_C STR_a STR_n STR_a STR_d STR_i STR_a STR_n STR_UNDERSCORE STR_A STR_b STR_o STR_r STR_i STR_g STR_i STR_n STR_a STR_l "\0"
#define STRING_Carian0 STR_C STR_a STR_r STR_i STR_a STR_n "\0" #define STRING_Carian0 STR_C STR_a STR_r STR_i STR_a STR_n "\0"
#define STRING_Caucasian_Albanian0 STR_C STR_a STR_u STR_c STR_a STR_s STR_i STR_a STR_n STR_UNDERSCORE STR_A STR_l STR_b STR_a STR_n STR_i STR_a STR_n "\0"
#define STRING_Cc0 STR_C STR_c "\0" #define STRING_Cc0 STR_C STR_c "\0"
#define STRING_Cf0 STR_C STR_f "\0" #define STRING_Cf0 STR_C STR_f "\0"
#define STRING_Chakma0 STR_C STR_h STR_a STR_k STR_m STR_a "\0" #define STRING_Chakma0 STR_C STR_h STR_a STR_k STR_m STR_a "\0"
@ -238,11 +240,14 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Cyrillic0 STR_C STR_y STR_r STR_i STR_l STR_l STR_i STR_c "\0" #define STRING_Cyrillic0 STR_C STR_y STR_r STR_i STR_l STR_l STR_i STR_c "\0"
#define STRING_Deseret0 STR_D STR_e STR_s STR_e STR_r STR_e STR_t "\0" #define STRING_Deseret0 STR_D STR_e STR_s STR_e STR_r STR_e STR_t "\0"
#define STRING_Devanagari0 STR_D STR_e STR_v STR_a STR_n STR_a STR_g STR_a STR_r STR_i "\0" #define STRING_Devanagari0 STR_D STR_e STR_v STR_a STR_n STR_a STR_g STR_a STR_r STR_i "\0"
#define STRING_Duployan0 STR_D STR_u STR_p STR_l STR_o STR_y STR_a STR_n "\0"
#define STRING_Egyptian_Hieroglyphs0 STR_E STR_g STR_y STR_p STR_t STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0" #define STRING_Egyptian_Hieroglyphs0 STR_E STR_g STR_y STR_p STR_t STR_i STR_a STR_n STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Elbasan0 STR_E STR_l STR_b STR_a STR_s STR_a STR_n "\0"
#define STRING_Ethiopic0 STR_E STR_t STR_h STR_i STR_o STR_p STR_i STR_c "\0" #define STRING_Ethiopic0 STR_E STR_t STR_h STR_i STR_o STR_p STR_i STR_c "\0"
#define STRING_Georgian0 STR_G STR_e STR_o STR_r STR_g STR_i STR_a STR_n "\0" #define STRING_Georgian0 STR_G STR_e STR_o STR_r STR_g STR_i STR_a STR_n "\0"
#define STRING_Glagolitic0 STR_G STR_l STR_a STR_g STR_o STR_l STR_i STR_t STR_i STR_c "\0" #define STRING_Glagolitic0 STR_G STR_l STR_a STR_g STR_o STR_l STR_i STR_t STR_i STR_c "\0"
#define STRING_Gothic0 STR_G STR_o STR_t STR_h STR_i STR_c "\0" #define STRING_Gothic0 STR_G STR_o STR_t STR_h STR_i STR_c "\0"
#define STRING_Grantha0 STR_G STR_r STR_a STR_n STR_t STR_h STR_a "\0"
#define STRING_Greek0 STR_G STR_r STR_e STR_e STR_k "\0" #define STRING_Greek0 STR_G STR_r STR_e STR_e STR_k "\0"
#define STRING_Gujarati0 STR_G STR_u STR_j STR_a STR_r STR_a STR_t STR_i "\0" #define STRING_Gujarati0 STR_G STR_u STR_j STR_a STR_r STR_a STR_t STR_i "\0"
#define STRING_Gurmukhi0 STR_G STR_u STR_r STR_m STR_u STR_k STR_h STR_i "\0" #define STRING_Gurmukhi0 STR_G STR_u STR_r STR_m STR_u STR_k STR_h STR_i "\0"
@ -262,12 +267,15 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Kayah_Li0 STR_K STR_a STR_y STR_a STR_h STR_UNDERSCORE STR_L STR_i "\0" #define STRING_Kayah_Li0 STR_K STR_a STR_y STR_a STR_h STR_UNDERSCORE STR_L STR_i "\0"
#define STRING_Kharoshthi0 STR_K STR_h STR_a STR_r STR_o STR_s STR_h STR_t STR_h STR_i "\0" #define STRING_Kharoshthi0 STR_K STR_h STR_a STR_r STR_o STR_s STR_h STR_t STR_h STR_i "\0"
#define STRING_Khmer0 STR_K STR_h STR_m STR_e STR_r "\0" #define STRING_Khmer0 STR_K STR_h STR_m STR_e STR_r "\0"
#define STRING_Khojki0 STR_K STR_h STR_o STR_j STR_k STR_i "\0"
#define STRING_Khudawadi0 STR_K STR_h STR_u STR_d STR_a STR_w STR_a STR_d STR_i "\0"
#define STRING_L0 STR_L "\0" #define STRING_L0 STR_L "\0"
#define STRING_L_AMPERSAND0 STR_L STR_AMPERSAND "\0" #define STRING_L_AMPERSAND0 STR_L STR_AMPERSAND "\0"
#define STRING_Lao0 STR_L STR_a STR_o "\0" #define STRING_Lao0 STR_L STR_a STR_o "\0"
#define STRING_Latin0 STR_L STR_a STR_t STR_i STR_n "\0" #define STRING_Latin0 STR_L STR_a STR_t STR_i STR_n "\0"
#define STRING_Lepcha0 STR_L STR_e STR_p STR_c STR_h STR_a "\0" #define STRING_Lepcha0 STR_L STR_e STR_p STR_c STR_h STR_a "\0"
#define STRING_Limbu0 STR_L STR_i STR_m STR_b STR_u "\0" #define STRING_Limbu0 STR_L STR_i STR_m STR_b STR_u "\0"
#define STRING_Linear_A0 STR_L STR_i STR_n STR_e STR_a STR_r STR_UNDERSCORE STR_A "\0"
#define STRING_Linear_B0 STR_L STR_i STR_n STR_e STR_a STR_r STR_UNDERSCORE STR_B "\0" #define STRING_Linear_B0 STR_L STR_i STR_n STR_e STR_a STR_r STR_UNDERSCORE STR_B "\0"
#define STRING_Lisu0 STR_L STR_i STR_s STR_u "\0" #define STRING_Lisu0 STR_L STR_i STR_s STR_u "\0"
#define STRING_Ll0 STR_L STR_l "\0" #define STRING_Ll0 STR_L STR_l "\0"
@ -278,18 +286,24 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Lycian0 STR_L STR_y STR_c STR_i STR_a STR_n "\0" #define STRING_Lycian0 STR_L STR_y STR_c STR_i STR_a STR_n "\0"
#define STRING_Lydian0 STR_L STR_y STR_d STR_i STR_a STR_n "\0" #define STRING_Lydian0 STR_L STR_y STR_d STR_i STR_a STR_n "\0"
#define STRING_M0 STR_M "\0" #define STRING_M0 STR_M "\0"
#define STRING_Mahajani0 STR_M STR_a STR_h STR_a STR_j STR_a STR_n STR_i "\0"
#define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0" #define STRING_Malayalam0 STR_M STR_a STR_l STR_a STR_y STR_a STR_l STR_a STR_m "\0"
#define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0" #define STRING_Mandaic0 STR_M STR_a STR_n STR_d STR_a STR_i STR_c "\0"
#define STRING_Manichaean0 STR_M STR_a STR_n STR_i STR_c STR_h STR_a STR_e STR_a STR_n "\0"
#define STRING_Mc0 STR_M STR_c "\0" #define STRING_Mc0 STR_M STR_c "\0"
#define STRING_Me0 STR_M STR_e "\0" #define STRING_Me0 STR_M STR_e "\0"
#define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0" #define STRING_Meetei_Mayek0 STR_M STR_e STR_e STR_t STR_e STR_i STR_UNDERSCORE STR_M STR_a STR_y STR_e STR_k "\0"
#define STRING_Mende_Kikakui0 STR_M STR_e STR_n STR_d STR_e STR_UNDERSCORE STR_K STR_i STR_k STR_a STR_k STR_u STR_i "\0"
#define STRING_Meroitic_Cursive0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_C STR_u STR_r STR_s STR_i STR_v STR_e "\0" #define STRING_Meroitic_Cursive0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_C STR_u STR_r STR_s STR_i STR_v STR_e "\0"
#define STRING_Meroitic_Hieroglyphs0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0" #define STRING_Meroitic_Hieroglyphs0 STR_M STR_e STR_r STR_o STR_i STR_t STR_i STR_c STR_UNDERSCORE STR_H STR_i STR_e STR_r STR_o STR_g STR_l STR_y STR_p STR_h STR_s "\0"
#define STRING_Miao0 STR_M STR_i STR_a STR_o "\0" #define STRING_Miao0 STR_M STR_i STR_a STR_o "\0"
#define STRING_Mn0 STR_M STR_n "\0" #define STRING_Mn0 STR_M STR_n "\0"
#define STRING_Modi0 STR_M STR_o STR_d STR_i "\0"
#define STRING_Mongolian0 STR_M STR_o STR_n STR_g STR_o STR_l STR_i STR_a STR_n "\0" #define STRING_Mongolian0 STR_M STR_o STR_n STR_g STR_o STR_l STR_i STR_a STR_n "\0"
#define STRING_Mro0 STR_M STR_r STR_o "\0"
#define STRING_Myanmar0 STR_M STR_y STR_a STR_n STR_m STR_a STR_r "\0" #define STRING_Myanmar0 STR_M STR_y STR_a STR_n STR_m STR_a STR_r "\0"
#define STRING_N0 STR_N "\0" #define STRING_N0 STR_N "\0"
#define STRING_Nabataean0 STR_N STR_a STR_b STR_a STR_t STR_a STR_e STR_a STR_n "\0"
#define STRING_Nd0 STR_N STR_d "\0" #define STRING_Nd0 STR_N STR_d "\0"
#define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0" #define STRING_New_Tai_Lue0 STR_N STR_e STR_w STR_UNDERSCORE STR_T STR_a STR_i STR_UNDERSCORE STR_L STR_u STR_e "\0"
#define STRING_Nko0 STR_N STR_k STR_o "\0" #define STRING_Nko0 STR_N STR_k STR_o "\0"
@ -298,12 +312,17 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0" #define STRING_Ogham0 STR_O STR_g STR_h STR_a STR_m "\0"
#define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0" #define STRING_Ol_Chiki0 STR_O STR_l STR_UNDERSCORE STR_C STR_h STR_i STR_k STR_i "\0"
#define STRING_Old_Italic0 STR_O STR_l STR_d STR_UNDERSCORE STR_I STR_t STR_a STR_l STR_i STR_c "\0" #define STRING_Old_Italic0 STR_O STR_l STR_d STR_UNDERSCORE STR_I STR_t STR_a STR_l STR_i STR_c "\0"
#define STRING_Old_North_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_N STR_o STR_r STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Permic0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_m STR_i STR_c "\0"
#define STRING_Old_Persian0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_s STR_i STR_a STR_n "\0" #define STRING_Old_Persian0 STR_O STR_l STR_d STR_UNDERSCORE STR_P STR_e STR_r STR_s STR_i STR_a STR_n "\0"
#define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0" #define STRING_Old_South_Arabian0 STR_O STR_l STR_d STR_UNDERSCORE STR_S STR_o STR_u STR_t STR_h STR_UNDERSCORE STR_A STR_r STR_a STR_b STR_i STR_a STR_n "\0"
#define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0" #define STRING_Old_Turkic0 STR_O STR_l STR_d STR_UNDERSCORE STR_T STR_u STR_r STR_k STR_i STR_c "\0"
#define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0" #define STRING_Oriya0 STR_O STR_r STR_i STR_y STR_a "\0"
#define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0" #define STRING_Osmanya0 STR_O STR_s STR_m STR_a STR_n STR_y STR_a "\0"
#define STRING_P0 STR_P "\0" #define STRING_P0 STR_P "\0"
#define STRING_Pahawh_Hmong0 STR_P STR_a STR_h STR_a STR_w STR_h STR_UNDERSCORE STR_H STR_m STR_o STR_n STR_g "\0"
#define STRING_Palmyrene0 STR_P STR_a STR_l STR_m STR_y STR_r STR_e STR_n STR_e "\0"
#define STRING_Pau_Cin_Hau0 STR_P STR_a STR_u STR_UNDERSCORE STR_C STR_i STR_n STR_UNDERSCORE STR_H STR_a STR_u "\0"
#define STRING_Pc0 STR_P STR_c "\0" #define STRING_Pc0 STR_P STR_c "\0"
#define STRING_Pd0 STR_P STR_d "\0" #define STRING_Pd0 STR_P STR_d "\0"
#define STRING_Pe0 STR_P STR_e "\0" #define STRING_Pe0 STR_P STR_e "\0"
@ -313,6 +332,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Pi0 STR_P STR_i "\0" #define STRING_Pi0 STR_P STR_i "\0"
#define STRING_Po0 STR_P STR_o "\0" #define STRING_Po0 STR_P STR_o "\0"
#define STRING_Ps0 STR_P STR_s "\0" #define STRING_Ps0 STR_P STR_s "\0"
#define STRING_Psalter_Pahlavi0 STR_P STR_s STR_a STR_l STR_t STR_e STR_r STR_UNDERSCORE STR_P STR_a STR_h STR_l STR_a STR_v STR_i "\0"
#define STRING_Rejang0 STR_R STR_e STR_j STR_a STR_n STR_g "\0" #define STRING_Rejang0 STR_R STR_e STR_j STR_a STR_n STR_g "\0"
#define STRING_Runic0 STR_R STR_u STR_n STR_i STR_c "\0" #define STRING_Runic0 STR_R STR_u STR_n STR_i STR_c "\0"
#define STRING_S0 STR_S "\0" #define STRING_S0 STR_S "\0"
@ -321,6 +341,7 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Sc0 STR_S STR_c "\0" #define STRING_Sc0 STR_S STR_c "\0"
#define STRING_Sharada0 STR_S STR_h STR_a STR_r STR_a STR_d STR_a "\0" #define STRING_Sharada0 STR_S STR_h STR_a STR_r STR_a STR_d STR_a "\0"
#define STRING_Shavian0 STR_S STR_h STR_a STR_v STR_i STR_a STR_n "\0" #define STRING_Shavian0 STR_S STR_h STR_a STR_v STR_i STR_a STR_n "\0"
#define STRING_Siddham0 STR_S STR_i STR_d STR_d STR_h STR_a STR_m "\0"
#define STRING_Sinhala0 STR_S STR_i STR_n STR_h STR_a STR_l STR_a "\0" #define STRING_Sinhala0 STR_S STR_i STR_n STR_h STR_a STR_l STR_a "\0"
#define STRING_Sk0 STR_S STR_k "\0" #define STRING_Sk0 STR_S STR_k "\0"
#define STRING_Sm0 STR_S STR_m "\0" #define STRING_Sm0 STR_S STR_m "\0"
@ -341,8 +362,10 @@ strings to make sure that UTF-8 support works on EBCDIC platforms. */
#define STRING_Thai0 STR_T STR_h STR_a STR_i "\0" #define STRING_Thai0 STR_T STR_h STR_a STR_i "\0"
#define STRING_Tibetan0 STR_T STR_i STR_b STR_e STR_t STR_a STR_n "\0" #define STRING_Tibetan0 STR_T STR_i STR_b STR_e STR_t STR_a STR_n "\0"
#define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0" #define STRING_Tifinagh0 STR_T STR_i STR_f STR_i STR_n STR_a STR_g STR_h "\0"
#define STRING_Tirhuta0 STR_T STR_i STR_r STR_h STR_u STR_t STR_a "\0"
#define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0" #define STRING_Ugaritic0 STR_U STR_g STR_a STR_r STR_i STR_t STR_i STR_c "\0"
#define STRING_Vai0 STR_V STR_a STR_i "\0" #define STRING_Vai0 STR_V STR_a STR_i "\0"
#define STRING_Warang_Citi0 STR_W STR_a STR_r STR_a STR_n STR_g STR_UNDERSCORE STR_C STR_i STR_t STR_i "\0"
#define STRING_Xan0 STR_X STR_a STR_n "\0" #define STRING_Xan0 STR_X STR_a STR_n "\0"
#define STRING_Xps0 STR_X STR_p STR_s "\0" #define STRING_Xps0 STR_X STR_p STR_s "\0"
#define STRING_Xsp0 STR_X STR_s STR_p "\0" #define STRING_Xsp0 STR_X STR_s STR_p "\0"
@ -361,6 +384,7 @@ const char PRIV(utt_names)[] =
STRING_Avestan0 STRING_Avestan0
STRING_Balinese0 STRING_Balinese0
STRING_Bamum0 STRING_Bamum0
STRING_Bassa_Vah0
STRING_Batak0 STRING_Batak0
STRING_Bengali0 STRING_Bengali0
STRING_Bopomofo0 STRING_Bopomofo0
@ -371,6 +395,7 @@ const char PRIV(utt_names)[] =
STRING_C0 STRING_C0
STRING_Canadian_Aboriginal0 STRING_Canadian_Aboriginal0
STRING_Carian0 STRING_Carian0
STRING_Caucasian_Albanian0
STRING_Cc0 STRING_Cc0
STRING_Cf0 STRING_Cf0
STRING_Chakma0 STRING_Chakma0
@ -386,11 +411,14 @@ const char PRIV(utt_names)[] =
STRING_Cyrillic0 STRING_Cyrillic0
STRING_Deseret0 STRING_Deseret0
STRING_Devanagari0 STRING_Devanagari0
STRING_Duployan0
STRING_Egyptian_Hieroglyphs0 STRING_Egyptian_Hieroglyphs0
STRING_Elbasan0
STRING_Ethiopic0 STRING_Ethiopic0
STRING_Georgian0 STRING_Georgian0
STRING_Glagolitic0 STRING_Glagolitic0
STRING_Gothic0 STRING_Gothic0
STRING_Grantha0
STRING_Greek0 STRING_Greek0
STRING_Gujarati0 STRING_Gujarati0
STRING_Gurmukhi0 STRING_Gurmukhi0
@ -410,12 +438,15 @@ const char PRIV(utt_names)[] =
STRING_Kayah_Li0 STRING_Kayah_Li0
STRING_Kharoshthi0 STRING_Kharoshthi0
STRING_Khmer0 STRING_Khmer0
STRING_Khojki0
STRING_Khudawadi0
STRING_L0 STRING_L0
STRING_L_AMPERSAND0 STRING_L_AMPERSAND0
STRING_Lao0 STRING_Lao0
STRING_Latin0 STRING_Latin0
STRING_Lepcha0 STRING_Lepcha0
STRING_Limbu0 STRING_Limbu0
STRING_Linear_A0
STRING_Linear_B0 STRING_Linear_B0
STRING_Lisu0 STRING_Lisu0
STRING_Ll0 STRING_Ll0
@ -426,18 +457,24 @@ const char PRIV(utt_names)[] =
STRING_Lycian0 STRING_Lycian0
STRING_Lydian0 STRING_Lydian0
STRING_M0 STRING_M0
STRING_Mahajani0
STRING_Malayalam0 STRING_Malayalam0
STRING_Mandaic0 STRING_Mandaic0
STRING_Manichaean0
STRING_Mc0 STRING_Mc0
STRING_Me0 STRING_Me0
STRING_Meetei_Mayek0 STRING_Meetei_Mayek0
STRING_Mende_Kikakui0
STRING_Meroitic_Cursive0 STRING_Meroitic_Cursive0
STRING_Meroitic_Hieroglyphs0 STRING_Meroitic_Hieroglyphs0
STRING_Miao0 STRING_Miao0
STRING_Mn0 STRING_Mn0
STRING_Modi0
STRING_Mongolian0 STRING_Mongolian0
STRING_Mro0
STRING_Myanmar0 STRING_Myanmar0
STRING_N0 STRING_N0
STRING_Nabataean0
STRING_Nd0 STRING_Nd0
STRING_New_Tai_Lue0 STRING_New_Tai_Lue0
STRING_Nko0 STRING_Nko0
@ -446,12 +483,17 @@ const char PRIV(utt_names)[] =
STRING_Ogham0 STRING_Ogham0
STRING_Ol_Chiki0 STRING_Ol_Chiki0
STRING_Old_Italic0 STRING_Old_Italic0
STRING_Old_North_Arabian0
STRING_Old_Permic0
STRING_Old_Persian0 STRING_Old_Persian0
STRING_Old_South_Arabian0 STRING_Old_South_Arabian0
STRING_Old_Turkic0 STRING_Old_Turkic0
STRING_Oriya0 STRING_Oriya0
STRING_Osmanya0 STRING_Osmanya0
STRING_P0 STRING_P0
STRING_Pahawh_Hmong0
STRING_Palmyrene0
STRING_Pau_Cin_Hau0
STRING_Pc0 STRING_Pc0
STRING_Pd0 STRING_Pd0
STRING_Pe0 STRING_Pe0
@ -461,6 +503,7 @@ const char PRIV(utt_names)[] =
STRING_Pi0 STRING_Pi0
STRING_Po0 STRING_Po0
STRING_Ps0 STRING_Ps0
STRING_Psalter_Pahlavi0
STRING_Rejang0 STRING_Rejang0
STRING_Runic0 STRING_Runic0
STRING_S0 STRING_S0
@ -469,6 +512,7 @@ const char PRIV(utt_names)[] =
STRING_Sc0 STRING_Sc0
STRING_Sharada0 STRING_Sharada0
STRING_Shavian0 STRING_Shavian0
STRING_Siddham0
STRING_Sinhala0 STRING_Sinhala0
STRING_Sk0 STRING_Sk0
STRING_Sm0 STRING_Sm0
@ -489,8 +533,10 @@ const char PRIV(utt_names)[] =
STRING_Thai0 STRING_Thai0
STRING_Tibetan0 STRING_Tibetan0
STRING_Tifinagh0 STRING_Tifinagh0
STRING_Tirhuta0
STRING_Ugaritic0 STRING_Ugaritic0
STRING_Vai0 STRING_Vai0
STRING_Warang_Citi0
STRING_Xan0 STRING_Xan0
STRING_Xps0 STRING_Xps0
STRING_Xsp0 STRING_Xsp0
@ -509,146 +555,169 @@ const ucp_type_table PRIV(utt)[] = {
{ 20, PT_SC, ucp_Avestan }, { 20, PT_SC, ucp_Avestan },
{ 28, PT_SC, ucp_Balinese }, { 28, PT_SC, ucp_Balinese },
{ 37, PT_SC, ucp_Bamum }, { 37, PT_SC, ucp_Bamum },
{ 43, PT_SC, ucp_Batak }, { 43, PT_SC, ucp_Bassa_Vah },
{ 49, PT_SC, ucp_Bengali }, { 53, PT_SC, ucp_Batak },
{ 57, PT_SC, ucp_Bopomofo }, { 59, PT_SC, ucp_Bengali },
{ 66, PT_SC, ucp_Brahmi }, { 67, PT_SC, ucp_Bopomofo },
{ 73, PT_SC, ucp_Braille }, { 76, PT_SC, ucp_Brahmi },
{ 81, PT_SC, ucp_Buginese }, { 83, PT_SC, ucp_Braille },
{ 90, PT_SC, ucp_Buhid }, { 91, PT_SC, ucp_Buginese },
{ 96, PT_GC, ucp_C }, { 100, PT_SC, ucp_Buhid },
{ 98, PT_SC, ucp_Canadian_Aboriginal }, { 106, PT_GC, ucp_C },
{ 118, PT_SC, ucp_Carian }, { 108, PT_SC, ucp_Canadian_Aboriginal },
{ 125, PT_PC, ucp_Cc }, { 128, PT_SC, ucp_Carian },
{ 128, PT_PC, ucp_Cf }, { 135, PT_SC, ucp_Caucasian_Albanian },
{ 131, PT_SC, ucp_Chakma }, { 154, PT_PC, ucp_Cc },
{ 138, PT_SC, ucp_Cham }, { 157, PT_PC, ucp_Cf },
{ 143, PT_SC, ucp_Cherokee }, { 160, PT_SC, ucp_Chakma },
{ 152, PT_PC, ucp_Cn }, { 167, PT_SC, ucp_Cham },
{ 155, PT_PC, ucp_Co }, { 172, PT_SC, ucp_Cherokee },
{ 158, PT_SC, ucp_Common }, { 181, PT_PC, ucp_Cn },
{ 165, PT_SC, ucp_Coptic }, { 184, PT_PC, ucp_Co },
{ 172, PT_PC, ucp_Cs }, { 187, PT_SC, ucp_Common },
{ 175, PT_SC, ucp_Cuneiform }, { 194, PT_SC, ucp_Coptic },
{ 185, PT_SC, ucp_Cypriot }, { 201, PT_PC, ucp_Cs },
{ 193, PT_SC, ucp_Cyrillic }, { 204, PT_SC, ucp_Cuneiform },
{ 202, PT_SC, ucp_Deseret }, { 214, PT_SC, ucp_Cypriot },
{ 210, PT_SC, ucp_Devanagari }, { 222, PT_SC, ucp_Cyrillic },
{ 221, PT_SC, ucp_Egyptian_Hieroglyphs }, { 231, PT_SC, ucp_Deseret },
{ 242, PT_SC, ucp_Ethiopic }, { 239, PT_SC, ucp_Devanagari },
{ 251, PT_SC, ucp_Georgian }, { 250, PT_SC, ucp_Duployan },
{ 260, PT_SC, ucp_Glagolitic }, { 259, PT_SC, ucp_Egyptian_Hieroglyphs },
{ 271, PT_SC, ucp_Gothic }, { 280, PT_SC, ucp_Elbasan },
{ 278, PT_SC, ucp_Greek }, { 288, PT_SC, ucp_Ethiopic },
{ 284, PT_SC, ucp_Gujarati }, { 297, PT_SC, ucp_Georgian },
{ 293, PT_SC, ucp_Gurmukhi }, { 306, PT_SC, ucp_Glagolitic },
{ 302, PT_SC, ucp_Han }, { 317, PT_SC, ucp_Gothic },
{ 306, PT_SC, ucp_Hangul }, { 324, PT_SC, ucp_Grantha },
{ 313, PT_SC, ucp_Hanunoo }, { 332, PT_SC, ucp_Greek },
{ 321, PT_SC, ucp_Hebrew }, { 338, PT_SC, ucp_Gujarati },
{ 328, PT_SC, ucp_Hiragana }, { 347, PT_SC, ucp_Gurmukhi },
{ 337, PT_SC, ucp_Imperial_Aramaic }, { 356, PT_SC, ucp_Han },
{ 354, PT_SC, ucp_Inherited }, { 360, PT_SC, ucp_Hangul },
{ 364, PT_SC, ucp_Inscriptional_Pahlavi }, { 367, PT_SC, ucp_Hanunoo },
{ 386, PT_SC, ucp_Inscriptional_Parthian }, { 375, PT_SC, ucp_Hebrew },
{ 409, PT_SC, ucp_Javanese }, { 382, PT_SC, ucp_Hiragana },
{ 418, PT_SC, ucp_Kaithi }, { 391, PT_SC, ucp_Imperial_Aramaic },
{ 425, PT_SC, ucp_Kannada }, { 408, PT_SC, ucp_Inherited },
{ 433, PT_SC, ucp_Katakana }, { 418, PT_SC, ucp_Inscriptional_Pahlavi },
{ 442, PT_SC, ucp_Kayah_Li }, { 440, PT_SC, ucp_Inscriptional_Parthian },
{ 451, PT_SC, ucp_Kharoshthi }, { 463, PT_SC, ucp_Javanese },
{ 462, PT_SC, ucp_Khmer }, { 472, PT_SC, ucp_Kaithi },
{ 468, PT_GC, ucp_L }, { 479, PT_SC, ucp_Kannada },
{ 470, PT_LAMP, 0 }, { 487, PT_SC, ucp_Katakana },
{ 473, PT_SC, ucp_Lao }, { 496, PT_SC, ucp_Kayah_Li },
{ 477, PT_SC, ucp_Latin }, { 505, PT_SC, ucp_Kharoshthi },
{ 483, PT_SC, ucp_Lepcha }, { 516, PT_SC, ucp_Khmer },
{ 490, PT_SC, ucp_Limbu }, { 522, PT_SC, ucp_Khojki },
{ 496, PT_SC, ucp_Linear_B }, { 529, PT_SC, ucp_Khudawadi },
{ 505, PT_SC, ucp_Lisu }, { 539, PT_GC, ucp_L },
{ 510, PT_PC, ucp_Ll }, { 541, PT_LAMP, 0 },
{ 513, PT_PC, ucp_Lm }, { 544, PT_SC, ucp_Lao },
{ 516, PT_PC, ucp_Lo }, { 548, PT_SC, ucp_Latin },
{ 519, PT_PC, ucp_Lt }, { 554, PT_SC, ucp_Lepcha },
{ 522, PT_PC, ucp_Lu }, { 561, PT_SC, ucp_Limbu },
{ 525, PT_SC, ucp_Lycian }, { 567, PT_SC, ucp_Linear_A },
{ 532, PT_SC, ucp_Lydian }, { 576, PT_SC, ucp_Linear_B },
{ 539, PT_GC, ucp_M }, { 585, PT_SC, ucp_Lisu },
{ 541, PT_SC, ucp_Malayalam }, { 590, PT_PC, ucp_Ll },
{ 551, PT_SC, ucp_Mandaic }, { 593, PT_PC, ucp_Lm },
{ 559, PT_PC, ucp_Mc }, { 596, PT_PC, ucp_Lo },
{ 562, PT_PC, ucp_Me }, { 599, PT_PC, ucp_Lt },
{ 565, PT_SC, ucp_Meetei_Mayek }, { 602, PT_PC, ucp_Lu },
{ 578, PT_SC, ucp_Meroitic_Cursive }, { 605, PT_SC, ucp_Lycian },
{ 595, PT_SC, ucp_Meroitic_Hieroglyphs }, { 612, PT_SC, ucp_Lydian },
{ 616, PT_SC, ucp_Miao }, { 619, PT_GC, ucp_M },
{ 621, PT_PC, ucp_Mn }, { 621, PT_SC, ucp_Mahajani },
{ 624, PT_SC, ucp_Mongolian }, { 630, PT_SC, ucp_Malayalam },
{ 634, PT_SC, ucp_Myanmar }, { 640, PT_SC, ucp_Mandaic },
{ 642, PT_GC, ucp_N }, { 648, PT_SC, ucp_Manichaean },
{ 644, PT_PC, ucp_Nd }, { 659, PT_PC, ucp_Mc },
{ 647, PT_SC, ucp_New_Tai_Lue }, { 662, PT_PC, ucp_Me },
{ 659, PT_SC, ucp_Nko }, { 665, PT_SC, ucp_Meetei_Mayek },
{ 663, PT_PC, ucp_Nl }, { 678, PT_SC, ucp_Mende_Kikakui },
{ 666, PT_PC, ucp_No }, { 692, PT_SC, ucp_Meroitic_Cursive },
{ 669, PT_SC, ucp_Ogham }, { 709, PT_SC, ucp_Meroitic_Hieroglyphs },
{ 675, PT_SC, ucp_Ol_Chiki }, { 730, PT_SC, ucp_Miao },
{ 684, PT_SC, ucp_Old_Italic }, { 735, PT_PC, ucp_Mn },
{ 695, PT_SC, ucp_Old_Persian }, { 738, PT_SC, ucp_Modi },
{ 707, PT_SC, ucp_Old_South_Arabian }, { 743, PT_SC, ucp_Mongolian },
{ 725, PT_SC, ucp_Old_Turkic }, { 753, PT_SC, ucp_Mro },
{ 736, PT_SC, ucp_Oriya }, { 757, PT_SC, ucp_Myanmar },
{ 742, PT_SC, ucp_Osmanya }, { 765, PT_GC, ucp_N },
{ 750, PT_GC, ucp_P }, { 767, PT_SC, ucp_Nabataean },
{ 752, PT_PC, ucp_Pc }, { 777, PT_PC, ucp_Nd },
{ 755, PT_PC, ucp_Pd }, { 780, PT_SC, ucp_New_Tai_Lue },
{ 758, PT_PC, ucp_Pe }, { 792, PT_SC, ucp_Nko },
{ 761, PT_PC, ucp_Pf }, { 796, PT_PC, ucp_Nl },
{ 764, PT_SC, ucp_Phags_Pa }, { 799, PT_PC, ucp_No },
{ 773, PT_SC, ucp_Phoenician }, { 802, PT_SC, ucp_Ogham },
{ 784, PT_PC, ucp_Pi }, { 808, PT_SC, ucp_Ol_Chiki },
{ 787, PT_PC, ucp_Po }, { 817, PT_SC, ucp_Old_Italic },
{ 790, PT_PC, ucp_Ps }, { 828, PT_SC, ucp_Old_North_Arabian },
{ 793, PT_SC, ucp_Rejang }, { 846, PT_SC, ucp_Old_Permic },
{ 800, PT_SC, ucp_Runic }, { 857, PT_SC, ucp_Old_Persian },
{ 806, PT_GC, ucp_S }, { 869, PT_SC, ucp_Old_South_Arabian },
{ 808, PT_SC, ucp_Samaritan }, { 887, PT_SC, ucp_Old_Turkic },
{ 818, PT_SC, ucp_Saurashtra }, { 898, PT_SC, ucp_Oriya },
{ 829, PT_PC, ucp_Sc }, { 904, PT_SC, ucp_Osmanya },
{ 832, PT_SC, ucp_Sharada }, { 912, PT_GC, ucp_P },
{ 840, PT_SC, ucp_Shavian }, { 914, PT_SC, ucp_Pahawh_Hmong },
{ 848, PT_SC, ucp_Sinhala }, { 927, PT_SC, ucp_Palmyrene },
{ 856, PT_PC, ucp_Sk }, { 937, PT_SC, ucp_Pau_Cin_Hau },
{ 859, PT_PC, ucp_Sm }, { 949, PT_PC, ucp_Pc },
{ 862, PT_PC, ucp_So }, { 952, PT_PC, ucp_Pd },
{ 865, PT_SC, ucp_Sora_Sompeng }, { 955, PT_PC, ucp_Pe },
{ 878, PT_SC, ucp_Sundanese }, { 958, PT_PC, ucp_Pf },
{ 888, PT_SC, ucp_Syloti_Nagri }, { 961, PT_SC, ucp_Phags_Pa },
{ 901, PT_SC, ucp_Syriac }, { 970, PT_SC, ucp_Phoenician },
{ 908, PT_SC, ucp_Tagalog }, { 981, PT_PC, ucp_Pi },
{ 916, PT_SC, ucp_Tagbanwa }, { 984, PT_PC, ucp_Po },
{ 925, PT_SC, ucp_Tai_Le }, { 987, PT_PC, ucp_Ps },
{ 932, PT_SC, ucp_Tai_Tham }, { 990, PT_SC, ucp_Psalter_Pahlavi },
{ 941, PT_SC, ucp_Tai_Viet }, { 1006, PT_SC, ucp_Rejang },
{ 950, PT_SC, ucp_Takri }, { 1013, PT_SC, ucp_Runic },
{ 956, PT_SC, ucp_Tamil }, { 1019, PT_GC, ucp_S },
{ 962, PT_SC, ucp_Telugu }, { 1021, PT_SC, ucp_Samaritan },
{ 969, PT_SC, ucp_Thaana }, { 1031, PT_SC, ucp_Saurashtra },
{ 976, PT_SC, ucp_Thai }, { 1042, PT_PC, ucp_Sc },
{ 981, PT_SC, ucp_Tibetan }, { 1045, PT_SC, ucp_Sharada },
{ 989, PT_SC, ucp_Tifinagh }, { 1053, PT_SC, ucp_Shavian },
{ 998, PT_SC, ucp_Ugaritic }, { 1061, PT_SC, ucp_Siddham },
{ 1007, PT_SC, ucp_Vai }, { 1069, PT_SC, ucp_Sinhala },
{ 1011, PT_ALNUM, 0 }, { 1077, PT_PC, ucp_Sk },
{ 1015, PT_PXSPACE, 0 }, { 1080, PT_PC, ucp_Sm },
{ 1019, PT_SPACE, 0 }, { 1083, PT_PC, ucp_So },
{ 1023, PT_UCNC, 0 }, { 1086, PT_SC, ucp_Sora_Sompeng },
{ 1027, PT_WORD, 0 }, { 1099, PT_SC, ucp_Sundanese },
{ 1031, PT_SC, ucp_Yi }, { 1109, PT_SC, ucp_Syloti_Nagri },
{ 1034, PT_GC, ucp_Z }, { 1122, PT_SC, ucp_Syriac },
{ 1036, PT_PC, ucp_Zl }, { 1129, PT_SC, ucp_Tagalog },
{ 1039, PT_PC, ucp_Zp }, { 1137, PT_SC, ucp_Tagbanwa },
{ 1042, PT_PC, ucp_Zs } { 1146, PT_SC, ucp_Tai_Le },
{ 1153, PT_SC, ucp_Tai_Tham },
{ 1162, PT_SC, ucp_Tai_Viet },
{ 1171, PT_SC, ucp_Takri },
{ 1177, PT_SC, ucp_Tamil },
{ 1183, PT_SC, ucp_Telugu },
{ 1190, PT_SC, ucp_Thaana },
{ 1197, PT_SC, ucp_Thai },
{ 1202, PT_SC, ucp_Tibetan },
{ 1210, PT_SC, ucp_Tifinagh },
{ 1219, PT_SC, ucp_Tirhuta },
{ 1227, PT_SC, ucp_Ugaritic },
{ 1236, PT_SC, ucp_Vai },
{ 1240, PT_SC, ucp_Warang_Citi },
{ 1252, PT_ALNUM, 0 },
{ 1256, PT_PXSPACE, 0 },
{ 1260, PT_SPACE, 0 },
{ 1264, PT_UCNC, 0 },
{ 1268, PT_WORD, 0 },
{ 1272, PT_SC, ucp_Yi },
{ 1275, PT_GC, ucp_Z },
{ 1277, PT_PC, ucp_Zl },
{ 1280, PT_PC, ucp_Zp },
{ 1283, PT_PC, ucp_Zs }
}; };
const int PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table); const int PRIV(utt_size) = sizeof(PRIV(utt)) / sizeof(ucp_type_table);
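
Note on the renumbered offsets above: the first field of each `utt` entry appears to be a byte offset into the concatenated, NUL-terminated `utt_names` string. That would explain why adding `Bassa_Vah` (9 characters plus a terminator) moves `Batak` from 43 to 53 and shifts every later entry. The following is a minimal sketch of that layout, assuming a binary search over the sorted names. `mini_names`, `mini_entry`, `mini_table`, `next_offset`, `mini_lookup`, and `check_offsets` are hypothetical stand-ins for illustration, not PCRE identifiers, and the payload values are made up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature name pool: sorted names, each ending in NUL. */
static const char mini_names[] =
  "Avestan\0"
  "Balinese\0"
  "Bamum\0"
  "Bassa_Vah\0"
  "Batak\0";

typedef struct {
  int name_offset;  /* byte offset of the name inside mini_names */
  int value;        /* illustrative payload, not a real ucp_ value */
} mini_entry;

static const mini_entry mini_table[] = {
  {  0, 1 },  /* Avestan   */
  {  8, 2 },  /* Balinese  */
  { 17, 3 },  /* Bamum     */
  { 23, 4 },  /* Bassa_Vah */
  { 33, 5 }   /* Batak     */
};

static const int mini_size = (int)(sizeof(mini_table) / sizeof(mini_table[0]));

/* Offset of the name that follows the one starting at `offset`. */
static int next_offset(int offset)
{
  return offset + (int)strlen(mini_names + offset) + 1;
}

/* Binary search by name; returns the payload, or -1 if not found. */
static int mini_lookup(const char *name)
{
  int lo = 0, hi = mini_size;
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    int c = strcmp(name, mini_names + mini_table[mid].name_offset);
    if (c == 0) return mini_table[mid].value;
    if (c < 0) hi = mid; else lo = mid + 1;
  }
  return -1;
}

/* Sanity check: each offset is the previous offset plus name length plus 1. */
static void check_offsets(void)
{
  int i;
  for (i = 1; i < mini_size; i++)
    assert(mini_table[i].name_offset ==
           next_offset(mini_table[i - 1].name_offset));
}
```

Because the offsets are hard-coded, inserting one name in the middle of the pool forces every following entry to be regenerated, which matches the size of the table hunk in this diff.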

File diff suppressed because it is too large

View file

@ -81,6 +81,11 @@ additional data. */
if (c < 256) if (c < 256)
{ {
if ((*data & XCL_HASPROP) == 0)
{
if ((*data & XCL_MAP) == 0) return negated;
return (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0;
}
if ((*data & XCL_MAP) != 0 && if ((*data & XCL_MAP) != 0 &&
(((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0) (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0)
return !negated; /* char found */ return !negated; /* char found */
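
Note on the hunk above: the expression `[c/8] & (1 << (c&7))` tests one bit in a 256-bit map, with one bit per character value below 256. The sketch below shows that bit addressing in isolation, assuming a 32-byte map. `byte`, `map_set`, `map_test`, and `map_self_check` are hypothetical helper names, not part of PCRE, and the sketch does not model the `XCL_HASPROP` or negation handling.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char byte;

/* Mark character c (0..255) as present in a 32-byte (256-bit) map. */
static void map_set(byte map[32], unsigned int c)
{
  map[c / 8] |= (byte)(1u << (c & 7));
}

/* Return 1 if character c (0..255) is present in the map, else 0. */
static int map_test(const byte map[32], unsigned int c)
{
  return (map[c / 8] & (1u << (c & 7))) != 0;
}

/* Small self-check: set one bit, then confirm only that bit reads back. */
static void map_self_check(void)
{
  byte map[32];
  memset(map, 0, sizeof(map));
  map_set(map, 'a');
  assert(map_test(map, 'a'));
  assert(!map_test(map, 'b'));
}
```

`c / 8` selects the byte and `c & 7` selects the bit within it, so the lookup costs one load, one shift, and one mask.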

View file

@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language. and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel Written by Philip Hazel
Copyright (c) 1997-2012 University of Cambridge Copyright (c) 1997-2014 University of Cambridge
----------------------------------------------------------------------------- -----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without Redistribution and use in source and binary forms, with or without
@ -170,7 +170,10 @@ static const int eint[] = {
REG_BADPAT, /* missing opening brace after \o */ REG_BADPAT, /* missing opening brace after \o */
REG_BADPAT, /* parentheses too deeply nested */ REG_BADPAT, /* parentheses too deeply nested */
REG_BADPAT, /* invalid range in character class */ REG_BADPAT, /* invalid range in character class */
REG_BADPAT /* group name must start with a non-digit */ REG_BADPAT, /* group name must start with a non-digit */
/* 85 */
REG_BADPAT, /* parentheses too deeply nested (stack check) */
REG_BADPAT /* missing digits in \x{} or \o{} */
}; };
/* Table of texts corresponding to POSIX error codes */ /* Table of texts corresponding to POSIX error codes */

Binary file not shown.

View file

@ -111,7 +111,7 @@
bababbc bababbc
babababc babababc
/^\ca\cA\c[\c{\c:/ /^\ca\cA\c[;\c:/
\x01\x01\e;z \x01\x01\e;z
/^[ab\]cde]/ /^[ab\]cde]/
@ -4938,6 +4938,12 @@ however, we need the complication for Perl. ---/
/((?(R1)a+|(?1)b))/ /((?(R1)a+|(?1)b))/
aaaabcde aaaabcde
/((?(R)a|(?1)))*/
aaa
/((?(R)a|(?1)))+/
aaa
/a(*:any /a(*:any
name)/K name)/K
abc abc
@ -5666,4 +5672,52 @@ AbcdCBefgBhiBqz
/(a\Kb)*/+ /(a\Kb)*/+
ababc ababc
/(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/
acb
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/^\w+(?>\s*)(?<=\w)/
test test
/(?P<same>a)(?P<same>b)/gJ
abbaba
/(?P<same>a)(?P<same>b)(?P=same)/gJ
abbaba
/(?P=same)?(?P<same>a)(?P<same>b)/gJ
abbaba
/(?:(?P=same)?(?:(?P<same>a)|(?P<same>b))(?P=same))+/gJ
bbbaaabaabb
/(?:(?P=same)?(?:(?P=same)(?P<same>a)(?P=same)|(?P=same)?(?P<same>b)(?P=same)){2}(?P=same)(?P<same>c)(?P=same)){2}(?P<same>z)?/gJ
bbbaaaccccaaabbbcc
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
acl
bdl
adl
bcl
/\sabc/
\x{0b}abc
/[\Qa]\E]+/
aa]]
/[\Q]a\E]+/
aa]]
/-- End of testinput1 --/ /-- End of testinput1 --/

View file

@ -132,4 +132,6 @@ is required for these tests. --/
/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B /abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B
/(((a\2)|(a*)\g<-1>))*a?/B
/-- End of testinput11 --/ /-- End of testinput11 --/

View file

@ -32,4 +32,10 @@
/[[:blank:]]/WBZ /[[:blank:]]/WBZ
/\x{212a}+/i8SI
KKkk\x{212a}
/s+/i8SI
SSss\x{17f}
/-- End of testinput16 --/ /-- End of testinput16 --/

View file

@ -207,7 +207,7 @@ correctly, but that messes up comparisons). --/
CDBABC CDBABC
\x{2000}ABC \x{2000}ABC
/\R*A/SI8 /\R*A/SI8<bsr_unicode>
CDBABC CDBABC
\x{2028}A \x{2028}A

View file

@ -19,4 +19,10 @@
/[[:blank:]]/WBZ /[[:blank:]]/WBZ
/\x{212a}+/i8SI
KKkk\x{212a}
/s+/i8SI
SSss\x{17f}
/-- End of testinput19 --/ /-- End of testinput19 --/

View file

@ -907,6 +907,9 @@
/\U/I /\U/I
/a{1,3}b/U
ab
/[/I /[/I
/[a-/I /[a-/I
@ -4032,6 +4035,8 @@ backtracking verbs. --/
/(?(R&6yh)abc)/ /(?(R&6yh)abc)/
/(((a\2)|(a*)\g<-1>))*a?/BZ
/-- Test the ugly "start or end of word" compatibility syntax --/ /-- Test the ugly "start or end of word" compatibility syntax --/
/[[:<:]]red[[:>:]]/BZ /[[:<:]]red[[:>:]]/BZ
@ -4045,4 +4050,32 @@ backtracking verbs. --/
/[a[:<:]] should give error/ /[a[:<:]] should give error/
/(?=ab\K)/+
abcd
/abcd/f<lf>
xx\nxabcd
/ -- Test stack check external calls --/
/(((((a)))))/Q0
/(((((a)))))/Q1
/(((((a)))))/Q
/^\w+(?>\s*)(?<=\w)/BZ
/\othing/
/\o{}/
/\o{whatever}/
/\xthing/
/\x{}/
/\x{whatever}/
/-- End of testinput2 --/ /-- End of testinput2 --/

View file

@ -1,6 +1,6 @@
/-- Tests for the 32-bit library only */ /-- Tests for the 32-bit library only */
< forbid 8w < forbid 8W
/-- Check maximum character size --/ /-- Check maximum character size --/

View file

@ -1,6 +1,9 @@
/-- This set of tests checks local-specific features, using the fr_FR locale. /-- This set of tests checks local-specific features, using the "fr_FR" locale.
It is not Perl-compatible. There is different version called wintestinput3 It is not Perl-compatible. When run via RunTest, the locale is edited to
f or use on Windows, where the locale is called "french". --/ be whichever of "fr_FR", "french", or "fr" is found to exist. There is
different version of this file called wintestinput3 for use on Windows,
where the locale is called "french" and the tests are run using
RunTest.bat. --/
< forbid 8W < forbid 8W

View file

@ -716,4 +716,10 @@
/^a+[a\x{200}]/8 /^a+[a\x{200}]/8
aa aa
/^.\B.\B./8
\x{10123}\x{10124}\x{10125}
/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8
#\x{10000}#\x{100}#\x{10ffff}#
/-- End of testinput4 --/ /-- End of testinput4 --/

View file

@ -788,4 +788,6 @@
/^a+[a\x{200}]/8BZ /^a+[a\x{200}]/8BZ
aa aa
/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/8BZ
/-- End of testinput5 --/ /-- End of testinput5 --/

View file

@ -421,8 +421,8 @@
/^[\p{Arabic}]/8 /^[\p{Arabic}]/8
\x{06e9} \x{06e9}
\x{060b} \x{060b}
\x{061c}
** Failers ** Failers
\x{061c}
X\x{06e9} X\x{06e9}
/^[\P{Yi}]/8 /^[\P{Yi}]/8
@ -1484,4 +1484,16 @@
\x{a1}\x{a7} \x{a1}\x{a7}
\x{37e} \x{37e}
/[RST]+/8iW
Ss\x{17f}
/[R-T]+/8iW
Ss\x{17f}
/[q-u]+/8iW
Ss\x{17f}
/^s?c/mi8
scat
/-- End of testinput6 --/ /-- End of testinput6 --/

View file

@ -829,4 +829,13 @@ of case for anything other than the ASCII letters. --/
/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/8WBZ /\d+\s{0,5}=\s*\S?=\w{0,4}\W*/8WBZ
/[RST]+/8iWBZ
/[R-T]+/8iWBZ
/[Q-U]+/8iWBZ
/^s?c/mi8I
scat
/-- End of testinput7 --/ /-- End of testinput7 --/

View file

@ -4831,4 +4831,10 @@
/[ab]{2,}?/ /[ab]{2,}?/
aaaa aaaa
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/-- End of testinput8 --/ /-- End of testinput8 --/

View file

@ -223,7 +223,7 @@ No match
babababc babababc
No match No match
/^\ca\cA\c[\c{\c:/ /^\ca\cA\c[;\c:/
\x01\x01\e;z \x01\x01\e;z
0: \x01\x01\x1b;z 0: \x01\x01\x1b;z
@ -8235,6 +8235,16 @@ MK: M
0: aaaab 0: aaaab
1: aaaab 1: aaaab
/((?(R)a|(?1)))*/
aaa
0: aaa
1: a
/((?(R)a|(?1)))+/
aaa
0: aaa
1: a
/a(*:any /a(*:any
name)/K name)/K
abc abc
@ -9313,4 +9323,92 @@ No match
0+ c 0+ c
1: ab 1: ab
/(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/
acb
No match
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
1: AFTER
2:
/^\w+(?>\s*)(?<=\w)/
test test
0: tes
/(?P<same>a)(?P<same>b)/gJ
abbaba
0: ab
1: a
2: b
0: ab
1: a
2: b
/(?P<same>a)(?P<same>b)(?P=same)/gJ
abbaba
0: aba
1: a
2: b
/(?P=same)?(?P<same>a)(?P<same>b)/gJ
abbaba
0: ab
1: a
2: b
0: ab
1: a
2: b
/(?:(?P=same)?(?:(?P<same>a)|(?P<same>b))(?P=same))+/gJ
bbbaaabaabb
0: bbbaaaba
1: a
2: b
0: bb
1: <unset>
2: b
/(?:(?P=same)?(?:(?P=same)(?P<same>a)(?P=same)|(?P=same)?(?P<same>b)(?P=same)){2}(?P=same)(?P<same>c)(?P=same)){2}(?P<same>z)?/gJ
bbbaaaccccaaabbbcc
No match
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
acl
0: acl
1: a
bdl
0: bdl
1: <unset>
2: b
adl
0: dl
bcl
0: l
/\sabc/
\x{0b}abc
0: \x0babc
/[\Qa]\E]+/
aa]]
0: aa]]
/[\Q]a\E]+/
aa]]
0: aa]]
/-- End of testinput1 --/ /-- End of testinput1 --/

View file

@ -709,4 +709,28 @@ Memory allocation (code space): 14
62 End 62 End
------------------------------------------------------------------ ------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 39 Bra
2 Brazero
3 32 SCBra 1
6 27 Once
8 12 CBra 2
11 7 CBra 3
14 a
16 \2
18 7 Ket
20 11 Alt
22 5 CBra 4
25 a*
27 5 Ket
29 22 Recurse
31 23 Ket
33 27 Ket
35 32 KetRmax
37 a?+
39 39 Ket
41 End
------------------------------------------------------------------
/-- End of testinput11 --/ /-- End of testinput11 --/

View file

@ -709,4 +709,28 @@ Memory allocation (code space): 28
62 End 62 End
------------------------------------------------------------------ ------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 39 Bra
2 Brazero
3 32 SCBra 1
6 27 Once
8 12 CBra 2
11 7 CBra 3
14 a
16 \2
18 7 Ket
20 11 Alt
22 5 CBra 4
25 a*
27 5 Ket
29 22 Recurse
31 23 Ket
33 27 Ket
35 32 KetRmax
37 a?+
39 39 Ket
41 End
------------------------------------------------------------------
/-- End of testinput11 --/ /-- End of testinput11 --/

View file

@ -709,4 +709,28 @@ Memory allocation (code space): 10
76 End 76 End
------------------------------------------------------------------ ------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 57 Bra
3 Brazero
4 48 SCBra 1
9 40 Once
12 18 CBra 2
17 10 CBra 3
22 a
24 \2
27 10 Ket
30 16 Alt
33 7 CBra 4
38 a*
40 7 Ket
43 33 Recurse
46 34 Ket
49 40 Ket
52 48 KetRmax
55 a?+
57 57 Ket
60 End
------------------------------------------------------------------
/-- End of testinput11 --/ /-- End of testinput11 --/

View file

@ -8,7 +8,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
JIT study was successful JIT study was successful
/(?(?C1)(?=a)a)/S+I /(?(?C1)(?=a)a)/S+I
@ -27,7 +27,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = -1 Subject length lower bound = -1
No set of starting bytes No starting char list
JIT study was not successful JIT study was not successful
/abc/S+I>testsavedregex /abc/S+I>testsavedregex
@ -36,7 +36,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
JIT study was successful JIT study was successful
Compiled pattern written to testsavedregex Compiled pattern written to testsavedregex
Study data written to testsavedregex Study data written to testsavedregex
@ -165,7 +165,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'd' Need char = 'd'
Subject length lower bound = 4 Subject length lower bound = 4
No set of starting bytes No starting char list
JIT study was successful JIT study was successful
/(*NO_START_OPT)a(*:m)b/KS++ /(*NO_START_OPT)a(*:m)b/KS++

View file

@ -8,7 +8,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
JIT support is not available in this version of PCRE JIT support is not available in this version of PCRE
/a*/SI /a*/SI

View file

@ -361,7 +361,7 @@ Options: extended
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 Starting chars: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e 9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f
@ -388,7 +388,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 Starting chars: \x09 \x20 \xa0
/\H/SI /\H/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -396,7 +396,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\v/SI /\v/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -404,7 +404,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/\V/SI /\V/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -412,7 +412,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\R/SI /\R/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -420,7 +420,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/[\h]/BZ /[\h]/BZ
------------------------------------------------------------------ ------------------------------------------------------------------

View file

@ -481,7 +481,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
@ -519,7 +519,7 @@ Options: utf
First char = \x{c4} First char = \x{c4}
Need char = \x{80} Need char = \x{80}
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
\x{100}\x{100}\x{100}\x{100\x{100} \x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100} 0: \x{100}\x{100}\x{100}
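
Note on the study output above: `First char = \x{c4}` and `Need char = \x{80}` are the two bytes of the UTF-8 encoding of U+0100, and the `\xc2`, `\xe1`, `\xe2`, and `\xe3` entries in the other starting-character lists are likewise lead bytes of multi-byte sequences. The sketch below is a plain UTF-8 encoder that reproduces those bytes. `utf8_encode` and `utf8_self_check` are hypothetical helpers, not PCRE functions, and the encoder assumes its input is at most 0x10FFFF and does not reject surrogates.

```c
#include <assert.h>

/* Encode code point cp (assumed <= 0x10FFFF) as UTF-8.
   Returns the number of bytes written to out (1..4). */
static int utf8_encode(unsigned long cp, unsigned char out[4])
{
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (unsigned char)(0xF0 | (cp >> 18));
  out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char)(0x80 | (cp & 0x3F));
  return 4;
}

/* Self-check: U+0100 encodes as the two bytes C4 80. */
static void utf8_self_check(void)
{
  unsigned char buf[4];
  assert(utf8_encode(0x100, buf) == 2);
  assert(buf[0] == 0xC4 && buf[1] == 0x80);
}
```

The same function gives C2 85 for U+0085 and E2 80 A8 for U+2028, which is consistent with `\xc2` and `\xe2` appearing in the lists for `\v` and `\R` in UTF-8 mode.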
@ -539,7 +539,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xc4 Starting chars: x \xc4
/(\x{100}*a|x)/8SDZ /(\x{100}*a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -558,7 +558,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xc4 Starting chars: a x \xc4
/(\x{100}{0,2}a|x)/8SDZ /(\x{100}{0,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -577,7 +577,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xc4 Starting chars: a x \xc4
/(\x{100}{1,2}a|x)/8SDZ /(\x{100}{1,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -597,7 +597,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xc4 Starting chars: x \xc4
/\x{100}/8DZ /\x{100}/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -799,7 +799,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xc2 \xe1 \xe2 \xe3 Starting chars: \x09 \x20 \xc2 \xe1 \xe2 \xe3
ABC\x{09} ABC\x{09}
0: \x{09} 0: \x{09}
ABC\x{20} ABC\x{20}
@ -825,7 +825,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
ABC\x{0a} ABC\x{0a}
0: \x{0a} 0: \x{0a}
ABC\x{0b} ABC\x{0b}
@ -845,7 +845,7 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 Starting chars: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
CDBABC CDBABC
0: A 0: A
@ -855,7 +855,7 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
/\s?xxx\s/8SI /\s?xxx\s/8SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -863,7 +863,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1 /\sxxx\s/I8ST1
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -871,7 +871,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: \x09 \x0a \x0c \x0d \x20 \xc2 Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
AB\x{85}xxx\x{a0}XYZ AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0} 0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ AB\x{a0}xxx\x{85}XYZ
@ -883,15 +883,15 @@ Options: utf
No first char No first char
Need char = ' ' Need char = ' '
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
\xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
\xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
\xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
\xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
\x{a2} \x{84} \x{a2} \x{84}
0: \x{a2} \x{84} 0: \x{a2} \x{84}
A Z A Z
@ -917,7 +917,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \xe1 Starting chars: \xe1
/\x{1234}+?/iS8I /\x{1234}+?/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -925,7 +925,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \xe1 Starting chars: \xe1
/\x{1234}++/iS8I /\x{1234}++/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -933,7 +933,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \xe1 Starting chars: \xe1
/\x{1234}{2}/iS8I /\x{1234}{2}/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -941,7 +941,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: \xe1 Starting chars: \xe1
/[^\x{c4}]/8DZ /[^\x{c4}]/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -974,7 +974,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
/\777/8DZ /\777/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------

@ -64,7 +64,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 17 Subject length lower bound = 17
Starting byte set: \xd0 \xd1 Starting chars: \xd0 \xd1
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
@ -92,7 +92,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 Starting chars: \x09 \x20 \xa0
/\v/SI /\v/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -100,7 +100,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/\R/SI /\R/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -108,7 +108,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 Starting chars: \x0a \x0b \x0c \x0d \x85
/[[:blank:]]/WBZ /[[:blank:]]/WBZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -118,4 +118,24 @@ Starting byte set: \x0a \x0b \x0c \x0d \x85
End End
------------------------------------------------------------------ ------------------------------------------------------------------
/\x{212a}+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: K k \xe2
KKkk\x{212a}
0: KKkk\x{212a}
/s+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: S s \xc5
SSss\x{17f}
0: SSss\x{17f}
/-- End of testinput16 --/ /-- End of testinput16 --/

@ -228,7 +228,7 @@ Options: extended
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 Starting chars: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e 9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff
@ -274,7 +274,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 \xff Starting chars: \x09 \x20 \xa0 \xff
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
0: \x{1680}\x{2000}\x{202f}\x{3000} 0: \x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000} \x{3001}\x{2fff}\x{200a}\xa0\x{2000}
@ -292,7 +292,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x09 \x20 \xa0 \xff
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
0: \x{1680}\x{2000}\x{202f}\x{3000} 0: \x{1680}\x{2000}\x{202f}\x{3000}
\x{3001}\x{2fff}\x{200a}\xa0\x{2000} \x{3001}\x{2fff}\x{200a}\xa0\x{2000}
@ -304,7 +304,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f} 0: \x{167f}\x{1681}\x{180d}\x{180f}
\x{2000}\x{200a}\x{1fff}\x{200b} \x{2000}\x{200a}\x{1fff}\x{200b}
@ -330,7 +330,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029} 0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
@ -348,7 +348,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029} 0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
@ -360,7 +360,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
\x{2028}\x{2029}\x{2027}\x{2030} \x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030} 0: \x{2027}\x{2030}
\x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86
@ -378,7 +378,7 @@ Options: bsr_unicode
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
0: \x{2028}\x{2029} 0: \x{2028}\x{2029}
\x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d
@ -534,18 +534,18 @@ MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789AB
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
a* a*
[b-\x{200}]?+ [b-\xff\x{100}-\x{200}]?+
a# a#
a*+ a*+
[b-\x{200}]? [b-\xff\x{100}-\x{200}]?
b# b#
[a-f]* [a-f]*+
[g-\x{200}]*+ [g-\xff\x{100}-\x{200}]*+
# #
[g-\x{200}]* [g-\xff\x{100}-\x{200}]*+
[a-c]*+ [a-c]*+
# #
[g-\x{200}]* [g-\xff\x{100}-\x{200}]*
[a-h]*+ [a-h]*+
Ket Ket
End End

@ -339,7 +339,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
@ -378,7 +378,7 @@ Options: utf
First char = \x{100} First char = \x{100}
Need char = \x{100} Need char = \x{100}
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
\x{100}\x{100}\x{100}\x{100\x{100} \x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100} 0: \x{100}\x{100}\x{100}
@ -398,7 +398,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xff Starting chars: x \xff
/(\x{100}*a|x)/8SDZ /(\x{100}*a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -417,7 +417,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xff Starting chars: a x \xff
/(\x{100}{0,2}a|x)/8SDZ /(\x{100}{0,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -436,7 +436,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xff Starting chars: a x \xff
/(\x{100}{1,2}a|x)/8SDZ /(\x{100}{1,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -456,7 +456,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xff Starting chars: x \xff
/\x{100}/8DZ /\x{100}/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -666,7 +666,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 \xff Starting chars: \x09 \x20 \xa0 \xff
ABC\x{09} ABC\x{09}
0: \x{09} 0: \x{09}
ABC\x{20} ABC\x{20}
@ -692,7 +692,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
ABC\x{0a} ABC\x{0a}
0: \x{0a} 0: \x{0a}
ABC\x{0b} ABC\x{0b}
@ -712,19 +712,19 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 A \xa0 \xff Starting chars: \x09 \x20 A \xa0 \xff
CDBABC CDBABC
0: A 0: A
\x{2000}ABC \x{2000}ABC
0: \x{2000}A 0: \x{2000}A
/\R*A/SI8 /\R*A/SI8<bsr_unicode>
Capturing subpattern count = 0 Capturing subpattern count = 0
Options: utf Options: bsr_unicode utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d A \x85 \xff Starting chars: \x0a \x0b \x0c \x0d A \x85 \xff
CDBABC CDBABC
0: A 0: A
\x{2028}A \x{2028}A
@ -736,7 +736,7 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
/\s?xxx\s/8SI /\s?xxx\s/8SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -744,7 +744,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1 /\sxxx\s/I8ST1
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -752,7 +752,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: \x09 \x0a \x0c \x0d \x20 \x85 \xa0 Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
AB\x{85}xxx\x{a0}XYZ AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0} 0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ AB\x{a0}xxx\x{85}XYZ
@ -764,20 +764,20 @@ Options: utf
No first char No first char
Need char = ' ' Need char = ' '
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84
\x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94
\x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4
\xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3
\xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2
\xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
\xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
\xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
\xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
\xfe \xff \xff
\x{a2} \x{84} \x{a2} \x{84}
0: \x{a2} \x{84} 0: \x{a2} \x{84}
A Z A Z
@ -803,7 +803,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}+?/iS8I /\x{1234}+?/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -811,7 +811,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}++/iS8I /\x{1234}++/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -819,7 +819,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}{2}/iS8I /\x{1234}{2}/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -827,7 +827,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
Need char = \x{1234} Need char = \x{1234}
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
/[^\x{c4}]/8DZ /[^\x{c4}]/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -860,7 +860,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
/-- Check bad offset --/ /-- Check bad offset --/

@ -337,7 +337,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
@ -376,7 +376,7 @@ Options: utf
First char = \x{100} First char = \x{100}
Need char = \x{100} Need char = \x{100}
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
\x{100}\x{100}\x{100}\x{100\x{100} \x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100} 0: \x{100}\x{100}\x{100}
@ -396,7 +396,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xff Starting chars: x \xff
/(\x{100}*a|x)/8SDZ /(\x{100}*a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -415,7 +415,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xff Starting chars: a x \xff
/(\x{100}{0,2}a|x)/8SDZ /(\x{100}{0,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -434,7 +434,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a x \xff Starting chars: a x \xff
/(\x{100}{1,2}a|x)/8SDZ /(\x{100}{1,2}a|x)/8SDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -454,7 +454,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x \xff Starting chars: x \xff
/\x{100}/8DZ /\x{100}/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -663,7 +663,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 \xa0 \xff Starting chars: \x09 \x20 \xa0 \xff
ABC\x{09} ABC\x{09}
0: \x{09} 0: \x{09}
ABC\x{20} ABC\x{20}
@ -689,7 +689,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
ABC\x{0a} ABC\x{0a}
0: \x{0a} 0: \x{0a}
ABC\x{0b} ABC\x{0b}
@ -709,19 +709,19 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x20 A \xa0 \xff Starting chars: \x09 \x20 A \xa0 \xff
CDBABC CDBABC
0: A 0: A
\x{2000}ABC \x{2000}ABC
0: \x{2000}A 0: \x{2000}A
/\R*A/SI8 /\R*A/SI8<bsr_unicode>
Capturing subpattern count = 0 Capturing subpattern count = 0
Options: utf Options: bsr_unicode utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d A \x85 \xff Starting chars: \x0a \x0b \x0c \x0d A \x85 \xff
CDBABC CDBABC
0: A 0: A
\x{2028}A \x{2028}A
@ -733,7 +733,7 @@ Options: utf
No first char No first char
Need char = 'A' Need char = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
/\s?xxx\s/8SI /\s?xxx\s/8SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -741,7 +741,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 x Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x
/\sxxx\s/I8ST1 /\sxxx\s/I8ST1
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -749,7 +749,7 @@ Options: utf
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: \x09 \x0a \x0c \x0d \x20 \x85 \xa0 Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
AB\x{85}xxx\x{a0}XYZ AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0} 0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ AB\x{a0}xxx\x{85}XYZ
@ -761,20 +761,20 @@ Options: utf
No first char No first char
Need char = ' ' Need char = ' '
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84
\x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94
\x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4
\xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3
\xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2
\xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
\xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
\xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
\xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
\xfe \xff \xff
\x{a2} \x{84} \x{a2} \x{84}
0: \x{a2} \x{84} 0: \x{a2} \x{84}
A Z A Z
@ -800,7 +800,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}+?/iS8I /\x{1234}+?/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -808,7 +808,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}++/iS8I /\x{1234}++/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -816,7 +816,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/\x{1234}{2}/iS8I /\x{1234}{2}/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -824,7 +824,7 @@ Options: caseless utf
First char = \x{1234} First char = \x{1234}
Need char = \x{1234} Need char = \x{1234}
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
/[^\x{c4}]/8DZ /[^\x{c4}]/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -857,7 +857,7 @@ Options: utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x0a \x0b \x0c \x0d \x85 \xff Starting chars: \x0a \x0b \x0c \x0d \x85 \xff
/-- Check bad offset --/ /-- Check bad offset --/

@ -55,7 +55,7 @@ Options: caseless utf
First char = \x{401} (caseless) First char = \x{401} (caseless)
Need char = \x{42f} (caseless) Need char = \x{42f} (caseless)
Subject length lower bound = 17 Subject length lower bound = 17
No set of starting bytes No starting char list
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
@ -85,4 +85,24 @@ No set of starting bytes
End End
------------------------------------------------------------------ ------------------------------------------------------------------
/\x{212a}+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: K k \xff
KKkk\x{212a}
0: KKkk\x{212a}
/s+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: S s \xff
SSss\x{17f}
0: SSss\x{17f}
/-- End of testinput19 --/ /-- End of testinput19 --/

@ -178,7 +178,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: c d e Starting chars: c d e
this sentence eventually mentions a cat this sentence eventually mentions a cat
0: cat 0: cat
this sentences rambles on and on for a while and then reaches elephant this sentences rambles on and on for a while and then reaches elephant
@ -190,7 +190,7 @@ Options: caseless
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: C D E c d e Starting chars: C D E c d e
this sentence eventually mentions a CAT cat this sentence eventually mentions a CAT cat
0: CAT 0: CAT
this sentences rambles on and on for a while to elephant ElePhant this sentences rambles on and on for a while to elephant ElePhant
@ -202,7 +202,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/(a|[^\dZ])/IS /(a|[^\dZ])/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -210,7 +210,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y [ \ ] ^ _ ` a b c d ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y [ \ ] ^ _ ` a b c d
@ -231,7 +231,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 a b Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 a b
/(ab\2)/ /(ab\2)/
Failed: reference to non-existent subpattern at offset 6 Failed: reference to non-existent subpattern at offset 6
@ -512,7 +512,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/(?i)[abcd]/IS /(?i)[abcd]/IS
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -520,7 +520,7 @@ Options: caseless
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: A B C D a b c d Starting chars: A B C D a b c d
/(?m)[xy]|(b|c)/IS /(?m)[xy]|(b|c)/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -528,7 +528,7 @@ Options: multiline
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: b c x y Starting chars: b c x y
/(^a|^b)/Im /(^a|^b)/Im
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -591,7 +591,7 @@ No options
First char = 'b' (caseless) First char = 'b' (caseless)
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/(a*b|(?i:c*(?-i)d))/IS /(a*b|(?i:c*(?-i)d))/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -599,7 +599,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: C a b c d Starting chars: C a b c d
/a$/I /a$/I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -666,7 +666,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b Starting chars: a b
/(?<!foo)(alpha|omega)/IS /(?<!foo)(alpha|omega)/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -675,7 +675,7 @@ No options
No first char No first char
Need char = 'a' Need char = 'a'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: a o Starting chars: a o
/(?!alphabet)[ab]/IS /(?!alphabet)[ab]/IS
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -683,7 +683,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b Starting chars: a b
/(?<=foo\n)^bar/Im /(?<=foo\n)^bar/Im
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -1642,7 +1642,7 @@ Options: anchored
No first char No first char
Need char = 'd' Need char = 'd'
Subject length lower bound = 4 Subject length lower bound = 4
No set of starting bytes No starting char list
/\( # ( at start /\( # ( at start
(?: # Non-capturing bracket (?: # Non-capturing bracket
@ -1875,7 +1875,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Starting chars: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
_ a b c d e f g h i j k l m n o p q r s t u v w x y z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
/^[[:ascii:]]/DZ /^[[:ascii:]]/DZ
@ -1937,7 +1937,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 Starting chars: \x09 \x0a \x0b \x0c \x0d \x20
/^[[:cntrl:]]/DZ /^[[:cntrl:]]/DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -3178,6 +3178,10 @@ Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
/\U/I /\U/I
Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1 Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
/a{1,3}b/U
ab
0: ab
/[/I /[/I
Failed: missing terminating ] for character class at offset 1 Failed: missing terminating ] for character class at offset 1
@ -3434,7 +3438,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b Starting chars: a b
/[^a]/I /[^a]/I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -3454,7 +3458,7 @@ No options
No first char No first char
Need char = '6' Need char = '6'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: 0 1 2 3 4 5 6 7 8 9 Starting chars: 0 1 2 3 4 5 6 7 8 9
/a^b/I /a^b/I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -3488,7 +3492,7 @@ Options: caseless
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: A B a b Starting chars: A B a b
/[ab](?i)cd/IS /[ab](?i)cd/IS
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -3496,7 +3500,7 @@ No options
No first char No first char
Need char = 'd' (caseless) Need char = 'd' (caseless)
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: a b Starting chars: a b
/abc(?C)def/I /abc(?C)def/I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -3537,7 +3541,7 @@ No options
No first char No first char
Need char = 'f' Need char = 'f'
Subject length lower bound = 7 Subject length lower bound = 7
Starting byte set: 0 1 2 3 4 5 6 7 8 9 Starting chars: 0 1 2 3 4 5 6 7 8 9
1234abcdef 1234abcdef
--->1234abcdef --->1234abcdef
1 ^ \d 1 ^ \d
@ -3856,7 +3860,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b Starting chars: a b
/(?R)/I /(?R)/I
Failed: recursive call could loop indefinitely at offset 3 Failed: recursive call could loop indefinitely at offset 3
@ -4637,7 +4641,7 @@ Options: caseless
No first char No first char
Need char = 'g' (caseless) Need char = 'g' (caseless)
Subject length lower bound = 8 Subject length lower bound = 8
No set of starting bytes No starting char list
Baby Bjorn Active Carrier - With free SHIPPING!! Baby Bjorn Active Carrier - With free SHIPPING!!
0: Baby Bjorn Active Carrier - With free SHIPPING!! 0: Baby Bjorn Active Carrier - With free SHIPPING!!
1: Baby Bjorn Active Carrier - With free SHIPPING!! 1: Baby Bjorn Active Carrier - With free SHIPPING!!
@ -4656,7 +4660,7 @@ No options
No first char No first char
Need char = 'b' Need char = 'b'
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/(a|b)*.?c/ISDZ /(a|b)*.?c/ISDZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -4677,7 +4681,7 @@ No options
No first char No first char
Need char = 'c' Need char = 'c'
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/abc(?C255)de(?C)f/DZ /abc(?C255)de(?C)f/DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -4750,7 +4754,7 @@ Options:
No first char No first char
Need char = 'b' Need char = 'b'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b Starting chars: a b
ab ab
--->ab --->ab
+0 ^ a* +0 ^ a*
@ -4893,7 +4897,7 @@ Options:
No first char No first char
Need char = 'x' Need char = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a d Starting chars: a d
abcx abcx
--->abcx --->abcx
+0 ^ (abc|def) +0 ^ (abc|def)
@ -5127,7 +5131,7 @@ Options:
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b x Starting chars: a b x
Note: that { does NOT introduce a quantifier Note: that { does NOT introduce a quantifier
--->Note: that { does NOT introduce a quantifier --->Note: that { does NOT introduce a quantifier
+0 ^ ([ab]{,4}c|xy) +0 ^ ([ab]{,4}c|xy)
@ -5607,7 +5611,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
Compiled pattern written to testsavedregex Compiled pattern written to testsavedregex
Study data written to testsavedregex Study data written to testsavedregex
<testsavedregex <testsavedregex
@ -5642,7 +5646,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
Compiled pattern written to testsavedregex Compiled pattern written to testsavedregex
Study data written to testsavedregex Study data written to testsavedregex
<testsavedregex <testsavedregex
@ -5677,7 +5681,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b Starting chars: a b
Compiled pattern written to testsavedregex Compiled pattern written to testsavedregex
Study data written to testsavedregex Study data written to testsavedregex
<testsavedregex <testsavedregex
@ -5716,7 +5720,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b Starting chars: a b
Compiled pattern written to testsavedregex Compiled pattern written to testsavedregex
Study data written to testsavedregex Study data written to testsavedregex
<testsavedregex <testsavedregex
@ -5817,13 +5821,13 @@ No match
No match No match
/a{11111111111111111111}/I /a{11111111111111111111}/I
Failed: number too big in {} quantifier at offset 22 Failed: number too big in {} quantifier at offset 8
/(){64294967295}/I /(){64294967295}/I
Failed: number too big in {} quantifier at offset 14 Failed: number too big in {} quantifier at offset 9
/(){2,4294967295}/I /(){2,4294967295}/I
Failed: number too big in {} quantifier at offset 15 Failed: number too big in {} quantifier at offset 11
"(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I "(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6431,7 +6435,7 @@ No options
No first char No first char
Need char = ',' Need char = ','
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: \x09 \x0a \x0b \x0c \x0d \x20 , Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 ,
\x0b,\x0b \x0b,\x0b
0: \x0b,\x0b 0: \x0b,\x0b
\x0c,\x0d \x0c,\x0d
@ -6738,7 +6742,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: C a b c d Starting chars: C a b c d
/()[ab]xyz/IS /()[ab]xyz/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6746,7 +6750,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a b Starting chars: a b
/(|)[ab]xyz/IS /(|)[ab]xyz/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6754,7 +6758,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a b Starting chars: a b
/(|c)[ab]xyz/IS /(|c)[ab]xyz/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6762,7 +6766,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a b c Starting chars: a b c
/(|c?)[ab]xyz/IS /(|c?)[ab]xyz/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6770,7 +6774,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a b c Starting chars: a b c
/(d?|c?)[ab]xyz/IS /(d?|c?)[ab]xyz/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6778,7 +6782,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a b c d Starting chars: a b c d
/(d?|c)[ab]xyz/IS /(d?|c)[ab]xyz/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6786,7 +6790,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a b c d Starting chars: a b c d
/^a*b\d/DZ /^a*b\d/DZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -6879,7 +6883,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/(a+|b*)[cd]/IS /(a+|b*)[cd]/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6887,7 +6891,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/(a*|b+)[cd]/IS /(a*|b+)[cd]/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6895,7 +6899,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/(a+|b+)[cd]/IS /(a+|b+)[cd]/IS
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -6903,7 +6907,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b Starting chars: a b
/(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( /((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
@ -9307,7 +9311,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: x y z Starting chars: x y z
/(?(?=.*b)b|^)/CI /(?(?=.*b)b|^)/CI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -10096,7 +10100,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b Starting chars: a b
/(a|bc)\1{2,3}/SI /(a|bc)\1{2,3}/SI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -10105,7 +10109,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: a b Starting chars: a b
/(a|bc)(?1)/SI /(a|bc)(?1)/SI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -10113,7 +10117,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b Starting chars: a b
/(a|b\1)(a|b\1)/SI /(a|b\1)(a|b\1)/SI
Capturing subpattern count = 2 Capturing subpattern count = 2
@ -10122,7 +10126,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b Starting chars: a b
/(a|b\1){2}/SI /(a|b\1){2}/SI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -10131,7 +10135,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b Starting chars: a b
/(a|bbbb\1)(a|bbbb\1)/SI /(a|bbbb\1)(a|bbbb\1)/SI
Capturing subpattern count = 2 Capturing subpattern count = 2
@ -10140,7 +10144,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b Starting chars: a b
/(a|bbbb\1){2}/SI /(a|bbbb\1){2}/SI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -10149,7 +10153,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b Starting chars: a b
/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/SI /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/SI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -10157,7 +10161,7 @@ Options: anchored
No first char No first char
Need char = ':' Need char = ':'
Subject length lower bound = 22 Subject length lower bound = 22
No set of starting bytes No starting char list
/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/isIS /<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/isIS
Capturing subpattern count = 11 Capturing subpattern count = 11
@ -10165,7 +10169,7 @@ Options: caseless dotall
First char = '<' First char = '<'
Need char = '>' Need char = '>'
Subject length lower bound = 47 Subject length lower bound = 47
No set of starting bytes No starting char list
"(?>.*/)foo"SI "(?>.*/)foo"SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -10173,7 +10177,7 @@ No options
No first char No first char
Need char = 'o' Need char = 'o'
Subject length lower bound = 4 Subject length lower bound = 4
No set of starting bytes No starting char list
/(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /xSI /(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /xSI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -10181,7 +10185,7 @@ Options: extended
No first char No first char
Need char = '-' Need char = '-'
Subject length lower bound = 8 Subject length lower bound = 8
No set of starting bytes No starting char list
/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/iSI /(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/iSI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -10189,7 +10193,7 @@ Options: caseless
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: A B C a b c Starting chars: A B C a b c
/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/SI /(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/SI
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -10197,7 +10201,7 @@ No options
No first char No first char
Need char = 'b' Need char = 'b'
Subject length lower bound = 41 Subject length lower bound = 41
Starting byte set: c d Starting chars: c d
/<a[\s]+href[\s]*=[\s]* # find <a href= /<a[\s]+href[\s]*=[\s]* # find <a href=
([\"\'])? # find single or double quote ([\"\'])? # find single or double quote
@ -10210,7 +10214,7 @@ Options: caseless extended dotall
First char = '<' First char = '<'
Need char = '=' Need char = '='
Subject length lower bound = 9 Subject length lower bound = 9
No set of starting bytes No starting char list
/^(?!:) # colon disallowed at start /^(?!:) # colon disallowed at start
(?: # start of item (?: # start of item
@ -10226,7 +10230,7 @@ Options: anchored caseless extended
No first char No first char
Need char = ':' Need char = ':'
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
/(?|(?<a>A)|(?<a>B))/I /(?|(?<a>A)|(?<a>B))/I
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -10450,7 +10454,7 @@ Options:
No first char No first char
Need char = 'a' Need char = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
cat cat
0: a 0: a
1: 1:
@ -10464,7 +10468,7 @@ No options
No first char No first char
Need char = 'a' Need char = 'a'
Subject length lower bound = 3 Subject length lower bound = 3
No set of starting bytes No starting char list
cat cat
No match No match
@ -10476,7 +10480,7 @@ No options
First char = 'i' First char = 'i'
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
i i
0: i 0: i
@ -10486,7 +10490,7 @@ No options
No first char No first char
Need char = 'i' Need char = 'i'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: i Starting chars: i
ia ia
0: ia 0: ia
1: 1:
@ -11080,7 +11084,7 @@ No options
First char = 'a' First char = 'a'
Need char = '4' Need char = '4'
Subject length lower bound = 5 Subject length lower bound = 5
No set of starting bytes No starting char list
/([abc])++1234/SI /([abc])++1234/SI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -11088,7 +11092,7 @@ No options
No first char No first char
Need char = '4' Need char = '4'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: a b c Starting chars: a b c
/(?<=(abc)+)X/ /(?<=(abc)+)X/
Failed: lookbehind assertion is not fixed length at offset 10 Failed: lookbehind assertion is not fixed length at offset 10
@ -11369,7 +11373,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/(a(?2)|b)(b(?1)|a)(?:(?1)|(?2))/SI /(a(?2)|b)(b(?1)|a)(?:(?1)|(?2))/SI
Capturing subpattern count = 2 Capturing subpattern count = 2
@ -11377,7 +11381,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: a b Starting chars: a b
/(a(?2)|b)(b(?1)|a)(?1)(?2)/SI /(a(?2)|b)(b(?1)|a)(?1)(?2)/SI
Capturing subpattern count = 2 Capturing subpattern count = 2
@ -11385,7 +11389,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 4 Subject length lower bound = 4
Starting byte set: a b Starting chars: a b
/(abc)(?1)/SI /(abc)(?1)/SI
Capturing subpattern count = 1 Capturing subpattern count = 1
@ -11393,7 +11397,7 @@ No options
First char = 'a' First char = 'a'
Need char = 'c' Need char = 'c'
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
/^(?>a)++/ /^(?>a)++/
aa\M aa\M
@ -11711,7 +11715,7 @@ No options
First char = 't' First char = 't'
Need char = 't' Need char = 't'
Subject length lower bound = 18 Subject length lower bound = 18
No set of starting bytes No starting char list
/\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|<input\b.*?\btype\b\W*?\bimage\b|\bonkeyup\b\W*?\=/IS /\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|<input\b.*?\btype\b\W*?\bimage\b|\bonkeyup\b\W*?\=/IS
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -11720,7 +11724,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 8 Subject length lower bound = 8
Starting byte set: < o t u Starting chars: < o t u
/a(*SKIP)c|b(*ACCEPT)|/+S!I /a(*SKIP)c|b(*ACCEPT)|/+S!I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -11729,7 +11733,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = -1 Subject length lower bound = -1
No set of starting bytes No starting char list
a a
0: 0:
0+ 0+
@ -11740,7 +11744,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = -1 Subject length lower bound = -1
Starting byte set: a b x Starting chars: a b x
ax ax
0: x 0: x
@ -12436,7 +12440,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = -1 Subject length lower bound = -1
No set of starting bytes No starting char list
/(?:(a)+(?C1)bb|aa(?C2)b)/ /(?:(a)+(?C1)bb|aa(?C2)b)/
aab\C+ aab\C+
@ -12722,7 +12726,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a z Starting chars: a z
aaaaaaaaaaaaaz aaaaaaaaaaaaaz
Error -21 (recursion limit exceeded) Error -21 (recursion limit exceeded)
aaaaaaaaaaaaaz\Q1000 aaaaaaaaaaaaaz\Q1000
@ -12735,7 +12739,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a z Starting chars: a z
aaaaaaaaaaaaaz aaaaaaaaaaaaaz
Error -21 (recursion limit exceeded) Error -21 (recursion limit exceeded)
@ -12746,7 +12750,7 @@ No options
No first char No first char
Need char = 'z' Need char = 'z'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a z Starting chars: a z
aaaaaaaaaaaaaz aaaaaaaaaaaaaz
No match No match
aaaaaaaaaaaaaz\Q10 aaaaaaaaaaaaaz\Q10
@ -12790,7 +12794,7 @@ Options: dupnames
First char = 'a' First char = 'a'
Need char = 'z' Need char = 'z'
Subject length lower bound = 5 Subject length lower bound = 5
No set of starting bytes No starting char list
/a*[bcd]/BZ /a*[bcd]/BZ
------------------------------------------------------------------ ------------------------------------------------------------------
@ -13902,7 +13906,7 @@ No options
No first char No first char
Need char = 'd' Need char = 'd'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/[a-c]+d/DZS /[a-c]+d/DZS
------------------------------------------------------------------ ------------------------------------------------------------------
@ -13917,7 +13921,7 @@ No options
No first char No first char
Need char = 'd' Need char = 'd'
Subject length lower bound = 2 Subject length lower bound = 2
Starting byte set: a b c Starting chars: a b c
/[a-c]?d/DZS /[a-c]?d/DZS
------------------------------------------------------------------ ------------------------------------------------------------------
@ -13932,7 +13936,7 @@ No options
No first char No first char
Need char = 'd' Need char = 'd'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/[a-c]{4,6}d/DZS /[a-c]{4,6}d/DZS
------------------------------------------------------------------ ------------------------------------------------------------------
@ -13947,7 +13951,7 @@ No options
No first char No first char
Need char = 'd' Need char = 'd'
Subject length lower bound = 5 Subject length lower bound = 5
Starting byte set: a b c Starting chars: a b c
/[a-c]{0,6}d/DZS /[a-c]{0,6}d/DZS
------------------------------------------------------------------ ------------------------------------------------------------------
@ -13962,7 +13966,7 @@ No options
No first char No first char
Need char = 'd' Need char = 'd'
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: a b c d Starting chars: a b c d
/-- End of special auto-possessive tests --/ /-- End of special auto-possessive tests --/
@ -14089,6 +14093,30 @@ Failed: malformed number or name after (?( at offset 4
/(?(R&6yh)abc)/ /(?(R&6yh)abc)/
Failed: group name must start with a non-digit at offset 5 Failed: group name must start with a non-digit at offset 5
/(((a\2)|(a*)\g<-1>))*a?/BZ
------------------------------------------------------------------
Bra
Brazero
SCBra 1
Once
CBra 2
CBra 3
a
\2
Ket
Alt
CBra 4
a*
Ket
Recurse
Ket
Ket
KetRmax
a?+
Ket
End
------------------------------------------------------------------
/-- Test the ugly "start or end of word" compatibility syntax --/ /-- Test the ugly "start or end of word" compatibility syntax --/
/[[:<:]]red[[:>:]]/BZ /[[:<:]]red[[:>:]]/BZ
@ -14125,4 +14153,57 @@ No match
/[a[:<:]] should give error/ /[a[:<:]] should give error/
Failed: unknown POSIX class name at offset 4 Failed: unknown POSIX class name at offset 4
/(?=ab\K)/+
abcd
Start of matched string is beyond its end - displaying from end to start.
0: ab
0+ abcd
/abcd/f<lf>
xx\nxabcd
No match
/ -- Test stack check external calls --/
/(((((a)))))/Q0
/(((((a)))))/Q1
Failed: parentheses are too deeply nested (stack check) at offset 0
/(((((a)))))/Q
** Missing 0 or 1 after /Q
/^\w+(?>\s*)(?<=\w)/BZ
------------------------------------------------------------------
Bra
^
\w+
Once_NC
\s*+
Ket
AssertB
Reverse
\w
Ket
Ket
End
------------------------------------------------------------------
/\othing/
Failed: missing opening brace after \o at offset 1
/\o{}/
Failed: digits missing in \x{} or \o{} at offset 1
/\o{whatever}/
Failed: non-octal character in \o{} (closing brace missing?) at offset 3
/\xthing/
/\x{}/
Failed: digits missing in \x{} or \o{} at offset 3
/\x{whatever}/
Failed: non-hex character in \x{} (closing brace missing?) at offset 3
/-- End of testinput2 --/ /-- End of testinput2 --/

@ -50,7 +50,7 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
<!testsaved16BE-1 <!testsaved16BE-1
Compiled pattern loaded from testsaved16BE-1 Compiled pattern loaded from testsaved16BE-1
@ -83,7 +83,7 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
<!testsaved32LE-1 <!testsaved32LE-1
Compiled pattern loaded from testsaved32LE-1 Compiled pattern loaded from testsaved32LE-1

@ -62,7 +62,7 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
<!testsaved32BE-1 <!testsaved32BE-1
Compiled pattern loaded from testsaved32BE-1 Compiled pattern loaded from testsaved32BE-1
@ -95,6 +95,6 @@ Options: anchored extended
No first char No first char
No need char No need char
Subject length lower bound = 6 Subject length lower bound = 6
No set of starting bytes No starting char list
/-- End of testinput21 --/ /-- End of testinput21 --/

@ -37,7 +37,7 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
<!testsaved16BE-2 <!testsaved16BE-2
Compiled pattern loaded from testsaved16BE-2 Compiled pattern loaded from testsaved16BE-2
@ -64,7 +64,7 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
<!testsaved32LE-2 <!testsaved32LE-2
Compiled pattern loaded from testsaved32LE-2 Compiled pattern loaded from testsaved32LE-2

@ -49,7 +49,7 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
<!testsaved32BE-2 <!testsaved32BE-2
Compiled pattern loaded from testsaved32BE-2 Compiled pattern loaded from testsaved32BE-2
@ -76,6 +76,6 @@ Options: extended utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
/-- End of testinput22 --/ /-- End of testinput22 --/

@ -18,7 +18,7 @@ Failed: character value in \x{} or \o{} is too large at offset 8
/[\H]/BZSI /[\H]/BZSI
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x08\x0a-\x1f!-\x9f\x{a1}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffff}] [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffff}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -27,12 +27,25 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
: ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^
_ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80
\x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f
\x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e
\x9f \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae
\xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd
\xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
\xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
\xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea
\xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9
\xfa \xfb \xfc \xfd \xfe \xff
/[\V]/BZSI /[\V]/BZSI
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x09\x0e-\x84\x{86}-\x{2027}\x{202a}-\x{ffff}] [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffff}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -41,6 +54,19 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c
d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82
\x83 \x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92
\x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1
\xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0
\xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf
\xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
\xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
\xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec
\xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb
\xfc \xfd \xfe \xff
/-- End of testinput23 --/ /-- End of testinput23 --/

@ -1,6 +1,6 @@
/-- Tests for the 32-bit library only */ /-- Tests for the 32-bit library only */
< forbid 8w < forbid 8W
/-- Check maximum character size --/ /-- Check maximum character size --/
@ -65,7 +65,7 @@ Need char = \x{800000}
/[\H]/BZSI /[\H]/BZSI
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x08\x0a-\x1f!-\x9f\x{a1}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffffffff}] [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffffffff}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -74,12 +74,25 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
: ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^
_ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80
\x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f
\x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e
\x9f \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae
\xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd
\xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
\xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
\xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea
\xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9
\xfa \xfb \xfc \xfd \xfe \xff
/[\V]/BZSI /[\V]/BZSI
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x09\x0e-\x84\x{86}-\x{2027}\x{202a}-\x{ffffffff}] [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffffffff}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -88,6 +101,19 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c
d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82
\x83 \x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92
\x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1
\xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0
\xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf
\xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
\xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
\xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec
\xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb
\xfc \xfd \xfe \xff
/-- End of testinput25 --/ /-- End of testinput25 --/

@ -1,6 +1,9 @@
/-- This set of tests checks local-specific features, using the fr_FR locale. /-- This set of tests checks local-specific features, using the "fr_FR" locale.
It is not Perl-compatible. There is different version called wintestinput3 It is not Perl-compatible. When run via RunTest, the locale is edited to
for use on Windows, where the locale is called "french". --/ be whichever of "fr_FR", "french", or "fr" is found to exist. There is
different version of this file called wintestinput3 for use on Windows,
where the locale is called "french" and the tests are run using
RunTest.bat. --/
< forbid 8W < forbid 8W
@ -90,7 +93,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
/\w/ISLfr_FR /\w/ISLfr_FR
@ -99,7 +102,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ

@ -1263,4 +1263,12 @@ No match
aa aa
0: aa 0: aa
/^.\B.\B./8
\x{10123}\x{10124}\x{10125}
0: \x{10123}\x{10124}\x{10125}
/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8
#\x{10000}#\x{100}#\x{10ffff}#
0: #\x{10000}#\x{100}#\x{10ffff}#
/-- End of testinput4 --/ /-- End of testinput4 --/

@ -270,7 +270,7 @@ No match
/[z-\x{100}]/8DZ /[z-\x{100}]/8DZ
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[z-\x{100}] [z-\xff\x{100}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -812,7 +812,7 @@ No match
/[\H]/8BZ /[\H]/8BZ
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x08\x0a-\x1f!-\x9f\x{a1}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}] [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -820,7 +820,7 @@ No match
/[\V]/8BZ /[\V]/8BZ
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x09\x0e-\x84\x{86}-\x{2027}\x{202a}-\x{10ffff}] [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{10ffff}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -1536,7 +1536,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/[^\x{1234}]+?/iS8I /[^\x{1234}]+?/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -1544,7 +1544,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/[^\x{1234}]++/iS8I /[^\x{1234}]++/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -1552,7 +1552,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
No set of starting bytes No starting char list
/[^\x{1234}]{2}/iS8I /[^\x{1234}]{2}/iS8I
Capturing subpattern count = 0 Capturing subpattern count = 0
@ -1560,7 +1560,7 @@ Options: caseless utf
No first char No first char
No need char No need char
Subject length lower bound = 2 Subject length lower bound = 2
No set of starting bytes No starting char list
//<bsr_anycrlf><bsr_unicode> //<bsr_anycrlf><bsr_unicode>
Failed: inconsistent NEWLINE options at offset 0 Failed: inconsistent NEWLINE options at offset 0
@ -1620,7 +1620,7 @@ Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 7
/[\H\x{d7ff}]+/8BZ /[\H\x{d7ff}]+/8BZ
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x08\x0a-\x1f!-\x9f\x{a1}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}\x{d7ff}]++ [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}\x{d7ff}]++
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -1660,7 +1660,7 @@ Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 7
/[\V\x{d7ff}]+/8BZ /[\V\x{d7ff}]+/8BZ
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[\x00-\x09\x0e-\x84\x{86}-\x{2027}\x{202a}-\x{10ffff}\x{d7ff}]++ [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{10ffff}\x{d7ff}]++
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -1882,4 +1882,19 @@ Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 5
aa aa
0: aa 0: aa
/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/8BZ
------------------------------------------------------------------
Bra
[b-d\x{200}-\x{250}]*+
[ae-h]?+
#
[\x{200}-\x{250}]{0,8}+
[\x00-\xff]*
#
[\x{200}-\x{250}]++
[a-z]
Ket
End
------------------------------------------------------------------
/-- End of testinput5 --/ /-- End of testinput5 --/

@ -719,9 +719,9 @@ No match
0: \x{6e9} 0: \x{6e9}
\x{060b} \x{060b}
0: \x{60b} 0: \x{60b}
\x{061c}
0: \x{61c}
** Failers ** Failers
No match
\x{061c}
No match No match
X\x{06e9} X\x{06e9}
No match No match
@ -2445,4 +2445,20 @@ No match
\x{37e} \x{37e}
No match No match
/[RST]+/8iW
Ss\x{17f}
0: Ss\x{17f}
/[R-T]+/8iW
Ss\x{17f}
0: Ss\x{17f}
/[q-u]+/8iW
Ss\x{17f}
0: Ss\x{17f}
/^s?c/mi8
scat
0: sc
/-- End of testinput6 --/ /-- End of testinput6 --/

@ -124,7 +124,7 @@ No match
/[z-\x{100}]/8iDZ /[z-\x{100}]/8iDZ
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[Z\x{39c}\x{3bc}\x{1e9e}\x{178}z-\x{101}] [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -162,7 +162,7 @@ No match
/[z-\x{100}]/8DZi /[z-\x{100}]/8DZi
------------------------------------------------------------------ ------------------------------------------------------------------
Bra Bra
[Z\x{39c}\x{3bc}\x{1e9e}\x{178}z-\x{101}] [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}]
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
@ -2263,4 +2263,36 @@ No match
End End
------------------------------------------------------------------ ------------------------------------------------------------------
/[RST]+/8iWBZ
------------------------------------------------------------------
Bra
[R-Tr-t\x{17f}]++
Ket
End
------------------------------------------------------------------
/[R-T]+/8iWBZ
------------------------------------------------------------------
Bra
[R-Tr-t\x{17f}]++
Ket
End
------------------------------------------------------------------
/[Q-U]+/8iWBZ
------------------------------------------------------------------
Bra
[Q-Uq-u\x{17f}]++
Ket
End
------------------------------------------------------------------
/^s?c/mi8I
Capturing subpattern count = 0
Options: caseless multiline utf
First char at start or follows newline
Need char = 'c' (caseless)
scat
0: sc
/-- End of testinput7 --/ /-- End of testinput7 --/

@ -7232,7 +7232,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 3 Subject length lower bound = 3
Starting byte set: a d x Starting chars: a d x
terhjk;abcdaadsfe terhjk;abcdaadsfe
0: abc 0: abc
the quick xyz brown fox the quick xyz brown fox
@ -7777,4 +7777,12 @@ Matched, but offsets vector is too small to show all matches
1: aaa 1: aaa
2: aa 2: aa
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
/-- End of testinput8 --/ /-- End of testinput8 --/

@ -84,7 +84,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
/\w/ISLfrench /\w/ISLfrench
@ -93,7 +93,7 @@ No options
No first char No first char
No need char No need char
Subject length lower bound = 1 Subject length lower bound = 1
Starting byte set: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ƒ Š Œ Ž š œ ž Ÿ ª ² ³ µ ¹ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö ƒ Š Œ Ž š œ ž Ÿ ª ² ³ µ ¹ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö
Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý

@ -192,7 +192,31 @@ enum {
ucp_Miao, ucp_Miao,
ucp_Sharada, ucp_Sharada,
ucp_Sora_Sompeng, ucp_Sora_Sompeng,
ucp_Takri ucp_Takri,
/* New for Unicode 7.0.0: */
ucp_Bassa_Vah,
ucp_Caucasian_Albanian,
ucp_Duployan,
ucp_Elbasan,
ucp_Grantha,
ucp_Khojki,
ucp_Khudawadi,
ucp_Linear_A,
ucp_Mahajani,
ucp_Manichaean,
ucp_Mende_Kikakui,
ucp_Modi,
ucp_Mro,
ucp_Nabataean,
ucp_Old_North_Arabian,
ucp_Old_Permic,
ucp_Pahawh_Hmong,
ucp_Palmyrene,
ucp_Psalter_Pahlavi,
ucp_Pau_Cin_Hau,
ucp_Siddham,
ucp_Tirhuta,
ucp_Warang_Citi
}; };
#endif #endif

@ -68,11 +68,11 @@ function recurse($path)
// always include the config.h file // always include the config.h file
$content = file_get_contents($newfile); $content = file_get_contents($newfile);
$newcontent = preg_replace('/#\s*ifdef HAVE_CONFIG_H\s*(.+)\s*#\s*endif/', '$1', $content); //$newcontent = preg_replace('/#\s*ifdef HAVE_CONFIG_H\s*(.+)\s*#\s*endif/', '$1', $content);
if ($content !== $newcontent) { //if ($content !== $newcontent) {
file_put_contents($file, $newcontent); // file_put_contents($file, $newcontent);
} //}
echo "OK\n"; echo "OK\n";
} }