PCRE only validates the string starting from the start offset
(minus maximum look-behind, but let's ignore that), so we can
only remember that the string is fully valid UTF-8 is the original
start offset is zero.
We need not just the whole string to be UTF-8, but the start
position to be on a character boundary as well. Check this by
looking for a continuation byte.
This helps to avoid unnecessary IS_REFERENCE checks.
This changes some notices "Only variables should be passed by reference" to exception "Cannot pass parameter %d by reference".
Also, for consistency, compile-time fatal error "Only variables can be passed by reference" was converted to exception "Cannot pass parameter %d by reference"
When compiling with "-Wall -Werror" gcc emits two errors:
../src/pcre2_jit_neon_inc.h:211:8: error: ‘cmp1b’ is used uninitialized in this function [-Werror=uninitialized]
211 | data = fast_forward_char_pair_compare(compare1_type, data, cmp1a, cmp1b);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/pcre2_jit_neon_inc.h:212:9: error: ‘cmp2b’ is used uninitialized in this function [-Werror=uninitialized]
212 | data2 = fast_forward_char_pair_compare(compare2_type, data2, cmp2a, cmp2b);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The compiler emits the error message before inlining.
Because the warning is based on an intra-procedural data
flow analysis, the compiler does not see that cmp1b and
cmp2b are not used when they are not initialized.
Here is the code of that function, cmp2 is only used
when ctype is compare_match2 or compare_match1i,
and not when ctype is compare_match1:
static inline vect_t fast_forward_char_pair_compare(compare_type ctype, vect_t dst, vect_t cmp1, vect_t cmp2)
{
if (ctype == compare_match2)
{
vect_t tmp = dst;
dst = VCEQQ(dst, cmp1);
tmp = VCEQQ(tmp, cmp2);
dst = VORRQ(dst, tmp);
return dst;
}
if (ctype == compare_match1i)
dst = VORRQ(dst, cmp2);
dst = VCEQQ(dst, cmp1);
return dst;
}
The patch inlines by hand the case of compare_match1 such that the
code avoids referring to cmp1b and cmp2b.
Tested on aarch64-linux with `make check`.
A new function `pcre_get_compiled_regex_cache_ex()` is introduced,
which allows to compile regexp pattern using the "C" locale instead
of a current locale.
This will be needed to replace setlocale() usage in fileinfo,
which is not thread-safe.
Print a more informative message that indicates that this is
likely a permission issue, and also indicate that pcre.jit=0
can be used to work around it.
Also automatically disable the JIT, so that this message is
only shown once.
See bug #78630.
This is intended to fix the primary issue from bug #77260.
Prior to macOS 10.14 multiple MAP_JIT segments were not permitted,
leading to mmap failures and corresponding "no more memory" errors
on macOS 10.13.