Because the default characters are defined in the stub file, and the
stub file is UTF-8 (typically), the characters are encoded in the string
as UTF-8. When using a different character encoding, there is a mismatch
between what mb_trim expects and the UTF-8 encoded string it gets.
One way of solving this is by making the characters argument nullable,
which would mean that it always uses the internal code path that has the
unicode codepoints that are defaulted actually stored as codepoint
numbers instead of in a string.
Co-authored-by: @ranvis
This has been the case at least since PHP 5.4. Thanks to Girgias for
pointing it out.
It appears that there are several global variables internal to mbstring
which can be queried via mb_get_info() and which could be NULL, but
at the very least, we know that "mbstring.http_input" is one of them.
This will allow us to easily check in other mbstring functions if the
list of all supported encodings, returned by mb_list_encodings, is
passed in as input to another function.
Co-authored-by: Ilija Tovilo <ilija.tovilo@me.com>
@cname currently refers to the constant name in C. However, it is not always a (constant) name, but sometimes a function invocation, so naming it as @cvalue would be more appropriate.
mb_ereg()/mb_eregi() currently have an inconsistent return value
based on whether the $matches parameter is passed or not:
> Returns the byte length of the matched string if a match for
> pattern was found in string, or FALSE if no matches were found
> or an error occurred.
>
> If the optional parameter regs was not passed or the length of
> the matched string is 0, this function returns 1.
Coupling this behavior to the $matches parameter doesn't make sense
-- we know the match length either way, there is no technical
reason to distinguish them. However, returning the match length
is not particularly useful either, especially due to the need to
convert 0-length into 1-length to satisfy "truthy" checks. We
could always return 1, which would kind of match the behavior of
preg_match() -- however, preg_match() actually returns the number
of matches, which is 0 or 1 for preg_match(), while false signals
an error. However, mb_ereg() returns false both for no match and
for an error. This would result in an odd 1|false return value.
The patch canonicalizes mb_ereg() to always return a boolean,
where true indicates a match and false indicates no match or error.
This also matches the behavior of the mb_ereg_match() and
mb_ereg_search() functions.
This fixes the default value integrity violation in PHP 8.
Closes GH-6331.
Add assertions to check the return value is not NULL as this indicates a bug.
Add identical assertion to mb_strtoupper and mb_strtolower.
This means these functions can't return false anymore, ammend stubs accordingly.
Promote empty needle warning to ValueError
Convert if branch into an assertion as if mbfl_substr_count fails this now implies a bug
Thus mb_substr_count() can only return int now, fix stubs accordingly
Also add a TODO about documenting this funcion on PHP.net
Convert some checks to assertions as if they don't hold something went wrong during memory allocation
Due to these changes this function cannot return false anymore, fix stubs accordingly
`mb_encode_numericentity()` and `mb_decode_numericentity()` accepted
arbitrary zvals as `$convmap`, but ignored anything else than arrays.
This appears to be an unresolved relict of their ZPP conversion for
PHP 5.3[1]. We now expect an array in the first place.
We also expect `count($convmap)` to be a multiple of four (else we
throw a `ValueError`), and do no longer special case empty `$convmap`.
[1] <http://git.php.net/?p=php-src.git;a=commit;h=1c77f594294aee9d60e7309279c616c01c39ba9d>