Commit graph

377 commits

Author SHA1 Message Date
Colin O'Dell
201930106d Add test for negative lengths in mb_strcut() 2017-11-22 22:47:55 +01:00
Colin O'Dell
830d87b86e Add tests for mb_language() 2017-11-22 22:47:55 +01:00
Dmitry Stogov
cb9d81ef4f Refactored recursion pretection 2017-10-06 01:34:50 +03:00
Nikita Popov
840b77c02e Merge branch 'PHP-7.2' 2017-08-04 22:20:11 +02:00
Nikita Popov
6b73b2d6eb Check for empty string in mb_ord() 2017-08-04 22:20:05 +02:00
Nikita Popov
4e4ec31e2e Merge branch 'PHP-7.2' 2017-08-04 13:02:44 +02:00
Nikita Popov
353f7bf461 Also check for invalid codepoints in mb_ord()
And return false in that case, instead of returning 0x3f...
2017-08-04 13:01:03 +02:00
Nikita Popov
5caf05f6c5 Merge branch 'PHP-7.2' 2017-08-03 22:41:15 +02:00
Nikita Popov
e53162a32b Return false on invalid codepoint in mb_chr()
Instead of returning the encoding of the current substitution
character. This allows a robust check for the failure case. The
substitution character (especially the default of "?") is also
a valid output of mb_chr() for a valid input (for "?" that would be
0x3f), so it's a bad choice for an error value.
2017-08-03 22:36:42 +02:00
Nikita Popov
41e9ba6333 Always use Unicode codepoints in mb_ord() and mb_chr()
Previously mb_chr() had two different encoding-dependent behaviors:
 * For "Unicode-encodings" it took a Unicode codepoint and returned
   its encoded representation.
 * Otherwise it returned a big-endian binary encoding of the passed
   integer.

Now the input is always interpreted as a Unicode codepoint. If
a big-endian binary encoding is what you want, you don't need
mbstring to implement that.
2017-08-03 22:14:00 +02:00
Nikita Popov
c98714f19e Merge branch 'PHP-7.2' 2017-08-03 21:57:35 +02:00
Nikita Popov
fb9bf5b64b Revert/fix substitution character fallback
The introduced checks were not correct in two respects:
 * It was checked whether the source encoding of the string matches
   the internal encoding, while the actually relevant encoding is
   the *target* encoding.
 * Even if the correct encoding is used, the checks are still too
   conservative. Just because something is not a "Unicode-encoding"
   does not mean that it does not map any non-ASCII characters.

I've reverted the added checks and instead adjusted mbfl_convert
to first try to use the provided substitution character and if
that fails, perform the fallback to '?' at that point. This means
that any codepoint mapped in the target encoding should now be
correctly supported and anything else should fall back to '?'.
2017-08-03 21:53:59 +02:00
Nikita Popov
3d948d77d1 Merge branch 'PHP-7.2' 2017-08-03 21:17:26 +02:00
Nikita Popov
a8a9e93e9a Revert/fix mb_substitute_character() codepoint checks
The introduced checks did not treat "non-Unicode" encodings correctly,
because they treated the passed integer as encoded in the internal
encoding in that case, while in actuality the substitute character
is always a Unicode codepoint.

Additionally checking the codepoint against the internal encoding
is not correct in any case, because the substitution character must
be mapped in the *target* encoding of the conversion, which does
not necessarily coincide with the internal encoding (the internal
encoding is the default *source* encoding, not *target* encoding).

This reverts the checks back to simple range checks, but in a way
that still resolves #69079: Characters outside the Basic
Multilingual Plane are now accepted and Surrogate Codepoints are
rejected. A distinction between UTF-8 and non-UTF-8 encodings is
not made for surrogate checks (as in the original patch), as
surrogates are always illegal on their own. Specifying a surrogate
as substitution character would only make sense if you could
specify a substitution string with more than one character --
however we do not support that.
2017-08-03 21:12:41 +02:00
Nikita Popov
f4a1d9c821 Fixed bug #65544 and #71298 2017-07-28 14:57:08 +02:00
Nikita Popov
25b6e68432 Merge branch 'PHP-7.2' 2017-07-28 13:03:35 +02:00
Nikita Popov
5d777e56e2 Merge branch 'PHP-7.1' into PHP-7.2 2017-07-28 13:03:26 +02:00
Nikita Popov
c48c638aeb Merge branch 'PHP-7.0' into PHP-7.1 2017-07-28 13:03:02 +02:00
Nikita Popov
e3d25e78eb Fixed bug #62934 2017-07-28 13:02:25 +02:00
Nikita Popov
582a65b06f Implement full case mapping
Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

* Only language-agnostic and unconditional full case mapping
  is implemented. The only language-agnostic conditional case
  mapping rule relates to Greek sigma in final position
  (Final_Sigma). Correctly handling this requires both arbitrary
  lookahead and lookbehind, which would require some larger
  changes to how the case mapping is implemented. This is a
  possible future extension.
* The only language-specific handling that is implemented is
  for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
  is used. This matches the previous behavior and makes sure
  that no codepoints not supported by the encoding are
  produced. A future extension would be to also handle the
  Turkish mappings specified by SpecialCasing.txt based on
  the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
  operations continue to use simple case folding. The reason is
  that full case folding of the haystack string may change the
  position at which a match occurred. This would have to be
  mapped back into the position in the original string.
* mb_convert_case() exposes both the full and the simple case
  mapping / folding, where full is the default. The constants
  are:

   * MB_CASE_LOWER (used by mb_strtolower)
   * MB_CASE_UPPER (used by mb_strtolower)
   * MB_CASE_TITLE
   * MB_CASE_FOLD
   * MB_CASE_LOWER_SIMPLE
   * MB_CASE_UPPER_SIMPLE
   * MB_CASE_TITLE_SIMPLE
   * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
2017-07-28 12:32:50 +02:00
Nikita Popov
9ac7c1e71d Use case-folding for case insensitive comparisons
Instead of using lowercasing.
2017-07-28 12:32:50 +02:00
Nikita Popov
7077c719db Merge branch 'PHP-7.2' 2017-07-23 15:36:25 +02:00
Nikita Popov
077e61fad3 Fixed bug #69267 completely
ucgendat.c was assuming that a title-case character is a character
that has both lower and upper-case variants. However, there are
title-case characters that only have a lower-case variant. Use the
Lt general character proprety to determine where in the case map
the character should be placed instead.
2017-07-23 15:30:17 +02:00
Nikita Popov
c0bcd301d3 Another fix for bug #69267
mb_strtoupper() was converting lowercase characters into
titlecase characters, instead of uppercase characters. Luckily
there are only very few characters with a distinct titlecase
representation, so this mostly worked out okay...
2017-07-23 15:07:02 +02:00
Nikita Popov
0e4af9192f Partial fix for bug #69267
This pulls in 60a25c72ba389f53b0621ca250bc99f3b295d43f from the
OpenLDAP project.
2017-07-23 14:47:21 +02:00
Nikita Popov
698132d6f9 Merge branch 'PHP-7.2' 2017-07-23 12:22:09 +02:00
Nikita Popov
88f752a947 Merge branch 'PHP-7.1' into PHP-7.2 2017-07-23 12:21:51 +02:00
Nikita Popov
f116a88592 Merge branch 'PHP-7.0' into PHP-7.1 2017-07-23 12:21:16 +02:00
Christoph M. Becker
418da85f15 Fix #71606: Segmentation fault mb_strcut with HTML-ENTITIES
The HTML decoding filter uses the `opaque` member of mbfl_convert_filter
as buffer, but there was no copy constructor defined, what caused double
frees when the filter is copied (what happens multiple times in mb_strcut(),
for instance).
2017-07-23 12:19:27 +02:00
Nikita Popov
9c73be898d Directly accept encoding in php_unicode_convert_case()
As a side-effect mb_strtolower() and mb_strtoupper() now correctly
handle a NULL encoding parameter by using the internal encoding.
This is what caused the two test changes.
2017-07-19 23:59:42 +02:00
Nikita Popov
a8239ff232 Deprecate mbstring.func_overload 2017-02-03 21:02:52 +01:00
Nikita Popov
2df9346e7f Deprecate mb_parse_str() without second argument 2017-02-03 18:52:57 +01:00
Joe Watkins
f9a435a06d
Merge branch 'pull-request/1094'
* pull-request/1094:
  added php_mb_check_code_point for mb_substitute_character
  news entry for PR #1094
2017-01-04 06:57:34 +00:00
Xinchen Hui
bc6b17148b Merge branch 'PHP-7.1'
* PHP-7.1:
  Fixed bug #73646 (mb_ereg_search_init null pointer dereference)
2016-12-09 15:56:41 +08:00
Xinchen Hui
6a43c61bcd Fixed bug #73646 (mb_ereg_search_init null pointer dereference) 2016-12-09 15:55:07 +08:00
Anatol Belski
6a34065bf0 fix tests 2016-11-25 23:04:15 +01:00
Anatol Belski
2a76d2282a upgrade to Oniguruma 6.1.2 2016-11-25 22:00:53 +01:00
Nikita Popov
5af586bec5 Remove more PHP 6 leftovers from tests 2016-11-24 22:39:39 +01:00
Nikita Popov
45f7b2bcc8 Fix CRLF line-endings in tests
Also fix a single instance of CRLF in ibase_query.c.
2016-11-20 22:31:24 +01:00
Pedro Magalhães
9c5af4e4cb Remove the b prefix from literals on unrelated tests 2016-11-20 21:11:53 +01:00
Xinchen Hui
3a8a99e662 Merge branch 'PHP-7.1'
* PHP-7.1:
  Fixed bug #73532 (Null pointer dereference in mb_eregi)
2016-11-16 15:05:22 +08:00
Xinchen Hui
229024c725 Fixed bug #73532 (Null pointer dereference in mb_eregi) 2016-11-16 15:05:04 +08:00
Yasuo Ohgaki
2bd34885da Add tests 2016-10-15 21:03:14 +09:00
Yasuo Ohgaki
06b20d973a Fix test and cleanup code a little 2016-10-15 20:51:34 +09:00
Yasuo Ohgaki
4af00876f6 mb_check_encoding()/mb_convert_encoding() - Improve and add recursion detection. 2016-10-15 16:52:17 +09:00
Christoph M. Becker
fcc6f2df59 Merge branch 'PHP-7.1' 2016-09-06 14:15:05 +02:00
Christoph M. Becker
68d3501381 Merge branch 'pull-request/2115' into PHP-7.1 2016-09-06 14:14:23 +02:00
Yasuo Ohgaki
96e59a200e Merge branch 'PHP-7.1'
* PHP-7.1:
  Fixed Bug #66964 mb_convert_variables() cannot detect recursion
2016-09-06 18:22:04 +09:00
Yasuo Ohgaki
2605ceeaca Added array parameter support to mb_convert_encoding() 2016-09-06 18:20:24 +09:00
Yasuo Ohgaki
012232b9a4 Merge branch 'PHP-7.0' into PHP-7.1
* PHP-7.0:
  Fixed Bug #66964 mb_convert_variables() cannot detect recursion
2016-09-06 16:42:07 +09:00