791 commits

Author SHA1 Message Date
Fabien Villepinte
2cc1cbf2f4 Fix Bug #75001: Wrong reflection on mb_eregi_replace 2017-08-02 18:08:42 +02:00
Nikita Popov
582a65b06f Implement full case mapping
Implement full case mapping according to SpecialCasing.txt and
also full case folding according to CaseFolding.txt (F). There
are a number of caveats:

* Only language-agnostic and unconditional full case mapping
  is implemented. The only language-agnostic conditional case
  mapping rule relates to Greek sigma in final position
  (Final_Sigma). Correctly handling this requires both arbitrary
  lookahead and lookbehind, which would require some larger
  changes to how the case mapping is implemented. This is a
  possible future extension.
* The only language-specific handling that is implemented is
  for Turkish dotted/undotted Is, if the ISO-8859-9 encoding
  is used. This matches the previous behavior and makes sure
  that no codepoints not supported by the encoding are
  produced. A future extension would be to also handle the
  Turkish mappings specified by SpecialCasing.txt based on
  the mbfl internal language.
* Full case folding is implemented, but case-insensitive mb_*
  operations continue to use simple case folding. The reason is
  that full case folding of the haystack string may change the
  position at which a match occurred. This would have to be
  mapped back into the position in the original string.
* mb_convert_case() exposes both the full and the simple case
  mapping / folding, where full is the default. The constants
  are:

   * MB_CASE_LOWER (used by mb_strtolower)
   * MB_CASE_UPPER (used by mb_strtoupper)
   * MB_CASE_TITLE
   * MB_CASE_FOLD
   * MB_CASE_LOWER_SIMPLE
   * MB_CASE_UPPER_SIMPLE
   * MB_CASE_TITLE_SIMPLE
   * MB_CASE_FOLD_SIMPLE (used by case-insensitive operations)
2017-07-28 12:32:50 +02:00
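
The difference between simple and full case mapping described in the commit above can be illustrated with a small sketch. It is not the mbstring implementation: all `sketch_*` names are hypothetical, the mapping table covers only ASCII plus two entries taken from SpecialCasing.txt (U+00DF and U+FB01), and strings are assumed to be already decoded into arrays of code points. The point it shows is that full mapping may produce more code points than it consumes, while simple mapping is always one-to-one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: simple uppercase maps one code point to exactly one. */
static uint32_t sketch_upper_simple(uint32_t cp)
{
    if (cp >= 'a' && cp <= 'z') {
        return cp - ('a' - 'A');
    }
    return cp; /* e.g. U+00DF has no single-code-point uppercase mapping */
}

/* Hypothetical helper: full uppercase may expand to several code points.
 * Writes the result to out and returns how many code points were written. */
static size_t sketch_upper_full(uint32_t cp, uint32_t out[3])
{
    switch (cp) {
    case 0x00DF: /* LATIN SMALL LETTER SHARP S -> "SS" */
        out[0] = 0x53; out[1] = 0x53;
        return 2;
    case 0xFB01: /* LATIN SMALL LIGATURE FI -> "FI" */
        out[0] = 0x46; out[1] = 0x49;
        return 2;
    default:
        out[0] = sketch_upper_simple(cp);
        return 1;
    }
}

/* Uppercase len code points from src into dst (capacity cap).
 * Returns the number of code points written, or (size_t)-1 if dst is too small. */
static size_t sketch_convert_upper(const uint32_t *src, size_t len,
                                   uint32_t *dst, size_t cap, int full)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        uint32_t buf[3];
        size_t k = 1;
        if (full) {
            k = sketch_upper_full(src[i], buf);
        } else {
            buf[0] = sketch_upper_simple(src[i]);
        }
        if (n + k > cap) {
            return (size_t) -1;
        }
        for (size_t j = 0; j < k; j++) {
            dst[n++] = buf[j];
        }
    }
    return n;
}
```

Because the output length can differ from the input length, a caller of the full variant has to size its buffer for the worst case. The same length change is why a match position found in a fully folded string does not map directly back to the original.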
Nikita Popov
9ac7c1e71d Use case-folding for case insensitive comparisons
Instead of using lowercasing.
2017-07-28 12:32:50 +02:00
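
A minimal sketch of why folding rather than lowercasing is the better basis for case-insensitive comparison. The `sketch_*` names are hypothetical and the tables cover only ASCII and the Greek sigma forms; the real code uses the full Unicode data. Under simple lowercasing, capital sigma (U+03A3) becomes U+03C3 while final sigma (U+03C2) stays as it is, so the two never compare equal. Simple case folding maps both to U+03C3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sketch_map_fn)(uint32_t);

/* Hypothetical subset of simple lowercase mapping. */
static uint32_t sketch_lower_simple(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z') {
        return cp + ('a' - 'A');
    }
    if (cp == 0x03A3) { /* GREEK CAPITAL LETTER SIGMA */
        return 0x03C3;  /* GREEK SMALL LETTER SIGMA */
    }
    return cp;          /* U+03C2 (final sigma) is already lowercase */
}

/* Hypothetical subset of simple case folding (CaseFolding.txt, status C). */
static uint32_t sketch_fold_simple(uint32_t cp)
{
    if (cp == 0x03C2) { /* final sigma folds to the regular small sigma */
        return 0x03C3;
    }
    return sketch_lower_simple(cp);
}

/* Compare two code point arrays of equal length after applying map to both. */
static int sketch_equal_ci(const uint32_t *a, const uint32_t *b, size_t len,
                           sketch_map_fn map)
{
    assert(a != NULL && b != NULL && map != NULL);
    for (size_t i = 0; i < len; i++) {
        if (map(a[i]) != map(b[i])) {
            return 0;
        }
    }
    return 1;
}
```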
Nikita Popov
f56b0afe6e Avoid some unnecessary mbfl_strlen() calculations 2017-07-28 12:32:50 +02:00
Anatol Belski
13a2629005 size_t fixes 2017-07-25 19:03:33 +02:00
Nikita Popov
445e13b149 Add MBFL_SUBSTR_TO_END mode to mbfl_substr
This takes the substr from the offset to the end of the string.
This avoids pointless searching for the end position and also
saves us a length calculation in the strstr family of functions.
2017-07-23 23:17:12 +02:00
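
The idea behind a "to end" mode can be sketched as follows, assuming UTF-8 input and a sentinel length value. The names `SKETCH_SUBSTR_TO_END`, `sketch_skip_chars` and `sketch_substr` are hypothetical and do not reflect the actual mbfl_substr signature. When the sentinel is passed, the end offset is simply the byte length, so no second scan over the string is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "take everything from the offset onward". */
#define SKETCH_SUBSTR_TO_END ((size_t) -1)

/* Advance over up to n UTF-8 characters; returns the byte offset reached
 * (clamped to len). Continuation bytes have the bit pattern 10xxxxxx. */
static size_t sketch_skip_chars(const unsigned char *s, size_t len, size_t n)
{
    size_t pos = 0;
    while (n > 0 && pos < len) {
        pos++;
        while (pos < len && (s[pos] & 0xC0) == 0x80) {
            pos++;
        }
        n--;
    }
    return pos;
}

/* Compute the byte range [*start, *end) of a substring that begins at
 * character offset from and spans length characters. */
static void sketch_substr(const unsigned char *s, size_t len,
                          size_t from, size_t length,
                          size_t *start, size_t *end)
{
    assert(s != NULL && start != NULL && end != NULL);
    *start = sketch_skip_chars(s, len, from);
    if (length == SKETCH_SUBSTR_TO_END) {
        *end = len; /* no search for the end position */
    } else {
        *end = *start + sketch_skip_chars(s + *start, len - *start, length);
    }
}
```

A strstr-style caller that wants "everything after the match" can pass the sentinel instead of first computing the character length of the whole string.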
Nikita Popov
bff11c382e Remove more obsolete length checks 2017-07-23 19:09:36 +02:00
Anatol Belski
78944bdfc6 remove cast 2017-07-23 17:38:28 +02:00
Anatol Belski
6809be2090 fix warnings and datatype
indent
2017-07-23 17:36:10 +02:00
Nikita Popov
b8ed74ce77 Merge branch 'PHP-7.2' 2017-07-23 11:55:46 +02:00
Nikita Popov
bd63c0f5b3 Fix bug #73528 2017-07-23 11:55:43 +02:00
Nikita Popov
80463579ce Remove confusing null checks in mb_send_mail
These are required parameters, they cannot be missing.
2017-07-23 11:55:43 +02:00
Nikita Popov
9af5b7f33d Fix use after free in mb_send_mail 2017-07-23 11:55:26 +02:00
Anatol Belski
4fbd7ccba2 touch yet more places for datatypes 2017-07-23 00:47:24 +02:00
Anatol Belski
61784bcb71 sync libmbfl allocator with the size_t changes 2017-07-22 23:53:00 +02:00
Anatol Belski
e0825ec60f Mitigation for ssize_t issue in 22a5f554a8
and some more
2017-07-22 22:34:16 +02:00
Nikita Popov
1388751f10 Use fast zpp in mb_strlen()
For short strings this function is now sufficiently fast for zpp
to be a bottleneck.
2017-07-20 21:41:52 +02:00
Nikita Popov
b3c1d9d111 Directly use encodings instead of no_encoding in libmbfl
In particular strings now store encoding rather than the
no_encoding.

I've also pruned out libmbfl APIs that existed in two forms, one
using no_encoding and the other using encoding. We were not actually
using any of the former.
2017-07-20 21:41:52 +02:00
Nikita Popov
77cb7bd837 Free last_used_encoding_name in RSHUTDOWN
efree() cannot be used in GSHUTDOWN
2017-07-20 18:12:04 +02:00
Nikita Popov
ba383b8239 Add basic mbstring encoding cache
Store the last used encoding and compare against it. It's quite
likely that an application is going to be using the same encoding
again and again.

The actual mbfl_name2encoding() function could also be optimized
to use a hash lookup rather than a linear scan, but we don't have
a hashtable implementation in libmbfl...
2017-07-20 13:58:40 +02:00
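
A one-entry cache of this kind can be sketched roughly as below. Everything here is an assumption made for illustration: the `sketch_*` names, the three-entry table, the exact-match comparison against the cached name and the fixed-size name buffer are not taken from mbstring. `sketch_scan_count` exists only to make cache hits observable.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int id;
} sketch_encoding;

/* Hypothetical encoding table, searched linearly. */
static const sketch_encoding sketch_table[] = {
    {"UTF-8", 1}, {"ISO-8859-1", 2}, {"SJIS", 3},
};

static size_t sketch_scan_count = 0; /* number of linear scans performed */

/* ASCII case-insensitive equality. */
static int sketch_name_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Slow path: linear scan over the table. */
static const sketch_encoding *sketch_name2encoding(const char *name)
{
    sketch_scan_count++;
    for (size_t i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
        if (sketch_name_eq(name, sketch_table[i].name)) {
            return &sketch_table[i];
        }
    }
    return NULL;
}

static char sketch_last_name[32];
static const sketch_encoding *sketch_last_encoding = NULL;

/* Fast path: compare against the last successfully resolved name first. */
static const sketch_encoding *sketch_get_encoding(const char *name)
{
    assert(name != NULL);
    if (sketch_last_encoding && strcmp(name, sketch_last_name) == 0) {
        return sketch_last_encoding;
    }
    const sketch_encoding *enc = sketch_name2encoding(name);
    if (enc && strlen(name) < sizeof(sketch_last_name)) {
        strcpy(sketch_last_name, name); /* remember only successful lookups */
        sketch_last_encoding = enc;
    }
    return enc;
}
```

A single cached entry is enough when an application passes the same encoding name on nearly every call. A hash lookup would help the mixed-encoding case, which this sketch does not attempt.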
Nikita Popov
264387e31e Add php_mb_get_no_encoding() helper function 2017-07-20 13:58:40 +02:00
Nikita Popov
adaea77593 Switch libmbfl to use size_t
Switch mbfl_string and related structures to use size_t lengths.

Quite likely that I broke some things along the way...
2017-07-20 13:58:40 +02:00
Nikita Popov
9c73be898d Directly accept encoding in php_unicode_convert_case()
As a side-effect mb_strtolower() and mb_strtoupper() now correctly
handle a NULL encoding parameter by using the internal encoding.
This is what caused the two test changes.
2017-07-19 23:59:42 +02:00
Nikita Popov
4128746b94 Add php_mb_get_encoding() convenience function 2017-07-19 23:59:42 +02:00
Nikita Popov
dead4f0b1b Avoid unnecessary encoding lookups in mbstring
Extract part of php_mb_convert_encoding that does the actual work
and use it whenever we already know the encoding.
2017-07-19 23:59:42 +02:00
Thomas Punt
9f08aff3fd Remove superfluous allocation checks around ZMM-based functions 2017-04-02 00:58:19 +02:00
Nikita Popov
edcabf6d07 Drop unnecessary allocator return value checks 2017-03-13 22:07:15 +01:00
Nikita Popov
a8239ff232 Deprecate mbstring.func_overload 2017-02-03 21:02:52 +01:00
Nikita Popov
2df9346e7f Deprecate mb_parse_str() without second argument 2017-02-03 18:52:57 +01:00
Sammy Kaye Powers
dac6c639bb Update copyright headers to 2017 2017-01-04 11:23:42 -06:00
Sammy Kaye Powers
478f119ab9 Update copyright headers to 2017 2017-01-04 11:14:55 -06:00
Joe Watkins
c8aa6f3a9a Merge branch 'pull-request/2268'
* pull-request/2268:
  Update copyright headers to 2017
2017-01-04 10:00:53 +00:00
Joe Watkins
f9a435a06d Merge branch 'pull-request/1094'
* pull-request/1094:
  added php_mb_check_code_point for mb_substitute_character
  news entry for PR #1094
2017-01-04 06:57:34 +00:00
Sammy Kaye Powers
9e29f841ce Update copyright headers to 2017 2017-01-02 09:30:12 -06:00
Dmitry Stogov
3e9bb03a62 Removed IS_TYPE_IMMUTABLE (it's the same as COPYABLE & !REFCOUNTED) 2016-11-28 22:59:57 +03:00
Anatol Belski
b204b3abd1 further normalizations, uint vs uint32_t
fix merge mistake

yet one more replacement run
2016-11-26 17:29:01 +01:00
Anatol Belski
bfb9be9bd4 Merge branch 'PHP-7.1'
* PHP-7.1:
  remove TSRMLS_*
2016-11-22 00:33:29 +01:00
Anatol Belski
d61db8d602 Merge branch 'PHP-7.0' into PHP-7.1
* PHP-7.0:
  remove TSRMLS_*
2016-11-22 00:32:42 +01:00
Anatol Belski
5e9b4c26a5 remove TSRMLS_* 2016-11-21 23:53:37 +01:00
Dmitry Stogov
222d22f3e1 Merge branch 'PHP-7.1'
* PHP-7.1:
  Prevent modification of immutable arrays (ext/mbstring/tests/bug26639.phpt failure with opcache.protect_memory=1)
2016-11-17 13:35:10 +03:00
Dmitry Stogov
a56bba14e0 Merge branch 'PHP-7.0' into PHP-7.1
* PHP-7.0:
  Prevent modification of immutable arrays (ext/mbstring/tests/bug26639.phpt failure with opcache.protect_memory=1)
2016-11-17 13:34:32 +03:00
Dmitry Stogov
a67637039f Prevent modification of immutable arrays (ext/mbstring/tests/bug26639.phpt failure with opcache.protect_memory=1) 2016-11-17 13:33:05 +03:00
Yasuo Ohgaki
7cb1be2ecd Use proper API 2016-10-16 07:29:33 +09:00
Yasuo Ohgaki
06b20d973a Fix test and cleanup code a little 2016-10-15 20:51:34 +09:00
Yasuo Ohgaki
4af00876f6 mb_check_encoding()/mb_convert_encoding() - Improve and add recursion detection. 2016-10-15 16:52:17 +09:00
Yasuo Ohgaki
6e530502d2 Implemented Bug #68776 mail() does not have mail header injection prevention for additional headers
(PR 2060)
2016-09-15 06:43:57 +09:00
Andrea Faulds
3cc9090101 Remove remaining zpp fallback code (master branch)
Follow-up to d690014bf3
2016-09-11 22:50:24 +01:00
Yasuo Ohgaki
8c26b0a6d2 Merge branch 'PHP-7.1'
* PHP-7.1:
  Fix Bug #72992 mbstring.internal_encoding doesn't inherit default_charset
2016-09-08 13:33:07 +09:00
Yasuo Ohgaki
1ecf361c15 Merge branch 'PHP-7.0' into PHP-7.1
* PHP-7.0:
  Fix Bug #72992 mbstring.internal_encoding doesn't inherit default_charset
2016-09-08 13:32:47 +09:00
Yasuo Ohgaki
379d9a1cfc Merge branch 'PHP-5.6' into PHP-7.0
* PHP-5.6:
  Fix Bug #72992 mbstring.internal_encoding doesn't inherit default_charset
2016-09-08 13:32:31 +09:00