mirror of
https://github.com/php/php-src.git
synced 2025-08-16 05:58:45 +02:00
The old implementation runs through the entire string to pick out the part which should be returned by mb_strcut. This creates significant performance overhead. The new specialized implementation of mb_strcut for UTF-8 usually only examines a few bytes around the starting and ending cut points, meaning it generally runs in constant time.

For UTF-8 strings just a few bytes long, the new implementation is around 10% faster (according to microbenchmarks which I ran locally). For strings around 10,000 bytes in length, it is 50-300x faster. (Yes, that is 300x and not 300%.)

The new implementation behaves identically to the old one on VALID UTF-8 strings; a fuzzer was used to help ensure this is the case. On invalid UTF-8 strings, there is a difference: in some cases, the old implementation will pass invalid byte sequences through unchanged, while in others it will remove them. The new implementation has behavior which is perhaps slightly more predictable: it simply backs up the starting and ending cut points to the preceding "starter byte" (one which is not a UTF-8 continuation byte).

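The "back up to the preceding starter byte" rule described in the commit message can be illustrated with a minimal sketch. This is not the code in mbstring.c: the names `is_continuation`, `back_up_to_starter` and `utf8_cut_points` are hypothetical, and it is an assumption here that the ending cut point is derived from the requested `from + length` rather than from the adjusted start.

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: move `pos` backwards until it rests on a
 * "starter byte" (any byte which is not a continuation byte).
 * The end of the string is always accepted as a cut point. */
static size_t back_up_to_starter(const unsigned char *s, size_t len, size_t pos)
{
    if (pos >= len) {
        return len;
    }
    while (pos > 0 && is_continuation(s[pos])) {
        pos--;
    }
    return pos;
}

/* Hypothetical sketch: compute the byte range [*start, *end) to return
 * for a cut beginning at byte offset `from` and spanning `length` bytes.
 * Only the bytes near the two cut points are examined, so the cost does
 * not grow with the length of the string. */
static void utf8_cut_points(const unsigned char *s, size_t len,
                            size_t from, size_t length,
                            size_t *start, size_t *end)
{
    assert(s != NULL && start != NULL && end != NULL);

    if (from > len) {
        from = len;
    }
    /* Clamp the requested end to the string length without overflowing. */
    size_t stop = (length > len - from) ? len : from + length;

    *start = back_up_to_starter(s, len, from);
    *end = back_up_to_starter(s, len, stop);
}
```

For the 4-byte string "a", 0xC3, 0xA9, "b" (that is, "aéb"), a request starting at byte 2 lands in the middle of "é", so the start is moved back to byte 1. A request for bytes 0 to 2 would split "é", so the end is moved back to byte 1 and only "a" is kept.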
||
---|---|---|
.. | ||
libmbfl | ||
tests | ||
ucgendat | ||
common_codepoints.txt | ||
config.m4 | ||
config.w32 | ||
CREDITS | ||
gen_rare_cp_bitvec.php | ||
mb_gpc.c | ||
mb_gpc.h | ||
mbstring.c | ||
mbstring.h | ||
mbstring.stub.php | ||
mbstring_arginfo.h | ||
php_mbregex.c | ||
php_mbregex.h | ||
php_onig_compat.h | ||
php_unicode.c | ||
php_unicode.h | ||
rare_cp_bitvec.h | ||
unicode_data.h |