php-src

mirror of https://github.com/php/php-src.git synced 2025-08-16 05:58:45 +02:00


Alex Dowad 1f0cf133db Add fast mb_strcut implementation for UTF-8		2023-10-04 09:10:38 +02:00

The old implementation runs through the entire string to pick out the part which should be returned by mb_strcut. This creates significant performance overhead. The new specialized implementation of mb_strcut for UTF-8 usually only examines a few bytes around the starting and ending cut points, meaning it generally runs in constant time.

For UTF-8 strings just a few bytes long, the new implementation is around 10% faster (according to microbenchmarks which I ran locally). For strings around 10,000 bytes in length, it is 50-300x faster. (Yes, that is 300x and not 300%.)

The new implementation behaves identically to the old one on VALID UTF-8 strings; a fuzzer was used to help ensure this is the case. On invalid UTF-8 strings, there is a difference: in some cases, the old implementation will pass invalid byte sequences through unchanged, while in others it will remove them. The new implementation has behavior which is perhaps slightly more predictable: it simply backs up the starting and ending cut points to the preceding "starter byte" (one which is not a UTF-8 continuation byte).
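
The "back up to the preceding starter byte" rule described in the commit message can be sketched in C roughly as follows. This is a minimal illustration and not the code from mbstring.c: the helper name `utf8_cut_bounds`, its parameters, and the choice to measure the end point from the adjusted start are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper (not the php-src function): given a byte offset
 * `from` and a byte length `len`, compute the cut [start, start + out_len)
 * so that neither end falls on a continuation byte. Only the bytes near
 * the two cut points are examined, never the whole string.
 */
static void utf8_cut_bounds(const unsigned char *s, size_t n,
                            size_t from, size_t len,
                            size_t *out_start, size_t *out_len)
{
    assert(s != NULL && out_start != NULL && out_len != NULL);

    /* Clamp the starting point to the end of the string. */
    size_t start = from > n ? n : from;

    /* Back up the starting cut point to the preceding starter byte. */
    while (start > 0 && start < n && is_continuation(s[start])) {
        start--;
    }

    /* Assumption: the end is measured from the adjusted start and
     * clamped to the string length. */
    size_t end = len > n - start ? n : start + len;

    /* Back up the ending cut point in the same way. */
    while (end > start && end < n && is_continuation(s[end])) {
        end--;
    }

    *out_start = start;
    *out_len = end - start;
}
```

For the four bytes `'a', 0xC3, 0xA9, 'b'` (the string "aéb"), calling this sketch with `from = 2` and `len = 2` yields a start of 1 and a length of 2, so the cut covers the whole two-byte "é" instead of splitting it.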
libmbfl	Add fast mb_strcut implementation for UTF-8	2023-10-04 09:10:38 +02:00
tests	Add test cases for mb_strcut	2023-10-04 09:10:25 +02:00
ucgendat	Optimize mb_str{,im}width for performance	2021-09-29 18:19:01 +02:00
common_codepoints.txt	Improve mb_detect_encoding accuracy for text containing vowels with macrons	2023-08-25 12:09:55 +02:00
config.m4	Combine CJK encoding conversion code in a single source file	2023-05-20 21:27:48 -07:00
config.w32	Combine CJK encoding conversion code in a single source file	2023-05-20 21:27:48 -07:00
CREDITS
gen_rare_cp_bitvec.php	Mark globals as const (#10303)	2023-01-23 13:46:58 +00:00
mb_gpc.c	Take order of candidate encodings into account when guessing text encoding	2023-05-16 07:01:07 -07:00
mb_gpc.h	Remove unused 'to_language' and 'from_language' struct fields	2022-08-16 16:43:26 +02:00
mbstring.c	Add fast mb_strcut implementation for UTF-8	2023-10-04 09:10:38 +02:00
mbstring.h	Take order of candidate encodings into account when guessing text encoding	2023-05-16 07:01:07 -07:00
mbstring.stub.php	[RFC] Implement mb_str_pad() (#11284)	2023-06-20 21:22:04 +02:00
mbstring_arginfo.h	[RFC] Implement mb_str_pad() (#11284)	2023-06-20 21:22:04 +02:00
php_mbregex.c	Reduce memory allocated by var_export, json_encode, serialize, and other (#8902)	2022-07-08 14:47:46 +02:00
php_mbregex.h	Declare ext/mbstring constants in stubs (#8798)	2022-06-23 17:34:08 +02:00
php_onig_compat.h
php_unicode.c	Implement conditional casing for Greek letter sigma when title-casing text	2023-01-12 17:41:11 +02:00
php_unicode.h	Speed boost for mb_stripos (when not using UTF-8)	2022-12-18 15:31:20 +02:00
rare_cp_bitvec.h	Improve mb_detect_encoding accuracy for text containing vowels with macrons	2023-08-25 12:09:55 +02:00
unicode_data.h	Update Unicode tables to 14.0.0	2021-09-20 09:58:20 +02:00