php-src/ext/mbstring
Alex Dowad 4427b2e1ab Mark UTF-8 strings emitted by mbstring functions as valid UTF-8
We now have a couple of mbstring functions which have fast paths for
strings marked as 'valid UTF-8'. More will likely follow. So
that these fast paths can be used more frequently, mark UTF-8 strings
emitted by mbstring as 'valid UTF-8'. This is always a correct thing
to do, because mbstring never returns invalid UTF-8 as the result of
a conversion (or similar) operation.

Internally, we do have a conversion mode which deliberately emits
invalid UTF-8 in some cases. (This is done to prevent unwanted matches
when we are converting strings to UTF-8 before performing matching
operations on them.) For such strings, don't set the 'valid UTF-8' flag.
It probably wouldn't hurt anything to set it, because strings generated
using that special conversion mode should *never* be returned to
userland, and I don't think we do anything with them which cares about
the IS_STR_VALID_UTF8 flag... but still, it would likely cause
confusion for developers.
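
The idea described above can be sketched in a few lines of standalone C. This is a toy model only, not the code from this commit: `toy_string`, `TOY_STR_VALID_UTF8`, `toy_mark_output`, and the `toy_strlen*` functions are hypothetical names invented for illustration, and only the flag name IS_STR_VALID_UTF8 comes from the message itself. The sketch assumes a string header with a flags word, sets the flag on UTF-8 output unless the output came from the deliberately-invalid conversion mode, and lets a length function skip validation when the flag is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; stands in for IS_STR_VALID_UTF8. The value is an assumption. */
#define TOY_STR_VALID_UTF8 (1u << 0)

/* Hypothetical string header: a flags word, a byte length, and the bytes. */
typedef struct {
    uint32_t flags;
    size_t len;
    const unsigned char *val;
} toy_string;

/* Mark a conversion result. The flag is set only for UTF-8 output that did
 * not come from the special mode which deliberately emits invalid UTF-8. */
static void toy_mark_output(toy_string *s, bool is_utf8, bool deliberately_invalid)
{
    assert(s != NULL);
    if (is_utf8 && !deliberately_invalid) {
        s->flags |= TOY_STR_VALID_UTF8;
    }
}

/* Fast path: in valid UTF-8, every character has exactly one byte that is
 * not a continuation byte (10xxxxxx), so counting those bytes is enough. */
static size_t toy_strlen_fast(const unsigned char *p, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += (p[i] & 0xC0) != 0x80;
    }
    return count;
}

/* Slow path: simplified check of lead and continuation bytes (it does not
 * reject overlong forms or surrogates). A stray or truncated sequence is
 * counted as one character. */
static size_t toy_strlen_checked(const unsigned char *p, size_t len)
{
    size_t count = 0, i = 0;
    while (i < len) {
        unsigned char b = p[i];
        size_t need;
        if (b < 0x80) need = 0;
        else if (b >= 0xC2 && b <= 0xDF) need = 1;
        else if (b >= 0xE0 && b <= 0xEF) need = 2;
        else if (b >= 0xF0 && b <= 0xF4) need = 3;
        else { count++; i++; continue; }   /* invalid lead byte */
        size_t j = 1;
        while (j <= need && i + j < len && (p[i + j] & 0xC0) == 0x80) j++;
        count++;
        i += j;
    }
    return count;
}

/* Dispatch on the flag: skip validation only when the string is marked. */
static size_t toy_strlen(const toy_string *s)
{
    assert(s != NULL);
    if (s->flags & TOY_STR_VALID_UTF8) {
        return toy_strlen_fast(s->val, s->len);
    }
    return toy_strlen_checked(s->val, s->len);
}
```

In this model an unmarked string still gets a correct answer from the checked path; the flag only decides which path runs. That is why leaving the flag unset for the special conversion mode costs nothing in correctness.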
2023-01-11 17:08:27 +02:00
libmbfl Mark UTF-8 strings emitted by mbstring functions as valid UTF-8 2023-01-11 17:08:27 +02:00
tests Add fast SSE2-based implementation of mb_strlen for known-valid UTF-8 strings 2023-01-09 07:50:40 +02:00
ucgendat Optimize mb_str{,im}width for performance 2021-09-29 18:19:01 +02:00
common_codepoints.txt Improve mb_detect_encoding's recognition of Turkish text 2022-12-30 14:22:46 +02:00
config.m4 Move mobile variants of SJIS into mbfilter_sjis.c 2022-12-12 16:28:49 +02:00
config.w32 Move mobile variants of SJIS into mbfilter_sjis.c 2022-12-12 16:28:49 +02:00
CREDITS
gen_rare_cp_bitvec.php Improve detection accuracy of mb_detect_encoding 2021-10-19 18:05:51 +02:00
mb_gpc.c Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
mb_gpc.h Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
mbstring.c Mark UTF-8 strings emitted by mbstring functions as valid UTF-8 2023-01-11 17:08:27 +02:00
mbstring.h Implement mb_output_handler using fast text conversion filters 2023-01-03 09:02:21 +02:00
mbstring.stub.php Fix mb_strimwidth RC info 2022-08-05 17:06:23 +02:00
mbstring_arginfo.h Do not generate CONST_CS when registering constants (#9439) 2022-08-28 08:27:19 +02:00
php_mbregex.c Reduce memory allocated by var_export, json_encode, serialize, and other (#8902) 2022-07-08 14:47:46 +02:00
php_mbregex.h Declare ext/mbstring constants in stubs (#8798) 2022-06-23 17:34:08 +02:00
php_onig_compat.h
php_unicode.c Mark UTF-8 strings emitted by mbstring functions as valid UTF-8 2023-01-11 17:08:27 +02:00
php_unicode.h Speed boost for mb_stripos (when not using UTF-8) 2022-12-18 15:31:20 +02:00
rare_cp_bitvec.h Improve mb_detect_encoding's recognition of Turkish text 2022-12-30 14:22:46 +02:00
unicode_data.h Update Unicode tables to 14.0.0 2021-09-20 09:58:20 +02:00