php-src/ext/mbstring
Alex Dowad 18e526cb51 Fix legacy text conversion filter for SJIS-2004
EUC-JP-2004 includes special byte sequences starting with 0x8E
for kana. The legacy output routine for EUC-JP-2004 emits
these sequences if the value of the output variable `s` is
between 0x80 and 0xFF.
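The EUC-JP-2004 byte layout described above can be sketched as follows. This is a minimal illustration, not libmbfl code. The helper name `eucjp2004_emit_ss2` is hypothetical. It follows the commit's 0x80-0xFF range, although in EUC-JP the byte after 0x8E for half-width katakana is normally 0xA1-0xDF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not libmbfl's API): writes the EUC-JP-2004 form of a
   single-byte value s in 0x80..0xFF, i.e. the byte 0x8E followed by s.
   Returns the number of bytes written, or 0 if s is outside that range. */
static size_t eucjp2004_emit_ss2(unsigned int s, unsigned char out[2])
{
    if (s < 0x80 || s > 0xFF)
        return 0;
    out[0] = 0x8E;              /* 0x8E (SS2) introduces the following byte */
    out[1] = (unsigned char)s;  /* the single-byte value itself */
    return 2;
}
```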

Since the same routine is also used for SJIS-2004 and
ISO-2022-JP-2004, before 8a915ed26c the same 0x8E sequences
were also emitted when converting to those text encodings.
But that is completely wrong. 0x8E 0x__ does not mean the same thing
in SJIS-2004 or ISO-2022-JP-2004 as it does in EUC-JP-2004.
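To see why the same bytes cannot be reused, the sketch below shows how each encoding treats a byte like 0x8E. All helper names are hypothetical, and the byte ranges are the standard ones. In EUC-JP, 0x8E (SS2) announces a single following kana byte. In the Shift_JIS family, 0x8E is the lead byte of a two-byte character. ISO-2022-JP is a 7-bit encoding, so no byte at or above 0x80 is valid there at all.

```c
#include <assert.h>
#include <stdbool.h>

/* EUC-JP: 0x8E is SS2; the next byte (0xA1..0xDF) is a half-width kana. */
static bool eucjp_is_ss2(unsigned char b) { return b == 0x8E; }

/* Shift_JIS family: lead bytes that start a two-byte character. */
static bool sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Shift_JIS family: single bytes that encode half-width katakana. */
static bool sjis_is_single_kana(unsigned char b)
{
    return b >= 0xA1 && b <= 0xDF;
}

/* ISO-2022-JP: a 7-bit encoding, so only bytes below 0x80 may appear. */
static bool iso2022jp_byte_ok(unsigned char b) { return b < 0x80; }
```

So the pair 0x8E 0xB1 is a half-width kana in EUC-JP, the start of a two-byte character in Shift_JIS, and invalid in ISO-2022-JP.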

Therefore, in 8a915ed26c, I fixed the legacy conversion routine
by checking whether the output encoding is EUC-JP-2004 or not.
If it's not, and `s` is 0x80-0xFF, I made it emit an error.

Well, it turns out that single bytes with values from 0xA1
to 0xDF are meaningful in SJIS-2004. To emit these bytes when
appropriate, I had to amend the legacy conversion routine again.

(For clarity, this does NOT mean reverting to the behavior prior
to 8a915ed26c. We were right not to emit sequences starting with
0x8E in SJIS-2004. But in SJIS-2004, we *do* sometimes need to
emit single bytes from 0xA1-0xDF.)
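The resulting decision can be sketched like this, assuming a simplified output routine that handles one value `s` in 0x80-0xFF. The enum tags and the function name are hypothetical, and the real libmbfl filter reports errors through its own mechanism rather than by returning 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags; libmbfl uses its own encoding identifiers. */
enum jis2004_out { OUT_EUCJP_2004, OUT_SJIS_2004, OUT_ISO2022JP_2004 };

/* Sketch of the output decision for a value s in 0x80..0xFF.
   Writes bytes into out and returns how many were written;
   a return value of 0 means "emit an error instead". */
static size_t emit_single_byte_value(enum jis2004_out enc, unsigned int s,
                                     unsigned char out[2])
{
    if (s < 0x80 || s > 0xFF)
        return 0;
    switch (enc) {
    case OUT_EUCJP_2004:
        out[0] = 0x8E;                 /* 0x8E sequences: EUC-JP-2004 only */
        out[1] = (unsigned char)s;
        return 2;
    case OUT_SJIS_2004:
        if (s >= 0xA1 && s <= 0xDF) {  /* meaningful single byte in SJIS-2004 */
            out[0] = (unsigned char)s;
            return 1;
        }
        return 0;                      /* no 0x8E sequence in SJIS-2004 */
    case OUT_ISO2022JP_2004:
    default:
        return 0;                      /* 7-bit encoding: always an error */
    }
}
```

Keeping the per-encoding choice in one switch makes the asymmetry explicit. Only EUC-JP-2004 ever produces 0x8E, and SJIS-2004 passes through only its single-byte range.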
2022-08-16 16:43:27 +02:00
libmbfl Fix legacy text conversion filter for SJIS-2004 2022-08-16 16:43:27 +02:00
tests Adjust number of error markers emitted for truncated ISO-2022-JP escape sequence 2022-08-16 16:43:27 +02:00
ucgendat Optimize mb_str{,im}width for performance 2021-09-29 18:19:01 +02:00
common_codepoints.txt mb_detect_encoding recognizes all letters in Hungarian alphabet 2022-05-25 08:22:07 +02:00
config.m4 New implementation of mb_convert_kana 2022-07-20 07:44:19 +02:00
config.w32 New implementation of mb_convert_kana 2022-07-20 07:44:19 +02:00
CREDITS
gen_rare_cp_bitvec.php Improve detection accuracy of mb_detect_encoding 2021-10-19 18:05:51 +02:00
mb_gpc.c Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
mb_gpc.h Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
mbstring.c Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
mbstring.h php_mb_convert_encoding{,_ex} returns zend_string 2022-05-28 21:53:39 +02:00
mbstring.stub.php Fix mb_strimwidth RC info 2022-08-05 17:06:23 +02:00
mbstring_arginfo.h Fix mb_strimwidth RC info 2022-08-05 17:06:23 +02:00
php_mbregex.c Reduce memory allocated by var_export, json_encode, serialize, and other (#8902) 2022-07-08 14:47:46 +02:00
php_mbregex.h Declare ext/mbstring constants in stubs (#8798) 2022-06-23 17:34:08 +02:00
php_onig_compat.h
php_unicode.c Optimize mbstring upper/lowercasing: use fast path in more cases 2021-09-20 11:27:54 +02:00
php_unicode.h Add comments to grouped character properties 2021-08-24 22:09:26 +02:00
rare_cp_bitvec.h mb_detect_encoding recognizes all letters in Hungarian alphabet 2022-05-25 08:22:07 +02:00
unicode_data.h Update Unicode tables to 14.0.0 2021-09-20 09:58:20 +02:00