php-src/ext/mbstring/tests/data
Alex Dowad a1a69c3734 Support Microsoft's "Best Fit" mappings for Windows-1252 text encoding
In b5ff87ca71, I made a number of adjustments to our conversion code
for CP1252. One of the adjustments was to make the mappings match those
published by the Unicode Consortium in the file CP1252.TXT. These do
not include mappings for the CP1252 bytes 0x81, 0x8D, 0x8F, 0x90, and
0x9D.

Rostyslav Gulka reported that this caused a problem. His application
stores binary JPEG data in an MS-SQL database. When the application
SELECTs the binary data out of the database, it is treated as CP1252
text and automatically converted to UTF-8. To recover the original
binary data, the application then converts from UTF-8 back to CP1252.

Obviously, that does not work if certain CP1252 bytes do not map to
any Unicode codepoint at all.
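
As a rough illustration of the failure described above (not the mbstring code itself), the following Python sketch uses Python's built-in `cp1252` codec, which also leaves these five bytes unmapped. The helper name `strict_roundtrip_ok` is made up for this example.

```python
# Bytes that CP1252.TXT leaves unmapped, as listed in the commit message.
UNMAPPED = [0x81, 0x8D, 0x8F, 0x90, 0x9D]


def strict_roundtrip_ok(data: bytes) -> bool:
    """Return True if data survives CP1252 -> UTF-8 -> CP1252 unchanged.

    Hypothetical helper: it stands in for the database converting the
    column to UTF-8 and the application converting it back to CP1252.
    """
    try:
        utf8 = data.decode("cp1252").encode("utf-8")
        return utf8.decode("utf-8").encode("cp1252") == data
    except UnicodeError:
        # A strict codec rejects bytes that have no mapping.
        return False


# Collect every single byte value that cannot make the round trip.
failing = [b for b in range(256) if not strict_roundtrip_ok(bytes([b]))]
print([hex(b) for b in failing])
```

Binary data such as a JPEG can contain any byte value, so a single occurrence of one of these bytes is enough to make the whole round trip fail.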

While this is a very unusual application of text encoding conversion,
and we might choose not to support it if there were no other basis for
including those mappings, it seems that Microsoft does actually include
them in the Win32 API as "best fit" mappings. These are extra mappings
from Unicode to other text encodings, which the Win32 API function
WideCharToMultiByte uses by default unless the WC_NO_BEST_FIT_CHARS
flag is passed.

A list of these "best fit" mappings for CP1252 can be found here:

https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
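
A minimal sketch of how such extra mappings restore the round trip, written in Python rather than the C used by mbstring. It assumes that each of the five bytes pairs with the C1 control codepoint of the same value (0x81 with U+0081, and so on) and that the pairing is applied in both directions; the table and function names are hypothetical and are not taken from php-src.

```python
# Assumed pairing of the five otherwise unmapped CP1252 bytes with the
# same-valued C1 control codepoints (an assumption of this sketch).
BEST_FIT = {0x81: 0x0081, 0x8D: 0x008D, 0x8F: 0x008F, 0x90: 0x0090, 0x9D: 0x009D}
REVERSE_FIT = {cp: byte for byte, cp in BEST_FIT.items()}


def decode_cp1252_best_fit(data: bytes) -> str:
    """Decode CP1252, falling back to the extra mappings for the five bytes."""
    out = []
    for byte in data:
        if byte in BEST_FIT:
            out.append(chr(BEST_FIT[byte]))
        else:
            out.append(bytes([byte]).decode("cp1252"))
    return "".join(out)


def encode_cp1252_best_fit(text: str) -> bytes:
    """Encode to CP1252, accepting the five extra codepoints as well."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in REVERSE_FIT:
            out.append(REVERSE_FIT[cp])
        else:
            # Still raises UnicodeEncodeError for characters outside CP1252.
            out += ch.encode("cp1252")
    return bytes(out)


# Every byte value now survives CP1252 -> UTF-8 -> CP1252.
original = bytes(range(256))
as_utf8 = decode_cp1252_best_fit(original).encode("utf-8")
recovered = encode_cp1252_best_fit(as_utf8.decode("utf-8"))
print(recovered == original)
```

With the five extra pairs in place, every one of the 256 byte values has a distinct codepoint, which is what allows arbitrary binary data to pass through the conversion unchanged.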
2022-12-09 15:18:37 +02:00
8859-1.txt
8859-2.txt
8859-3.txt
8859-4.txt
8859-5.txt
8859-6.txt
8859-7.txt
8859-8.txt
8859-9.txt
8859-10.txt
8859-11.txt
8859-13.txt
8859-14.txt
8859-15.txt
8859-16.txt
ARMSCII-8.txt Add test suite for ARMSCII-8 encoding 2020-11-02 21:31:06 +02:00
BIG5.txt Fix conversion of Big5 and CP950 text (and add test suite) 2021-07-19 12:17:00 +02:00
CP850.txt Add test suite for CP850 encoding 2020-11-02 21:31:06 +02:00
CP866.txt Add test suite for CP866 encoding 2020-11-02 21:31:06 +02:00
CP932.txt Enhance handling of CP932 text encoding 2020-11-25 19:52:19 +02:00
CP936.txt Fix conversion of CP936 text (and add test suite) 2021-06-29 12:25:21 +02:00
CP949.txt Strict conversion of UHC text to Unicode 2021-06-17 13:12:40 +02:00
CP950.txt Fix conversion of Big5 and CP950 text (and add test suite) 2021-07-19 12:17:00 +02:00
CP1251.txt Add test suite for CP1251 encoding 2020-11-02 21:31:05 +02:00
CP1252.txt Support Microsoft's "Best Fit" mappings for Windows-1252 text encoding 2022-12-09 15:18:37 +02:00
CP1254.txt Add test suite for CP1254 encoding 2020-11-02 21:31:05 +02:00
CP51932.txt Enhance handling of CP51932 encoding 2020-11-25 20:51:44 +02:00
EmojiSources.txt Fix mbstring support for SJIS-Mobile (DoCoMo, KDDI, and Softbank variants of Shift-JIS) 2020-11-25 20:51:44 +02:00
EUC-CN.txt Fix conversion of EUC-CN text (and add test suite) 2021-06-29 12:25:21 +02:00
EUC-JP-2004.txt Fix conversion of EUC-JP-2004 text (and add test suite) 2021-07-05 16:28:16 +02:00
EUC-JP-MS.IRREVERSIBLE.txt Add test suite for EUC-JP-WIN (or EUC-JP-MS) text encoding (and fix bugs) 2021-08-30 16:29:58 +02:00
EUC-JP-MS.txt Add test suite for EUC-JP-WIN (or EUC-JP-MS) text encoding (and fix bugs) 2021-08-30 16:29:58 +02:00
EUC-JP.txt Fix mbstring support for EUC-JP text encoding 2020-11-09 13:45:17 +02:00
EUC-KR.txt Fix conversion of EUC-KR text (and add test suite) 2021-06-29 12:25:21 +02:00
EUC-TW.txt Fix conversion of EUC-TW text (and add test suite) 2021-06-29 12:25:21 +02:00
GB2312.txt Fix conversion of HZ text (and add test suite) 2021-06-29 12:25:21 +02:00
GB18030-2byte.txt Fix conversion of GB18030 text (and add test suite) 2021-07-19 12:17:00 +02:00
ISO-2022-JP-2004-JISX0213.txt ISO-2022-JP-2004 conversion: handle invalid characters correctly 2021-01-14 22:26:24 +02:00
JISX0201.txt JIS7/JIS8 encoding: handle invalid 2nd byte for Kanji correctly 2021-01-14 22:31:31 +02:00
JISX0208.txt ISO-2022-JP-2004 conversion: handle invalid characters correctly 2021-01-14 22:26:24 +02:00
JISX0212.txt JIS7/JIS8 encoding: handle invalid 2nd byte for Kanji correctly 2021-01-14 22:31:31 +02:00
KOI8-R.txt Add test suite for KOI8-R encoding 2020-11-02 21:31:06 +02:00
KOI8-U.txt Add test suite for KOI8-U encoding 2020-11-02 21:31:06 +02:00
KSX1001.txt Fix conversion of ISO-2022-KR text (and add test suite) 2021-07-05 16:28:16 +02:00
MacJapanese-SJIS.txt Add test suite for SJIS-mac encoding 2020-11-11 11:18:58 +02:00
SHIFTJIS.txt Fix mbstring support for Shift-JIS 2020-11-09 13:45:16 +02:00
SJIS-2004.txt Add test suite for SJIS-2004 encoding 2020-11-11 11:18:58 +02:00
UTF-8-DOCOMO.txt Add test suite for mobile variants of UTF-8 (and fix bugs) 2021-08-30 16:29:58 +02:00
UTF-8-KDDI-A.txt Add test suite for mobile variants of UTF-8 (and fix bugs) 2021-08-30 16:29:58 +02:00
UTF-8-KDDI-B.txt Add test suite for mobile variants of UTF-8 (and fix bugs) 2021-08-30 16:29:58 +02:00
UTF-8-SOFTBANK.txt Add test suite for mobile variants of UTF-8 (and fix bugs) 2021-08-30 16:29:58 +02:00