php-src

mirror of https://github.com/php/php-src.git synced 2025-08-18 06:58:55 +02:00

Author	SHA1	Message	Date
George Peter Banyard	90d41cccfd	ext/mbstring: move another test case that only works on 64 bits	2023-12-08 17:17:28 +00:00
Gina Peter Banyard	7684a3d138	ext/mbstring: move unsigned 32 bit integer tests to a new test (#12891) And only run it on 64 bit architectures as those are floats on 32 bit.	2023-12-07 20:19:11 +00:00
Gina Peter Banyard	e74bf42c81	ext/mbstring: Check conversion map only has integers	2023-12-06 23:47:00 +00:00
Alex Dowad	76a92c26e3	mb_decode_numericentity decodes valid entities which are truncated at end of string Since mb_decode_numericentity does not require all HTML entities to end with ';', but allows them to be terminated by ANY non-digit character, it doesn't make sense that valid entities which butt up against the end of the input string are not converted. As it turned out, supporting this case also made it possible to simplify the code nicely.	2022-07-18 15:11:47 +02:00
Alex Dowad	5d6bd557b3	mb_decode_numericentity converts entities which immediately follow a valid/invalid entity Thanks to Kamil Tieleka for suggesting that some of the behaviors of the legacy implementation which the new mb_decode_numericentity implementation took care to maintain were actually bugs and should be fixed. Thanks also to Trevor Rowbotham for providing a link to the HTML specification, showing how HTML numeric entities should be interpreted. mb_decode_numericentity now processes numeric entities in the following situations where the old implementation would not: - &<ENTITY> (for example, &A) - &#<ENTITY> - &#x<ENTITY> - <VALID BUT UNTERMINATED DECIMAL ENTITY><ENTITY> (for example, &#65A) - <VALID BUT UNTERMINATED HEX ENTITY><ENTITY> - <INVALID AND UNTERMINATED DECIMAL ENTITY><ENTITY> (it does not matter why the first entity is invalid; the value could be too big, it could have too many digits, or it could not match the 'convmap' parameter) - <INVALID AND UNTERMINATED HEX ENTITY><ENTITY> This is consistent with the way that web browsers process HTML entities.	2022-07-18 15:11:32 +02:00
Alex Dowad	91969e908f	New implementation of mb_{de,en}code_numericentity This new implementation uses the new encoding conversion filters. Aside from fewer LOC and (hopefully) improved readability, the differences are as follows: BEHAVIOR CHANGES: - The old implementation used signed arithmetic when operating on the 'convmap'. This meant that results could be surprising when using convmap entries with 1 in the MSB. Further, types like 'int' were used rather than those with a specific bit width, such as 'int32_t'. This meant that results could also depend on the platform width of an 'int'. Now unsigned arithmetic is used, with explicit bit widths. - Similarly, while converting decimal numeric entities, the legacy implementation would ensure that the value never overflowed INT_MAX, and if it did, the entity would be treated as invalid and passed through unconverted. However, that again means that results depend on the platform size of an 'int'. So now, we use a value with explicit bit width (32 bits) to hold the value of a deconverted decimal entity, and ensure that the entity value does not overflow that. Further, because we are using an UNSIGNED 32-bit value rather than a signed one, the ceiling for how large a decimal entity can be is higher now. All of this will probably not affect anyone, since Unicode codepoints above U+10FFFF are invalid anyways. To see the difference, you need to be using a text encoding like UCS-4, which allows huge 'codepoints'. - If it saw something which looked like a hex entity, but turned out not to be a valid numeric entity, the old implementation would sometimes convert the hexadecimal digits a-f to A-F (uppercase). The new implementation passes invalid numeric entities through without performing case conversion. - The old implementation of mb_encode_numericentity was limited in how many decimal/hex digits it could emit. If a text encoding like UCS-4 was in use, where 'codepoints' can have huge values (larger than the valid range stipulated by the Unicode standard), it would not error out on a 'codepoint' whose value was too large for it, but would rather mangle the value and emit a numeric entity which decoded to some other random codepoint. The new implementation is able to emit enough digits to express any value which fits in 32 bits. PERFORMANCE: Based on micro-benchmarks run on my development machine: Decoding numeric HTML entities is about 4 times faster, for both decimal and hexadecimal entities, across a variety of input string lengths. Encoding is about 3 times faster.	2022-07-18 15:11:30 +02:00
Alex Dowad	57eafd44c6	Add more tests for mb_decode_numericentity	2021-09-20 11:27:54 +02:00
Nikita Popov	a06d015e61	Remove unnecessary mbstring skipifs These functions are always available (if the extension is available at all).	2021-06-14 15:27:28 +02:00
Nikita Popov	7485978339	Migrate SKIPIF -> EXTENSIONS (#7138) This is an automated migration of most SKIPIF extension_loaded checks.	2021-06-11 11:57:42 +02:00
Nikita Popov	cafceea742	Update mbstring parameter names Closes GH-6207.	2020-09-28 09:51:58 +02:00
Alex Dowad	dc98c1346d	Additional tests for mbstring extension	2020-08-31 23:15:57 +02:00
Máté Kocsis	6111d64cda	Improve a last couple of argument error messages Closes GH-5404	2020-04-20 13:09:00 +02:00
Nikita Popov	7d170eb295	Merge branch 'PHP-7.4' * PHP-7.4: Fix shift ub in mbstring Restore digit check in mb_decode_numericentity()	2020-01-30 10:08:21 +01:00
Nikita Popov	9aadcb18e1	Restore digit check in mb_decode_numericentity() I replaced it with a multiplication overflow check in `18599f9c52`. However, we need both, because the code for restoring the number can't handle numbers with many leading zeros right now and I don't feel like teaching it.	2020-01-30 10:07:01 +01:00
Nikita Popov	b2c8abe951	Merge branch 'PHP-7.4' * PHP-7.4: Better overflow check for entity decoding	2020-01-29 16:08:55 +01:00
Nikita Popov	18599f9c52	Better overflow check for entity decoding Check for multiplication overflow rather than number of digits.	2020-01-29 16:08:46 +01:00
Nikita Popov	bc32cce6a2	Merge branch 'PHP-7.4' * PHP-7.4: Fix recovery of large entities in mb_decode_numericentity()	2020-01-29 11:49:27 +01:00
Nikita Popov	91f878779c	Fix recovery of large entities in mb_decode_numericentity() Make sure we don't overflow the integer.	2020-01-29 11:48:34 +01:00
Christoph M. Becker	e2100619ac	Expect appropriate parameter type in the first place `mb_encode_numericentity()` and `mb_decode_numericentity()` accepted arbitrary zvals as `$convmap`, but ignored anything else than arrays. This appears to be an unresolved relict of their ZPP conversion for PHP 5.3[1]. We now expect an array in the first place. We also expect `count($convmap)` to be a multiple of four (else we throw a `ValueError`), and do no longer special case empty `$convmap`. [1] <http://git.php.net/?p=php-src.git;a=commit;h=1c77f594294aee9d60e7309279c616c01c39ba9d>	2019-10-07 16:48:08 +02:00
Felipe Pena	b79740b458	- New tests (WurzbrugUG testfest)	2009-07-07 01:15:12 +00:00

20 commits
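
Several of the commits above (18599f9c52, 9aadcb18e1, 91969e908f) concern how the value of a decimal numeric entity is accumulated without overflowing, first by checking for multiplication overflow rather than counting digits, and later by holding the value in an unsigned integer with an explicit 32-bit width. The following is a minimal sketch of that general idea only. It is not code from php-src; the function name `parse_decimal_entity` and its signature are hypothetical and chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT the php-src implementation): parse the decimal
 * digits of a numeric entity, i.e. the part after "&#", into an unsigned
 * 32-bit value.
 *
 * Returns false if there are no leading digits or if the value would
 * exceed UINT32_MAX. On success, *value holds the parsed number and
 * *consumed holds how many digit characters were read. */
bool parse_decimal_entity(const char *s, size_t len,
                          uint32_t *value, size_t *consumed)
{
    uint32_t v = 0;
    size_t i = 0;

    while (i < len && s[i] >= '0' && s[i] <= '9') {
        uint32_t digit = (uint32_t)(s[i] - '0');

        /* Check that v * 10 + digit fits in 32 bits BEFORE computing it.
         * Checking the arithmetic rather than the number of digits means
         * that inputs with many leading zeros are still accepted. */
        if (v > (UINT32_MAX - digit) / 10) {
            return false;
        }
        v = v * 10 + digit;
        i++;
    }

    if (i == 0) {
        return false; /* no digits at all */
    }

    *value = v;
    *consumed = i;
    return true;
}
```

Because the limit is expressed with `uint32_t` and `UINT32_MAX` rather than `int` and `INT_MAX`, the result of this sketch does not depend on the platform width of `int`, which is the portability concern described in 91969e908f.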