php-src/ext/mbstring
Alex Dowad 67051eb8ed Fix segfault caused by use of 'pass' encoding when mbstring converts multipart form POST data
When mbstring.encoding_translation=1, and PHP receives an (RFC1867)
form-based file upload, and the Content-Disposition HTTP header contains
a filename for the uploaded file, PHP will internally invoke mbstring
code to 1) try to auto-detect the text encoding of the filename, and if
that succeeds, 2) convert the filename to the internal text encoding.

In such cases, the candidate text encodings which are considered during
"auto-detection" are those listed in the INI parameter
mbstring.http_input. Further, mbstring.http_input is one of the few
contexts where mbstring allows the magic string "pass" to appear in
place of an actual text encoding name.

Before mbstring's encoding auto-detection function was reimplemented,
the old implementation would never return "pass", even if "pass" was the
only candidate it was given to choose from. It is not clear if this was
intended by the original developers or not. This behavior was the result
of some rather subtle details of the implementation.

After mbstring's auto-detection function was reimplemented, if the new
implementation was given only one candidate to choose from, and it was
not running in 'strict' mode, it would always return that candidate,
even if the candidate was the non-encoding "pass".
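
The single-candidate behavior can be illustrated with a small sketch.
This is not the mbstring source: sketch_encoding, sketch_detect and the
looks_valid callback are hypothetical names made up for illustration,
and the real detector weighs candidates in a far more elaborate way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an encoding descriptor (not mbstring's real type). */
typedef struct {
    const char *name;
    /* Returns true if the bytes are plausible in this encoding;
     * NULL when no such check exists. */
    bool (*looks_valid)(const unsigned char *in, size_t len);
} sketch_encoding;

/* Toy validity check: accept only 7-bit bytes. */
static bool sketch_ascii_valid(const unsigned char *in, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (in[i] > 0x7F) {
            return false;
        }
    }
    return true;
}

static const sketch_encoding sketch_ascii = { "ASCII", sketch_ascii_valid };
/* "pass" is a placeholder, not an encoding, so it carries no routines. */
static const sketch_encoding sketch_pass = { "pass", NULL };

/* Assumed shape of the shortcut described above. */
const sketch_encoding *sketch_detect(const unsigned char *in, size_t len,
                                     const sketch_encoding **list, size_t n,
                                     bool strict)
{
    if (n == 0) {
        return NULL;
    }
    /* One candidate, non-strict mode: nothing to choose between, so the
     * candidate is returned unexamined, even when it is "pass". */
    if (n == 1 && !strict) {
        return list[0];
    }
    /* Otherwise return the first candidate whose check accepts the input. */
    for (size_t i = 0; i < n; i++) {
        if (list[i]->looks_valid != NULL && list[i]->looks_valid(in, len)) {
            return list[i];
        }
    }
    return NULL;
}
```

In this sketch, calling sketch_detect with a one-element list holding
&sketch_pass and strict set to false hands "pass" straight back to the
caller, which is the situation the rest of this message deals with.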

The upshot of all of this: Previously, if
mbstring.encoding_translation=1 and mbstring.http_input=pass, encoding
conversion of RFC1867 filenames would never be attempted. But after
the reimplementation, encoding 'conversion' would occur (uselessly).

Further, in December 2022, I reimplemented the relevant bit of
encoding conversion code. When doing this, I never bothered to
implement encoding/decoding routines for the non-encoding "pass",
because I thought that they would never be used. Well, in the one case
described above, those routines *would* have been used, had they
actually existed. Because they didn't exist, we get a nice NULL pointer
dereference and ensuing segfault instead.

Instead of 'fixing' this by adding encoding/decoding routines for the
non-encoding "pass", I have modified the function which the RFC1867
form-handling code invokes to auto-detect input encoding. This function
will never return "pass" now, just like the previous implementation.
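
A minimal sketch of that idea, under stated assumptions: the function
name sketch_detect_upload_encoding is hypothetical, candidate names are
assumed to be already normalized to lowercase "pass", and "first
remaining candidate" merely stands in for real detection, which would
inspect the filename bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the actual mbstring function.
 * Returns the name of the first candidate that is a real encoding,
 * or NULL if the list is empty or contains nothing but "pass". */
const char *sketch_detect_upload_encoding(const char **candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* "pass" is not an encoding and has no conversion routines,
         * so it must never be reported as the detected encoding. */
        if (strcmp(candidates[i], "pass") == 0) {
            continue;
        }
        return candidates[i];
    }
    return NULL;
}
```

A caller that receives NULL would leave the uploaded filename
unconverted, matching the behavior before the reimplementation.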

Thanks to the GitHub user 'tstangner' for reporting this bug.
2024-01-24 17:15:27 +02:00
libmbfl PHP_HAVE_BUILTIN_USUB_OVERFLOW macro is defined even if __builtin_usub_overflow not available 2023-10-23 14:05:48 +01:00
tests Fix segfault caused by use of 'pass' encoding when mbstring converts multipart form POST data 2024-01-24 17:15:27 +02:00
ucgendat Optimize mb_str{,im}width for performance 2021-09-29 18:19:01 +02:00
common_codepoints.txt Improve mb_detect_encoding accuracy for text containing vowels with macrons 2023-08-25 12:09:55 +02:00
config.m4 Combine CJK encoding conversion code in a single source file 2023-05-20 21:27:48 -07:00
config.w32 Combine CJK encoding conversion code in a single source file 2023-05-20 21:27:48 -07:00
CREDITS
gen_rare_cp_bitvec.php Mark globals as const (#10303) 2023-01-23 13:46:58 +00:00
mb_gpc.c Take order of candidate encodings into account when guessing text encoding 2023-05-16 07:01:07 -07:00
mb_gpc.h Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
mbstring.c Fix segfault caused by use of 'pass' encoding when mbstring converts multipart form POST data 2024-01-24 17:15:27 +02:00
mbstring.h Take order of candidate encodings into account when guessing text encoding 2023-05-16 07:01:07 -07:00
mbstring.stub.php Merge branch 'PHP-8.2' into PHP-8.3 2023-11-27 21:13:21 +02:00
mbstring_arginfo.h Merge branch 'PHP-8.2' into PHP-8.3 2023-11-27 21:13:21 +02:00
php_mbregex.c Reduce memory allocated by var_export, json_encode, serialize, and other (#8902) 2022-07-08 14:47:46 +02:00
php_mbregex.h Declare ext/mbstring constants in stubs (#8798) 2022-06-23 17:34:08 +02:00
php_onig_compat.h
php_unicode.c Implement conditional casing for Greek letter sigma when title-casing text 2023-01-12 17:41:11 +02:00
php_unicode.h Speed boost for mb_stripos (when not using UTF-8) 2022-12-18 15:31:20 +02:00
rare_cp_bitvec.h Improve mb_detect_encoding accuracy for text containing vowels with macrons 2023-08-25 12:09:55 +02:00
unicode_data.h Update Unicode tables to 14.0.0 2021-09-20 09:58:20 +02:00