Commit graph

966 commits

Author SHA1 Message Date
Niels Dossche
be2889411a
Merge branch 'PHP-8.4'
* PHP-8.4:
  Fix GH-19397: mb_list_encodings() can cause crashes on shutdown
2025-08-08 20:33:00 +02:00
Niels Dossche
db3f6d0bf0
Merge branch 'PHP-8.3' into PHP-8.4
* PHP-8.3:
  Fix GH-19397: mb_list_encodings() can cause crashes on shutdown
2025-08-08 20:32:55 +02:00
Niels Dossche
cc93bbb765
Fix GH-19397: mb_list_encodings() can cause crashes on shutdown
The request shutdown does not necessarily hold the last reference, if
there is still a CV that refers to the array.

Closes GH-19405.
2025-08-08 20:32:29 +02:00
Gina Peter Banyard
105c1e9896
tree: use zend_str_has_nul_byte() API (#19336) 2025-07-31 23:57:27 +01:00
DanielEScherzer
07f1cfd9b0
Deprecate producing output in a user output handler (#19067)
https://wiki.php.net/rfc/deprecations_php_8_4
2025-07-09 21:20:58 -07:00
Niels Dossche
be17e9ed54
Merge branch 'PHP-8.4'
* PHP-8.4:
  Fix handling of references in zval_try_get_long()
2025-06-04 21:00:22 +02:00
Niels Dossche
2b383848a7
Fix handling of references in zval_try_get_long()
This API can't handle references, yet everyone keeps forgetting that it
can't and that you should DEREF upfront. Fix every type of this issue
once and for all by moving the reference handling to this Zend API.

Closes GH-18761.
2025-06-04 21:00:05 +02:00
Saki Takamachi
462fd4dffe
Small change SIMD codes (#18626)
* use zend_simd.h in zend_accelerator_util_funcs.c

* use zend_simd.h in mbstring

* Remove unnecessary SSE3 includes
2025-05-26 16:32:27 +09:00
Niels Dossche
33ae76405f Use zend_string for arg_separators
This allows us to avoid a call to `zend_ini_str` which took 6% of the
profile on my i7-4790 for a call to `http_build_query`. Now we can just
grab the value from the globals.
In other files this can avoid some length recomputations.
2025-05-21 19:54:09 +02:00
Niels Dossche
33c4ca36e4
Merge branch 'PHP-8.4'
* PHP-8.4:
  Fix weird unpack behaviour in DOM
  Fix GH-17989: mb_output_handler crash with unset http_output_conv_mimetypes
2025-03-09 11:21:34 +01:00
Niels Dossche
aa6e58f82a
Merge branch 'PHP-8.3' into PHP-8.4
* PHP-8.3:
  Fix weird unpack behaviour in DOM
  Fix GH-17989: mb_output_handler crash with unset http_output_conv_mimetypes
2025-03-09 11:21:27 +01:00
Niels Dossche
c7d3dc6fab
Fix GH-17989: mb_output_handler crash with unset http_output_conv_mimetypes
The INI option can be NULL or invalid, resulting in a NULL global.
So we have to add a NULL check.

Closes GH-17996.
2025-03-09 11:16:33 +01:00
Niels Dossche
b1841fdfa2
Avoid unnecessary string refcounting in ext/mbstring (#17892) 2025-02-23 00:23:53 +01:00
Christoph M. Becker
61f42f2d4e
Merge branch 'PHP-8.4'
* PHP-8.4:
  Fix GH-17503: Undefined float conversion in mb_convert_variables
2025-02-04 15:55:06 +01:00
Christoph M. Becker
47a0922dee
Merge branch 'PHP-8.3' into PHP-8.4
* PHP-8.3:
  Fix GH-17503: Undefined float conversion in mb_convert_variables
2025-02-04 15:53:24 +01:00
Christoph M. Becker
55e676e181
Fix GH-17503: Undefined float conversion in mb_convert_variables
Conversion of floating point to integer values is undefined if the
integral part of the float value cannot be represented by the integer
type.  We need to cater to that explicitly (in a manner similar to
`zend_dval_to_lval_cap()`).

Closes GH-17689.
2025-02-04 15:51:48 +01:00
Christoph M. Becker
36b8a69009
Drop superfluous enum forward declaration (GH-17455)
Besides that it is not needed, it is not proper C, and Clang warns that
"forward references to 'enum' types are a Microsoft extension"
(`-Wmicrosoft-enum-forward-reference`).
2025-01-17 11:45:58 +01:00
Christoph M. Becker
6e759e079f
Resolve some MSVC C4244 level 2 warnings
These got already approval by the respective code owners in GH-17076.
2024-12-11 00:12:13 +01:00
David Carlier
f47a45ecff
Merge branch 'PHP-8.3' into PHP-8.4 2024-10-11 08:49:00 +01:00
David Carlier
89b4f94024
Merge branch 'PHP-8.2' into PHP-8.3 2024-10-11 08:48:49 +01:00
David Carlier
c34d4fbbf4
Fix GH-16360 mb_substr overflow on start and length arguments.
occurs when they are negated to start working from the end instead
when set with ZEND_LONG_MIN.
2024-10-11 08:46:48 +01:00
Niels Dossche
07e418abfb
Merge branch 'PHP-8.3' into PHP-8.4
* PHP-8.3:
  Fix GH-16261: Reference invariant broken in mb_convert_variables()
2024-10-07 17:49:56 +02:00
Niels Dossche
2fe8c4a4fc
Merge branch 'PHP-8.2' into PHP-8.3
* PHP-8.2:
  Fix GH-16261: Reference invariant broken in mb_convert_variables()
2024-10-07 17:49:24 +02:00
Niels Dossche
bf70d9ba0d
Fix GH-16261: Reference invariant broken in mb_convert_variables()
The behaviour is weird in the sense that the reference must get
unwrapped. What ended up happening is that when destroying the old
reference the sources list was not cleaned properly. We add handling for
that. Normally we would use use ZEND_TRY_ASSIGN_STRINGL but that doesn't
work here as it would keep the reference and change values through
references (see bug #26639).

Closes GH-16272.
2024-10-07 17:46:06 +02:00
Yuya Hamada
f815310c98 Merge branch 'PHP-8.3' into PHP-8.4 2024-10-05 18:28:43 +09:00
Yuya Hamada
4e23d3945a Merge branch 'PHP-8.2' into PHP-8.3 2024-10-05 18:26:25 +09:00
Yuya Hamada
d840200cea Fix GH-16229: Address overflowed in mb_send_mail when empty string 2024-10-05 18:24:09 +09:00
Gina Peter Bnayard
5853cdb73d Use "must not" instead of "cannot" wording 2024-08-21 21:12:17 +01:00
Gina Peter Bnayard
9a2fdbec48 ext/mbstring: Use standard wording for ValueError 2024-08-21 21:12:17 +01:00
Gina Peter Banyard
25a5146180
Clean-up unused headers (#14365)
* ext/mbstring.c: clean-up headers and include intrinsics
2024-06-01 17:12:42 +01:00
Gina Peter Banyard
48d5ae98e7
ext/standard: Refactor exec.c public APIs to use zend_string pointers (#14353)
* Pull zend_string* from INI directive

* Ensure that mail.force_extra_parameters INI directive does not have any nul bytes

* ext/standard: Make php_escape_shell_cmd() take a zend_string* instead of char*

This saves on an expensive strlen() computation

* Convert E_ERROR to ValueError in php_escape_shell_cmd()

* ext/standard: Make php_escape_shell_arg() take a zend_string* instead of char*

This saves on an expensive strlen() computation

* Convert E_ERROR to ValueError in php_escape_shell_arg()
2024-05-29 10:59:17 +01:00
Gina Peter Banyard
c96e8946e1 ext/mbstring: Check encoding passed to mb_http_output() has no null bytes 2024-04-24 15:39:47 +01:00
Gina Peter Banyard
86dfbadc06 ext/mbstring: Always pass length to php_mb_get_encoding_or_pass()
We have access to this information, so propagate it instead of calling strlen().
This also removes the newly introduced _ex() variant.
2024-04-24 15:39:47 +01:00
Gina Peter Banyard
b61479bb28 ext/mbstring: Pass string length to _php_mb_ini_mbstring_http_output_set() 2024-04-24 15:39:47 +01:00
Niels Dossche
f81370847c
Fix GH-13815: mb_trim() inaccurate $characters default value (#13820)
Because the default characters are defined in the stub file, and the
stub file is UTF-8 (typically), the characters are encoded in the string
as UTF-8. When using a different character encoding, there is a mismatch
between what mb_trim expects and the UTF-8 encoded string it gets.

One way of solving this is by making the characters argument nullable,
which would mean that it always uses the internal code path that has the
unicode codepoints that are defaulted actually stored as codepoint
numbers instead of in a string.

Co-authored-by: @ranvis
2024-04-24 09:07:55 +02:00
David Carlier
39c67e37db
Merge branch 'PHP-8.3' 2024-04-10 10:01:57 +01:00
David Carlier
2cfd9df109
Fix GH-13932: Attempt to fix mbstring on windows build (msvc).
Build failure due to lack of VLA support in older compiler versions.
2024-04-10 10:01:11 +01:00
Ben Ramsey
7ca4300db8
Merge branch 'PHP-8.3' 2024-04-09 23:55:11 -05:00
Alex Dowad
3394efc63e
Fix infinite loop in mb_encode_mimeheader 2024-04-09 23:52:11 -05:00
tekimen
4d51bfa270
[RFC] Add mb_ucfirst and mb_lcfirst functions (#13161) 2024-03-20 17:25:19 +01:00
Niels Dossche
8128d17cc2 Fix build warning in ext/mbstring 2024-01-24 19:44:17 +01:00
Alex Dowad
bcd4138185 Merge branch 'PHP-8.3'
* PHP-8.3:
  Fix segfault caused by use of 'pass' encoding when mbstring converts multipart form POST data
2024-01-24 18:41:36 +02:00
Alex Dowad
67051eb8ed Fix segfault caused by use of 'pass' encoding when mbstring converts multipart form POST data
When mbstring.encoding_translation=1, and PHP receives an (RFC1867)
form-based file upload, and the Content-Disposition HTTP header contains
a filename for the uploaded file, PHP will internally invoke mbstring
code to 1) try to auto-detect the text encoding of the filename, and if
that succeeds, 2) convert the filename to internal text encoding.

In such cases, the candidate text encodings which are considered during
"auto-detection" are those listed in the INI parameter
mbstring.http_input. Further, mbstring.http_input is one of the few
contexts where mbstring allows the magic string "pass" to appear in
place of an actual text encoding name.

Before mbstring's encoding auto-detection function was reimplemented,
the old implementation would never return "pass", even if "pass" was the
only candidate it was given to choose from. It is not clear if this was
intended by the original developers or not. This behavior was the result
of some rather subtle details of the implementation.

After mbstring's auto-detection function was reimplemented, if the new
implementation was given only one candidate to choose, and it was not
running in 'strict' mode, it would always return that candidate, even
if the candidate was the non-encoding "pass".

The upshot of all of this: Previously, if
mbstring.encoding_translation=1 and mbstring.http_input=pass, encoding
conversion of RFC1867 filenames would never be attempted. But after
the reimplementation, encoding 'conversion' would occur (uselessly).

Further, in December 2022, I reimplemented the relevant bit of
encoding conversion code. When doing this, I never bothered to
implement encoding/decoding routines for the non-encoding "pass",
because I thought that they would never be used. Well, in the one case
described above, those routines *would* have been used, had they
actually existed. Because they didn't exist, we get a nice NULL pointer
dereference and ensuing segfault instead.

Instead of 'fixing' this by adding encoding/decoding routines for the
non-encoding "pass", I have modified the function which the RFC1867
form-handling code invokes to auto-detect input encoding. This function
will never return "pass" now, just like the previous implementation.

Thanks to the GitHub user 'tstangner' for reporting this bug.
2024-01-24 17:15:27 +02:00
Alex Dowad
6fa4286ba4 Merge branch 'PHP-8.3'
* PHP-8.3:
  Do not allow zend.script_encoding to be set to 'pass'
2024-01-21 15:14:29 +02:00
Alex Dowad
1e92d47f41 Do not allow zend.script_encoding to be set to 'pass'
When investigating another bug reported by GitHub user 'tstangner',
I discovered that PHP segfaults when the INI parameter
zend.script_encoding is set to "pass". This bug dates back to
December 2022 (caused by yours truly in 953864661a).

If any PHP users in the wild were actually setting zend.script_encoding
to "pass" (which would be an utterly useless thing to do), I expect that
someone would have filed a bug report by now. The absence of such bug
reports is evidence that nobody is doing this.

Hence, it seems that the best fix is simply to disallow "pass" as a
choice for zend.script_encoding. The internal function
'php_mb_zend_encoding_list_parser' which I am modifying to accomplish
this has no other in-tree callers, aside from the 'exif' extension.
Further, exif only calls the function with a few hard-coded values, and
none of them are the string "pass", so this change will not have any
impact on exif.
2024-01-21 14:51:54 +02:00
Alex Dowad
bb6ceec230 Merge branch 'PHP-8.3'
* PHP-8.3:
  Fix bug in mb_get_substr_slow (sometimes outputs wrong number of characters)
2023-12-21 09:35:01 +02:00
Cristian Rodríguez
927adfb1a6
Use a single version of mempcpy(3) (#12257)
While __php_mempcpy is only used by ext/standard/crypt_sha*, the
mempcpy "pattern" is used everywhere.

This commit removes __php_mempcpy, adds zend_mempcpy and transforms
open-coded parts into function calls.
2023-12-20 15:16:32 +00:00
Alex Dowad
e814197371 Fix bug in mb_get_substr_slow (sometimes outputs wrong number of characters)
Thanks to Maurício Fauth for finding and reporting this bug.

The bug was introduced in October 2022. It originally only affected
text encodings which do not have a fixed byte width per characters
and for which mbstring does not have an mblen_table. However, I recently
made another change to mbstring, such that mb_substr no longer relies
on the mblen_table even if one is available. Because of this change,
the bug earlier introduced in October 2022 now affected a greater
number of text encodings, including UTF-8.
2023-12-20 14:32:53 +02:00
Alex Dowad
cffdeb81d5 Add specialized implementation of mb_strcut for GB18030
For GB18030, it is not generally possible to identify character
boundaries without scanning through the entire string. Therefore,
implement mb_strcut using a similar strategy as the mblen_table based
implementation in mbstring.c. The difference is that for GB18030, we
need to look at two leading bytes to determine the byte length of a
multi-byte character.

The new implementation is 4-5x faster for short strings, and more than
10x faster for long strings. (Part of the reason why this new code has
such a great performance advantage is because it is replacing code
based on the older text conversion filters provided by libmbfl, which
were quite slow.)

The behavior is the same as before for valid GB18030 strings; for
some invalid strings, mb_strcut will choose different 'cut' points
as compared to before. (Clang's libFuzzer was used to compare the
old and new implementations, searching for test cases where they had
different behavior; no such cases were found.)
2023-12-18 17:01:20 +02:00
Gina Peter Banyard
6da8b93ed5
ext/mbstring: Refactor mb_get_info() 2023-12-18 00:31:29 +00:00