Commit graph

87 commits

Author SHA1 Message Date
Gina Peter Banyard
fd2d869642
Clean-up some more headers (#14416)
Remove unused headers (such as php_ini.h for extensions that don't define INI settings)
Use more specific headers when possible
2024-06-08 17:15:36 +01:00
Gina Peter Banyard
25a5146180
Clean-up unused headers (#14365)
* ext/mbstring.c: clean-up headers and include intrinsics
2024-06-01 17:12:42 +01:00
Ilija Tovilo
cd66fcc68b
Add request_parse_body() function
RFC: https://wiki.php.net/rfc/rfc1867-non-post

This function allows populating the $_POST and $_FILES globals for non-post
requests. This avoids manual parsing of RFC1867 requests.

Fixes #55815
Closes GH-11472
2024-02-08 12:08:07 +01:00
Alex Dowad
3ab10da758 Take order of candidate encodings into account when guessing text encoding
The documentation for mb_detect_encoding says that this function
"Detects the most likely character encoding for string `string` from an
ordered list of candidates".

Prior to 28b346bc06, mb_detect_encoding did not really attempt to
determine the "most likely" text encoding for the input string. It
would just return the first candidate encoding for which the string was
valid. In 28b346bc06, I amended this function so that it uses heuristics
to try to guess which candidate encoding is "most likely".

However, the caller did not have any way to indicate which candidate
text encoding(s) they consider to be more likely, in case the
heuristics applied are inconclusive. In the language of Bayesian
probability, there was no way for the caller to indicate their 'prior'
assignment of probabilities.

Further, the documentation for mb_detect_encoding also says that the
second parameter `encodings` is "a list of character encodings to try,
in order". The documentation clearly implies that the order of
the `encodings` argument should be significant.

Therefore, amend mb_detect_encoding so that while it still uses
heuristics to guess the most likely text encoding for the input string,
it favors those which are earlier in the list of candidate encodings.

One complication is that many callers of mb_detect_encoding use it
in this way:

    mb_detect_encoding($string, mb_list_encodings());

In a majority of cases, this is bad code; mb_detect_encoding will both
be much slower and the results will be less reliable than if a smaller
list of candidates is used. However, since such code already exists and
people are using it in production, we should not unnecessarily break it.
The order of candidate encodings obviously does not express any prior
belief of which candidates are more likely in this case, and treating
it as if it did will degrade the accuracy of the result.

Since mb_list_encodings now returns a single, immutable array on each
call, we can avoid that problem by turning off the new behavior when
we receive the array of encodings returned by mb_list_encodings.
This implementation means that if the user does this:

    $a = mb_list_encodings();
    mb_detect_encoding($string, $a);

...then the order of candidate encodings will not be considered.
However, if the user explicitly initializes their own array of all
supported legacy text encodings, then the order *will* be considered.

The other functions which also follow this new behavior are:

• mb_convert_variables
• mb_convert_encoding (when multiple candidate input encodings are
  listed)

Other places where "detection" (or really "guessing") of text encoding
may be performed include:

• mb_send_mail
• Zend engine, when determining the encoding of a PHP script
• mbstring processing of HTTP request contents, when http_input INI
  parameter is set to a list

In these cases, the new logic based on order of candidate encodings
is *not* enabled. It *might* be logical to consider the order of
candidate encodings in some or all of these cases, but I'm not sure if
that is true, so it seems wiser to avoid more behavior changes than is
necessary. Further, ever since the new encoding detection heuristics
were implemented in 28b346bc06, we have not received any complaints of
user code being broken in these areas. So I am reluctant to "fix what
isn't broken".

Well, some might say that applying the new detection heuristics
to mb_send_mail, etc. in 28b346bc06 was "fixing what wasn't broken",
but (cough cough) I don't have any comment on that...
2023-05-16 07:01:07 -07:00
Alex Dowad
6df7557e43 mb_parse_str, mb_http_input, and mb_convert_variables use fast text conversion code for automatic encoding detection
For mb_parse_str, when mbstring.http_input (INI parameter) is a list of
multiple possible text encodings (which is not the case by default),
this new implementation is about 25% faster.

When mbstring.http_input is a single value, then nothing is changed.
(No automatic encoding detection is done in that case.)
2023-04-12 19:57:52 +02:00
Alex Dowad
8df515555b Remove unused 'to_language' and 'from_language' struct fields 2022-08-16 16:43:26 +02:00
Alex Dowad
aeccb139c3 Use new encoding conversion filters for mb_parse_str and php_mb_post_handler
When micro-benchmarking on relatively short ASCII strings, the new
implementation was about 30% faster than the old one.
2022-08-16 16:43:26 +02:00
KsaR
01b3fc03c3
Update http->https in license (#6945)
1. Update: http://www.php.net/license/3_01.txt to https, as there is anyway server header "Location:" to https.
2. Update few license 3.0 to 3.01 as 3.0 states "php 5.1.1, 4.1.1, and earlier".
3. In some license comments is "at through the world-wide-web" while most is without "at", so deleted.
4. fixed indentation in some files before |
2021-05-06 12:16:35 +02:00
Alex Dowad
7eddcabe2b Don't guard mbstring code with #ifdef HAVE_MBSTRING
This is just a very silly feature of mbstring -- you can compile the source files with
HAVE_MBSTRING undefined, and it will all just compile to (almost) nothing. What is the
use of this? Why compile the source files and link against them if you don't want the
mbstring extension? It doesn't make any kind of sense.
2020-08-31 23:18:13 +02:00
Alex Dowad
62317d592f Remove redundant includes from mbstring (and make sure correct config.h is used)
Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from within mbstring was actually including the file "config.h"
from Valgrind, and not the one from mbstring!!

This is because -I/usr/include/valgrind was added to the compiler invocation _before_
-Iext/mbstring/libmbfl.

Make sure we actually include the file which was intended.
2020-08-31 23:17:58 +02:00
George Peter Banyard
68164f40ce Fix [-Wundef] warning in MBString extension 2020-05-16 15:31:20 +02:00
Nikita Popov
217f6013b3 Remove no_language from mbfl_string
This is not actually used for anything and just causes confusion.
2020-05-07 11:36:57 +02:00
Gabriel Caruso
5d6e923d46
Remove mention of PHP major version in Copyright headers
Closes GH-4732.
2019-09-25 14:51:43 +02:00
Nikita Popov
f73f190c3f Fix internal_encoding fallback in mbstring
By introducing a hook that is called whenever one of
internal_encoding / input_encoding / output_encoding changes, so
that mbstring can adjust it's internal state.

This also makes internal_encoding work with zend multibyte.
2019-04-17 14:05:53 +02:00
Peter Kokot
92ac598aab Remove local variables
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.

A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.

This patch also removes the so called modelines for Vim. Modelines
allow Vim editor specifically to set some editor configuration such as
syntax highlighting, indentation style and tab width to be set in the
first line or the last 5 lines per file basis. Since the php test
files have syntax highlighting already set in most editors properly and
EditorConfig takes care of the indentation settings, this patch removes
these as well for the Vim 6.0 and newer versions.

With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.

Additionally, Vim modelines for setting php syntax and some editor
settings has been removed from some *.phpt files.  All these are
mostly not relevant for phpt files neither work properly in the
middle of the file.
2019-02-03 21:03:00 +01:00
Zeev Suraski
0cf7de1c70 Remove yearly range from copyright notice 2019-01-30 11:03:12 +02:00
Khan M Rashedun-Naby
36ae074036 Simplify mb_gpc() code
Use a switch (consistent with other places in this file) and also
don't unnecessarily set free_buffer.
2018-10-20 22:30:21 +02:00
Peter Kokot
1ad08256f3 Sync leading and final newlines in source code files
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines.

According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.

C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."

Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
2018-10-14 12:56:38 +02:00
Peter Kokot
8d3f8ca12a Remove unused Git attributes ident
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.

In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.

This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
2018-07-25 00:53:25 +02:00
Dmitry Stogov
5eb1f92f31 Use zend_string_release_ex() instread of zend_string_release() in places, where we sure about string persistence. 2018-05-28 16:27:12 +03:00
Anatol Belski
0bc4cf901c Fix unsigned comparisons 2018-02-17 13:02:50 +01:00
Xinchen Hui
a6519d0514 year++ 2018-01-02 12:57:58 +08:00
Anatol Belski
98fe82cc05 fix data types 2017-07-25 21:26:25 +02:00
Nikita Popov
b3c1d9d111 Directly use encodings instead of no_encoding in libmbfl
In particular strings now store encoding rather than the
no_encoding.

I've also pruned out libmbfl APIs that existed in two forms, one
using no_encoding and the other using encoding. We were not actually
using any of the former.
2017-07-20 21:41:52 +02:00
Sammy Kaye Powers
9e29f841ce Update copyright headers to 2017 2017-01-02 09:30:12 -06:00
Dmitry Stogov
1616038698 Added ZEND_ATTRIBUTE_FORMAT to some middind functions.
"%p" replaced by ZEND_LONG_FMT to avoid compilation warnings.
Fixed most incorrect use cases of format specifiers.
2016-06-21 16:00:37 +03:00
Lior Kaplan
ed35de784f Merge branch 'PHP-5.6' into PHP-7.0
* PHP-5.6:
  Happy new year (Update copyright to 2016)
2016-01-01 19:48:25 +02:00
Lior Kaplan
49493a2dcf Happy new year (Update copyright to 2016) 2016-01-01 19:21:47 +02:00
Anatol Belski
2283551cfa fix NULL deref in mbstring post handler 2015-11-05 22:24:47 +01:00
Dmitry Stogov
4a2e40bb86 Use ZSTR_ API to access zend_string elements (this is just renaming without semantick changes). 2015-06-30 04:05:24 +03:00
Xinchen Hui
fc33f52d8c bump year 2015-01-15 23:27:30 +08:00
Xinchen Hui
0579e8278d bump year 2015-01-15 23:26:37 +08:00
Stanislav Malyshev
b7a7b1a624 trailing whitespace removal 2015-01-10 15:07:38 -08:00
Anatol Belski
bdeb220f48 first shot remove TSRMLS_* things 2014-12-13 23:06:14 +01:00
Johannes Schlüter
d0cb715373 s/PHP 5/PHP 7/ 2014-09-19 18:33:14 +02:00
Anatol Belski
4d997f63d9 master renames phase 3 2014-08-25 20:22:49 +02:00
Anatol Belski
c3e3c98ec6 master renames phase 1 2014-08-25 19:24:55 +02:00
Anatol Belski
70de6180d5 fixes to %pd format usage 2014-08-24 02:35:34 +02:00
Anatol Belski
f27c52d846 fixed incompatible types usage 2014-08-20 09:15:00 +02:00
Lior Kaplan
741605da73 Merge branch 'PHP-5.6'
* PHP-5.6:
  Correct typo in comments: 'initialized'

Conflicts:
	ext/dom/php_dom.c
	ext/spl/php_spl.c
2014-08-17 21:37:22 +03:00
Lior Kaplan
f1d0e50ea8 Merge branch 'PHP-5.5' into PHP-5.6
* PHP-5.5:
  Correct typo in comments: 'initialized'
2014-08-17 21:34:03 +03:00
Lior Kaplan
1504f7d630 Correct typo in comments: 'initialized' 2014-08-17 21:32:53 +03:00
Xinchen Hui
946269e48e Refactor mbstring (incompleted) 2014-03-23 20:04:58 +08:00
Xinchen Hui
c081ce628f Bump year 2014-01-03 11:08:10 +08:00
Xinchen Hui
c0d060f5c0 Bump year 2014-01-03 11:04:26 +08:00
Michael Wallner
2438490add slim post data 2013-08-27 13:31:35 +02:00
Xinchen Hui
a666285bc2 Happy New Year 2013-01-01 16:37:09 +08:00
Xinchen Hui
0a7395e009 Happy New Year 2013-01-01 16:28:54 +08:00
Xinchen Hui
e222837f6e Merge branch 'PHP-5.3' into PHP-5.4 2012-11-07 17:08:34 +08:00
Xinchen Hui
7fcbe4d546 Fixed bug #63447 (max_input_vars doesn't filter variables when mbstring.encoding_translation = On) 2012-11-07 17:05:24 +08:00