mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.
It would run over the string and set a 'flag' if it saw anything which
did not appear likely to be the encoding in question.
One problem with this scheme was that encodings which merely appeared
less likely to be the correct one were completely rejected, even if there
was no better candidate. Another problem was that the 'identify filters'
had a huge amount of code duplication with the 'conversion filters'.
Eliminate the identify filters. Instead, when auto-detecting text
encoding, use conversion filters to see whether the input string is valid
in candidate encodings or not. At the same type, watch the type of
codepoints which the string decodes to and mark it as less likely if
non-printable characters (ESC, form feed, bell, etc.) or 'private use
area' codepoints are seen.
Interestingly, one old test case in which JIS text was misidentified
as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed'
and the JIS string is now auto-detected as JIS.
There is no meaningful difference between these and UCS-{2,4}. They are
just a little bit more lax about passing errors silently. They also have
no known use.
Alias to UCS-{2,4} in case someone, somewhere is using them.
Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from within mbstring was actually including the file "config.h"
from Valgrind, and not the one from mbstring!!
This is because -I/usr/include/valgrind was added to the compiler invocation _before_
-Iext/mbstring/libmbfl.
Make sure we actually include the file which was intended.
All memory allocation and deallocation for mbstring bounces through a table of
function pointers before going to emalloc/efree/etc. But this is unnecessary.
The allocators are never swapped out. Better to just call them directly.
Normalization include:
- Use dnl for everything that can be ommitted when configure is built in
favor of the shell comment character # which is visible in the output.
- Line length normalized to 80 columns
- Dots for most of the one line sentences
- Macro definitions include similar pattern header comments now
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.
A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.
This patch also removes the so called modelines for Vim. Modelines
allow Vim editor specifically to set some editor configuration such as
syntax highlighting, indentation style and tab width to be set in the
first line or the last 5 lines per file basis. Since the php test
files have syntax highlighting already set in most editors properly and
EditorConfig takes care of the indentation settings, this patch removes
these as well for the Vim 6.0 and newer versions.
With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.
Additionally, Vim modelines for setting php syntax and some editor
settings has been removed from some *.phpt files. All these are
mostly not relevant for phpt files neither work properly in the
middle of the file.
The C89 standard and later defines the `<string.h>` header as part of
the standard headers [1] and on current systems it is always present.
Code included also `<strings.h>` header as an alterinative in some
files. This kind of check was relevant on some older systems where the
`<strings.h>` file included definitions for the C89 compliant
`<string.h>`. Today such alternative check is not required anymore. The
`<strings.h>` file is part of the POSIX definition these days.
Also Autoconf suggests doing this and relying on C89 or above [2] and [3].
This patch also cleans few unused `<strings.h>` inclusions in the libmbfl.
[1]: https://port70.net/~nsz/c/c89/c89-draft.html#4.1.2
[2]: http://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/headers.m4
[3]: https://www.gnu.org/software/autoconf/manual/autoconf-2.69/autoconf.html
Autoconf 2.59d (released in 2006) [1] started promoting several macros
as not relevant for newer systems anymore, including the `AC_FUNC_MEMCMP`.
On some old systems such as SunOS 4.1.3 (EOL in 2003) and NeXT x86
OpenStep (discontinued) the `memcmp` function wasn't present or it
didn't work properly. [2]
On current systems including at least Solaris 10+ this check is not
relevant anymore.
Refs:
[1] http://git.savannah.gnu.org/cgit/autoconf.git/tree/NEWS
[2] https://www.gnu.org/software/autoconf/manual/autoconf-2.69/autoconf.html
Autoconf 2.59d (released in 2006) [1] started promoting several macros
as not relevant for newer systems anymore, including the `AC_HEADER_TIME`.
This macro checks if both `<sys/time.h>` and `<time.h>` can be included
at the same time and defines the `TIME_WITH_SYS_TIME` and
`HAVE_SYS_TIME_H` symbols. On current system such check is not relevant
anymore because in case both headers are present both can be also
included at the same time.
This patch simplifies this checking.
Refs:
[1] http://git.savannah.gnu.org/cgit/autoconf.git/tree/NEWS
[2] https://www.gnu.org/software/autoconf/manual/autoconf-2.69/autoconf.html
Autoconf 2.59d (released in 2006) [1] started promoting several macros
as not relevant for newer systems, including the `AC_C_CONST`.
The `const` keyword is used in C since C89. On old systems some compilers
lacked the `const` and this macro defined it to be empty. This check was
relevant on systems with compilers before C89 and on current systems it
can be omitted. [2]
PHP also requires at least C89 so `const` is always available.
Refs:
[1] http://git.savannah.gnu.org/cgit/autoconf.git/tree/NEWS
[2] https://www.gnu.org/software/autoconf/manual/autoconf-2.69/autoconf.html
Autoconf 2.50 released in 2001 made several macros obsolete including
the AC_TRY_RUN, AC_TRY_COMPILE and AC_TRY_LINK:
http://git.savannah.gnu.org/cgit/autoconf.git/tree/ChangeLog.2
These macros should be replaced with the current AC_FOO_IFELSE instead:
- AC_TRY_RUN with AC_RUN_IFELSE and AC_LANG_SOURCE
- AC_TRY_LINK with AC_LINK_IFELSE and AC_LANG_PROGRAM
- AC_TRY_COMPILE with AC_COMPILE_IFELSE and AC_LANG_PROGRAM
PHP 5.4 to 7.1 require Autoconf 2.59+ version, PHP 7.2 and above require
2.64+ version, and the PHP 7.2 phpize script requires 2.59+ version which
are all greater than above mentioned 2.50 version therefore systems
should be well supported by now.
This patch was created with the help of autoupdate script:
autoupdate <file>
Reference docs:
- https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Obsolete-Macros.html
- https://www.gnu.org/software/autoconf/manual/autoconf-2.59/autoconf.pdf
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.
In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.
This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
The bundled libmbfl library is no longer API or ABI compatible with
the (currently unmaintained) upstream library. As such, building
against an external libmbfl is no longer possible.
If this does not break the Unix system somehow, I'll be amazed. This should get most of it out, apologies for any errors this may cause on non-Windows ends which I cannot test atm.
to link to the external oniguruma library.
- Prevent libmbfl files from being installed when --with-libmbfl is specified.
- Fixed mb_ereg_replace() to work with unicode strings.