Almost every character set can be given a number N such that a multibyte
sequence starts with a byte higher than that number N. This allows us to
skip a lot of work. To ensure the correctness of this, a sanity check is
implemented that exhaustively tries every 4-byte sequence for every
character set and checks for consistency issues.
This finally gives:
Time (mean ± σ): 120.2 ms ± 1.2 ms [User: 116.9 ms, System: 2.8 ms]
Range (min … max): 118.0 ms … 122.9 ms 24 runs
We allocate twice the input length, and every input character results in
either 1 or 2 output bytes, so we cannot overflow.
By using an enum, and a switch table (which will be efficiently compiled
into a jump table), we can avoid the pessimistic code generation of the
indirect calls.
With this I get the following runtime for the test script in GH-13466 on
my i7-4790, which is around 1.25x faster.
Time (mean ± σ): 250.9 ms ± 1.6 ms [User: 248.4 ms, System: 2.0 ms]
Range (min … max): 248.9 ms … 254.4 ms 11 runs
1. Update: http://www.php.net/license/3_01.txt to https, as there is anyway server header "Location:" to https.
2. Update few license 3.0 to 3.01 as 3.0 states "php 5.1.1, 4.1.1, and earlier".
3. In some license comments is "at through the world-wide-web" while most is without "at", so deleted.
4. fixed indentation in some files before |
We're starting to see a mix between uses of zend_bool and bool.
Replace all usages with the standard bool type everywhere.
Of course, zend_bool is retained as an alias.
The AC_CHECK_TYPE was refactored in more recent versions of Autoconf
and the call with two arguments is obsolete and not recommended anymore.
This patch also refactors some leftovers of using ulong and uint which
are not standard nor common usages of types in C.
The ulong can be used as zend_ulong and uint usage is actually
`unsigned int`.
The usage of HAVE_ULONG removed since it is not used in current code
base.
Legacy edgecase for some legacy HPUX systems removed:
- sys/stream.h header is not checked and the HAVE_SYS_STREAM_H is
not defined with current build system.
- flags are unsigned int
- max_allowed_packet changed to unsigned int
This patch follows previous license year ranges updates. With new
approach source code files now have simplified headers with license
information without year ranges.
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.
A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.
This patch also removes the so called modelines for Vim. Modelines
allow Vim editor specifically to set some editor configuration such as
syntax highlighting, indentation style and tab width to be set in the
first line or the last 5 lines per file basis. Since the php test
files have syntax highlighting already set in most editors properly and
EditorConfig takes care of the indentation settings, this patch removes
these as well for the Vim 6.0 and newer versions.
With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.
Additionally, Vim modelines for setting php syntax and some editor
settings has been removed from some *.phpt files. All these are
mostly not relevant for phpt files neither work properly in the
middle of the file.
I fixed the stuff that seemed obviously wrong, but there are some more
differences with the SHOW COLLATIONS output:
* The whole range of "utf16" collations missing
* "filename" missing
* "ucs2_general_mysql500_ci" and "utf8_general_mysql500_ci" missing
I wasn't sure whether those omissions are intentional, so I didn't add
them.