These cause cache misses due to global access, as observed when running
PHPStan (notably in `array_map`).
Initializing these isn't necessary because ZPP initializes them for us.
Only for optional arguments do we need to be careful; for `array_filter`
we still reset the `fci` but not `fci_cache` because `fci` is not
necessarily set by ZPP but is conditionally used to access `fci_cache`.
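The optional-argument caveat can be illustrated with a small standalone C model. This is only a sketch, not the php-src code: `callback_info`, `callback_cache`, `parse_optional_callback`, and `filter_count` are hypothetical stand-ins for `fci`, `fci_cache`, ZPP, and `array_filter`. It assumes that the parser writes both structs only when a callback was actually passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for fci and fci_cache. */
typedef struct { size_t size; } callback_info;          /* size == 0: no callback */
typedef struct { int (*handler)(int); } callback_cache;

/* Hypothetical parser: writes both structs only when a callback is given. */
void parse_optional_callback(int (*cb)(int), callback_info *info,
                             callback_cache *cache)
{
    if (cb != NULL) {
        info->size = sizeof(*info);
        cache->handler = cb;
    }
    /* With cb == NULL neither struct is touched. */
}

int keep_even(int v) { return v % 2 == 0; }

/* Counts kept elements; without a callback, non-zero values are kept. */
size_t filter_count(const int *values, size_t n, int (*cb)(int))
{
    callback_info info;
    callback_cache cache;   /* left uninitialized on purpose */

    info.size = 0;          /* the gate must be reset: the parser may skip it */
    parse_optional_callback(cb, &info, &cache);

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        /* cache is only read when the gate says the parser filled it in. */
        bool keep = info.size != 0 ? cache.handler(values[i]) != 0
                                   : values[i] != 0;
        kept += keep;
    }
    return kept;
}
```

Only the field that gates access needs a defined value up front; the cached data behind the gate is never read unless the parser wrote it.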
For this script:
```php
for ($i=0;$i < 100; $i++)
    array_reduce(range(1, 100000), fn ($a,$b)=>$a+$b,1);
```
On an i7-4790:
```
Benchmark 1: ./sapi/cli/php reduce_bench.php
Time (mean ± σ): 272.0 ms ± 3.7 ms [User: 268.9 ms, System: 2.1 ms]
Range (min … max): 268.9 ms … 281.3 ms 11 runs
Benchmark 2: ./sapi/cli/php_old reduce_bench.php
Time (mean ± σ): 288.2 ms ± 3.5 ms [User: 284.5 ms, System: 2.7 ms]
Range (min … max): 285.0 ms … 295.9 ms 10 runs
Summary
./sapi/cli/php reduce_bench.php ran
1.06 ± 0.02 times faster than ./sapi/cli/php_old reduce_bench.php
```
On an i7-1185G7:
```
Benchmark 1: ./sapi/cli/php test.php
Time (mean ± σ): 189.6 ms ± 3.5 ms [User: 178.5 ms, System: 10.7 ms]
Range (min … max): 187.3 ms … 201.6 ms 15 runs
Benchmark 2: ./sapi/cli/php_old test.php
Time (mean ± σ): 204.2 ms ± 2.9 ms [User: 190.1 ms, System: 13.6 ms]
Range (min … max): 200.6 ms … 210.2 ms 14 runs
Summary
./sapi/cli/php test.php ran
1.08 ± 0.02 times faster than ./sapi/cli/php_old test.php
```
The refcounting and destruction are not necessary because zend_call_function
will make a copy anyway. And zend_call_function only returns FAILURE if
EG(active) is false in which case array_map shouldn't have been called
in the first place.
This avoids destruction logic for the common case, avoids some copying, and
adds an optimization hint.
For this script:
```php
$array = range(1, 10000);
$result = 0;
for ($i = 0; $i < 5000; $i++) {
    $result += array_find($array, static function ($item) {
        return $item === 5000;
    });
}
var_dump($result);
```
On an intel i7 1185G7:
```
Benchmark 1: ./sapi/cli/php x.php
Time (mean ± σ): 543.7 ms ± 3.8 ms [User: 538.9 ms, System: 4.4 ms]
Range (min … max): 538.4 ms … 552.9 ms 10 runs
Benchmark 2: ./sapi/cli/php_old x.php
Time (mean ± σ): 583.0 ms ± 4.2 ms [User: 578.4 ms, System: 3.4 ms]
Range (min … max): 579.3 ms … 593.9 ms 10 runs
Summary
./sapi/cli/php x.php ran
1.07 ± 0.01 times faster than ./sapi/cli/php_old x.php
```
On an intel i7 4790:
```
Benchmark 1: ./sapi/cli/php x.php
Time (mean ± σ): 828.6 ms ± 4.8 ms [User: 824.4 ms, System: 1.6 ms]
Range (min … max): 822.8 ms … 839.0 ms 10 runs
Benchmark 2: ./sapi/cli/php_old x.php
Time (mean ± σ): 940.1 ms ± 26.4 ms [User: 934.4 ms, System: 2.5 ms]
Range (min … max): 918.0 ms … 981.1 ms 10 runs
Summary
./sapi/cli/php x.php ran
1.13 ± 0.03 times faster than ./sapi/cli/php_old x.php
```
By returning something more semantically meaningful than SUCCESS/FAILURE
we can avoid refcounting for array_all() and array_any().
Also we can avoid resetting the input values to UNDEF.
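As a rough standalone illustration of why a richer return value helps, the following C sketch uses a three-way result so that "any" and "all" style wrappers only need to inspect the status and never touch the matched element. All names here (`find_result`, `find_first`, `any_match`, `all_match`) are hypothetical and are not the php-src identifiers; a negative predicate result merely simulates a callback that threw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-way status instead of SUCCESS/FAILURE. */
typedef enum { FIND_EXCEPTION = -1, FIND_NONE = 0, FIND_SOME = 1 } find_result;

/* Predicate: > 0 match, 0 no match, < 0 simulates a thrown exception. */
typedef int (*predicate)(int value);

/* Finds the first element whose predicate result differs from `negate`.
 * `found` may be NULL when the caller only cares about the status. */
find_result find_first(const int *values, size_t n, predicate pred,
                       bool negate, const int **found)
{
    for (size_t i = 0; i < n; i++) {
        int r = pred(values[i]);
        if (r < 0) {
            return FIND_EXCEPTION;
        }
        if ((r != 0) != negate) {
            if (found != NULL) {
                *found = &values[i];
            }
            return FIND_SOME;
        }
    }
    return FIND_NONE;
}

/* "any": some element matched; the element itself is never copied. */
bool any_match(const int *values, size_t n, predicate pred)
{
    return find_first(values, n, pred, false, NULL) == FIND_SOME;
}

/* "all": no element failed the predicate (an exception also yields false). */
bool all_match(const int *values, size_t n, predicate pred)
{
    return find_first(values, n, pred, true, NULL) == FIND_NONE;
}

int is_positive(int v) { return v > 0; }
```

Because the wrappers pass `NULL` for `found`, nothing is copied out and nothing needs to be released afterwards, which mirrors the refcounting saving described above.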
This syncs the implementation with the updated implementation of `array_find()`
in php/php-src#17536. For the following test script:
    <?php

    $array = range(1, 8000);
    $result = 0;
    for ($i = 0; $i < 4000; $i++) {
        $result += count(array_filter($array, static function ($item) {
            return $item <= 4000;
        }));
    }
    var_dump($result);
This change results in:
    Benchmark 1: /tmp/before array_filter.php
      Time (mean ± σ): 696.9 ms ± 16.3 ms [User: 692.9 ms, System: 3.5 ms]
      Range (min … max): 681.6 ms … 731.5 ms 10 runs

    Benchmark 2: /tmp/after array_filter.php
      Time (mean ± σ): 637.5 ms ± 5.6 ms [User: 633.6 ms, System: 3.8 ms]
      Range (min … max): 630.2 ms … 648.6 ms 10 runs

    Summary
      /tmp/after array_filter.php ran
        1.09 ± 0.03 times faster than /tmp/before array_filter.php
Or as reported by perf:
    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2567197440
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ........................................................
    #
        37.02%  before   before                [.] zend_call_function
        15.50%  before   before                [.] execute_ex
        10.60%  before   before                [.] zif_array_filter
         9.43%  before   before                [.] zend_hash_index_add_new
         9.13%  before   before                [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER
         8.46%  before   before                [.] zend_init_func_execute_data
         3.78%  before   before                [.] zval_add_ref
         3.47%  before   before                [.] zval_ptr_dtor
         1.17%  before   before                [.] zend_is_true
vs
    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2390202140
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ........................................................
    #
        36.87%  after    after                 [.] zend_call_function
        20.46%  after    after                 [.] execute_ex
         8.22%  after    after                 [.] zend_init_func_execute_data
         7.94%  after    after                 [.] zend_hash_index_add_new
         7.89%  after    after                 [.] zif_array_filter
         6.28%  after    after                 [.] ZEND_IS_SMALLER_OR_EQUAL_SPEC_TMPVARCV_CONST_HANDLER
         3.95%  after    after                 [.] zval_add_ref
         2.23%  after    after                 [.] zend_is_true
* array_find: Fix data type for `retval_true`
* array_find: Remove unnecessary refcounting
In a post on LinkedIn [1], Bohuslav Šimek reported that the native
implementation of `array_find()` was about 3× slower than the equivalent
userland implementation. While I was not able to verify this claim, due to a
lack of reproducer provided, I could confirm that the native `array_find()` was
indeed slower than the equivalent userland implementation. For the following
example script:
    <?php

    function my_array_find(array $array, callable $callback): mixed {
        foreach ($array as $key => $value) {
            if ($callback($value, $key)) {
                return $value;
            }
        }

        return null;
    }

    $array = range(1, 10000);
    $result = 0;
    for ($i = 0; $i < 5000; $i++) {
        $result += array_find($array, static function ($item) {
            return $item === 5000;
        });
    }
    var_dump($result);
with the `array_find()` call appropriately replaced for each case, a PHP-8.4
release build provided the following results:
    Benchmark 1: /tmp/before native.php
      Time (mean ± σ): 765.9 ms ± 7.9 ms [User: 761.1 ms, System: 4.4 ms]
      Range (min … max): 753.2 ms … 774.7 ms 10 runs

    Benchmark 2: /tmp/before userland.php
      Time (mean ± σ): 588.0 ms ± 17.9 ms [User: 583.6 ms, System: 4.1 ms]
      Range (min … max): 576.0 ms … 633.3 ms 10 runs

    Summary
      /tmp/before userland.php ran
        1.30 ± 0.04 times faster than /tmp/before native.php
Running `native.php` with perf reports that a third of the time is spent in
`zend_call_function()` and another 20% in `execute_ex()`; however,
`php_array_find()` comes next at 14%:
    # Samples: 3K of event 'cpu_core/cycles/'
    # Event count (approx.): 2895247444
    #
    # Overhead  Command  Shared Object      Symbol
    # ........  .......  .................  ...........................................
    #
        32.47%  before   before             [.] zend_call_function
        20.63%  before   before             [.] execute_ex
        14.06%  before   before             [.] php_array_find
         7.89%  before   before             [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER
         7.31%  before   before             [.] zend_init_func_execute_data
         6.50%  before   before             [.] zend_copy_extra_args
which was surprising, because the function doesn’t do all that much. Looking
at the implementation, the refcounting stood out and it turns out that it is
not actually necessary. The `array` is passed by value to `array_find()` and
thus cannot magically change within the callback. This also means that the
array will continue to hold a reference to string keys and values, preventing
these values from being collected. The refcounting inside of `php_array_find()`
thus will never do anything useful.
Comparing the updated implementation against the original implementation shows
that this change results in a 1.14× improvement:
    Benchmark 1: /tmp/before native.php
      Time (mean ± σ): 775.4 ms ± 29.6 ms [User: 771.6 ms, System: 3.5 ms]
      Range (min … max): 740.2 ms … 844.4 ms 10 runs

    Benchmark 2: /tmp/after native.php
      Time (mean ± σ): 677.3 ms ± 16.7 ms [User: 673.9 ms, System: 3.1 ms]
      Range (min … max): 655.9 ms … 705.0 ms 10 runs

    Summary
      /tmp/after native.php ran
        1.14 ± 0.05 times faster than /tmp/before native.php
Comparing the native implementation against the userland implementation with
the new implementation shows that while the native implementation still is
slower, the difference reduced to 15% (down from 30%):
    Benchmark 1: /tmp/after native.php
      Time (mean ± σ): 670.4 ms ± 9.3 ms [User: 666.7 ms, System: 3.4 ms]
      Range (min … max): 657.1 ms … 689.0 ms 10 runs

    Benchmark 2: /tmp/after userland.php
      Time (mean ± σ): 576.7 ms ± 7.6 ms [User: 572.5 ms, System: 3.7 ms]
      Range (min … max): 563.9 ms … 588.1 ms 10 runs

    Summary
      /tmp/after userland.php ran
        1.16 ± 0.02 times faster than /tmp/after native.php
Looking at the updated perf results shows that `php_array_find()` now only
takes up about 9% of the time:
    # Samples: 2K of event 'cpu_core/cycles/'
    # Event count (approx.): 2540947159
    #
    # Overhead  Command  Shared Object         Symbol
    # ........  .......  ....................  ...........................................
    #
        34.77%  after    after                 [.] zend_call_function
        18.57%  after    after                 [.] execute_ex
        12.28%  after    after                 [.] zend_copy_extra_args
        10.91%  after    after                 [.] zend_init_func_execute_data
         8.77%  after    after                 [.] php_array_find
         6.70%  after    after                 [.] ZEND_IS_IDENTICAL_SPEC_CV_CONST_HANDLER
         4.68%  after    after                 [.] zend_is_identical
[1] https://www.linkedin.com/posts/bohuslav-%C5%A1imek-kambo_the-surprising-performance-of-php-84-activity-7287044532280414209-6WnA
* array_find: Clean up exception handling
This change has no effect on performance, but greatly improves readability of
the implementation.
Improve range array overflow error message
Added info about "how much it exceeded" and the maximum allowable array size.
Makes debugging easier when encountering this specific issue.
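A message of this kind could be built roughly as in the sketch below. The limit `MAX_ELEMENTS`, the function `check_range_size`, and the exact wording are all hypothetical and chosen for illustration only; they are not the values or text used by `range()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit; the real maximum array size is much larger. */
#define MAX_ELEMENTS 1000UL

/* Returns 0 if the requested size fits. Otherwise returns -1 and writes a
 * message that states both the excess and the limit into buf. */
int check_range_size(unsigned long requested, char *buf, size_t buflen)
{
    if (requested <= MAX_ELEMENTS) {
        return 0;
    }
    snprintf(buf, buflen,
             "range exceeds the maximum array size by %lu elements (max: %lu)",
             requested - MAX_ELEMENTS, MAX_ELEMENTS);
    return -1;
}
```

Reporting the excess next to the limit lets the reader see at a glance whether the request was slightly or wildly out of bounds.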
We have an RC1 violation because we're immediately dereferencing and
copying the resulting array in the test case. Instead, transfer the
lifetime using RETVAL_COPY_VALUE and unwrap only after the internal
iterator is reset.
Closes GH-16970.
column_long and index_long might not be set, but are still used as arguments.
They are not actually used if column_str is set, but it's better to initialize
them anyway, if only to make MemorySanitizer happy.
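The pattern can be shown with a minimal standalone sketch. The functions `pick` and `pick_column` are hypothetical and only mirror the shape of the problem: a value that is set on one branch only, yet is always passed along as an argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical consumer: ignores key_long whenever key_str is set. */
long pick(const long *row, size_t n, const char *key_str, long key_long)
{
    if (key_str != NULL) {
        return (long) strlen(key_str);   /* stand-in for a string-keyed lookup */
    }
    return (key_long >= 0 && (size_t) key_long < n) ? row[key_long] : -1;
}

/* Hypothetical caller: column_long is assigned on one branch only. */
long pick_column(const long *row, size_t n, const char *column_name,
                 const long *column_index)
{
    const char *column_str = NULL;
    long column_long = 0;   /* initialized even though it may go unused */

    if (column_name != NULL) {
        column_str = column_name;
    } else if (column_index != NULL) {
        column_long = *column_index;
    }

    /* Both values are passed regardless of which branch ran. Without the
     * initializer, column_long would be passed with an indeterminate value
     * in the string case, which is what a sanitizer reports. */
    return pick(row, n, column_str, column_long);
}
```

The initializer costs a single store and removes the need to reason about which branch ran before the call.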
Previously this returned `int`. Many functions actually take advantage
of the fact this returns exactly 0 or 1. For instance,
`main/streams/xp_socket.c` does:
    sockopts |= STREAM_SOCKOP_IPV6_V6ONLY_ENABLED * zend_is_true(tmpzval);

And `Zend/zend_compile.c` does:

    child = &ast->child[2 - zend_is_true(zend_ast_get_zval(ast->child[0]))];
I changed a few places trivially from `int` to `bool`, but there are
still many places such as the object handlers which return `int` that
should eventually be `bool`.
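The two idioms above rely on the truthiness check yielding exactly 0 or 1, which a `bool` return type guarantees in C. The following standalone sketch shows both; `is_truthy`, `OPT_V6ONLY_ENABLED`, `apply_flag`, and `select_child` are hypothetical names, not php-src APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define OPT_V6ONLY_ENABLED 0x4   /* hypothetical flag value */

/* Returning bool guarantees the result converts to exactly 0 or 1. */
bool is_truthy(long v) { return v != 0; }

/* Multiplying by a bool either keeps the flag or zeroes it out. */
int apply_flag(int opts, long v)
{
    return opts | (OPT_V6ONLY_ENABLED * is_truthy(v));
}

/* children[0] is the condition; index 1 is taken when the condition is
 * true and index 2 when it is false. */
int select_child(const int children[3], long cond)
{
    return children[2 - is_truthy(cond)];
}
```

If `is_truthy` returned an arbitrary non-zero `int` such as 42, the multiplication would set unrelated bits and the subtraction would index out of bounds, which is why the exact 0-or-1 contract matters.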
These were once used in these files but no longer are, and they only cause
confusion about whether a file depends on an additional extension.
- locale.h is added in main/SAPI.c for _ENABLE_PER_THREAD_LOCALE
* random: Remove `php_random_status`
Since 162e1dce98, the `php_random_status` struct
contains just a single `void*`, resulting in needless indirection when
accessing the engine state and thus decreasing readability because of the
additional non-meaningful `->state` references / the local helper variables.
There is also a small, but measurable performance benefit:
    <?php
    $e = new Random\Engine\Xoshiro256StarStar(0);
    $r = new Random\Randomizer($e);

    for ($i = 0; $i < 15; $i++)
        var_dump(strlen($r->getBytes(100000000)));
goes from roughly 3.85s down to 3.60s.
The names of the `status` variables have not yet been touched to keep the diff
small. They will be renamed to the more appropriate `state` in a follow-up
cleanup commit.
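The indirection being removed can be pictured with the following standalone sketch. The types `status_wrapper`, `engine_before`, `engine_after`, and `engine_state` are hypothetical simplifications, and the LCG step is just a placeholder generator, not the Xoshiro256** algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical engine state; a real engine holds more than one word. */
typedef struct { uint64_t s; } engine_state;

/* Before: a wrapper struct whose only member is a void pointer. */
typedef struct { void *state; } status_wrapper;
typedef struct { status_wrapper *status; } engine_before;

/* After: the engine points at its state directly. */
typedef struct { void *state; } engine_after;

/* Placeholder generator step (a 64-bit LCG), used only for illustration. */
uint64_t step(engine_state *st)
{
    st->s = st->s * 6364136223846793005ULL + 1442695040888963407ULL;
    return st->s;
}

/* Two pointer hops and a helper variable to reach the state. */
uint64_t next_before(engine_before *e)
{
    engine_state *st = e->status->state;
    return step(st);
}

/* One pointer hop, no `->state` noise from the wrapper. */
uint64_t next_after(engine_after *e)
{
    return step(e->state);
}
```

Both variants produce the same numbers; the second simply reaches the state with one dereference fewer.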
* Introduce `php_random_algo_with_state`