On x86-64 with GCC 14.2.1:
zim_Random_Randomizer_getBytes goes from 514 to 418 bytes
zim_Random_Randomizer_getBytesFromString goes from 750 to 676 bytes
PHP Internals Book says:
> The ZVAL_DUP macro is similar to ZVAL_COPY, but will duplicate arrays, rather
> than just incrementing their refcount. If you are using this macro, you are
> almost certainly doing something very wrong.
Replace this by an explicit call to `zend_array_dup()`, as done in
`php_array_diff()`. Besides being more explicit in what is happening, this
likely also results in better assembly.
This patch greatly improves the performance for the common case of using a
64-bit engine and requesting a length that is a multiple of 8.
It does so by providing a fast path that will just `memcpy()` (which will be
optimized out) the returned uint64_t directly into the output buffer,
byteswapping it for big endian architectures.
The existing byte-wise copying logic was mostly left alone. It only received an
optimization of the shifting and masking that was previously applied to
`Randomizer::getBytesFromString()` in 1fc2ddc996.
Co-authored-by: Saki Takamachi <saki@php.net>
* random: Remove `php_random_status`
Since 162e1dce98, the `php_random_status` struct
contains just a single `void*`, resulting in needless indirection when
accessing the engine state and thus decreasing readability because of the
additional non-meaningful `->state` references / the local helper variables.
There is also a small, but measurable performance benefit:
<?php
$e = new Random\Engine\Xoshiro256StarStar(0);
$r = new Random\Randomizer($e);
for ($i = 0; $i < 15; $i++)
var_dump(strlen($r->getBytes(100000000)));
goes from roughly 3.85s down to 3.60s.
The names of the `status` variables have not yet been touched to keep the diff
small. They will be renamed to the more appropriate `state` in a follow-up
cleanup commit.
* Introduce `php_random_algo_with_state`
Instead of returning the generated `uint64_t` and providing the size (i.e. the
number of bytes of the generated value) out-of-band via the
`last_generated_size` member of the `php_random_status` struct, the `generate`
function is now expected to return a new `php_random_result` struct containing
both the `size` and the `result`.
This has two benefits, one for the developer:
It's no longer possible to forget setting `last_generated_size` to the correct
value, because it now happens at the time of returning from the function.
and the other benefit is for performance:
The `php_random_result` struct will be returned as a register pair, thus the
`size` will be directly available without reloading it from main memory.
Checking a simplified version of `php_random_range64()` on Compiler Explorer
(“Godbolt”) with clang 17 shows a single change in the resulting assembly
showcasing the improvement (https://godbolt.org/z/G4WjdYxqx):
- add rbp, qword ptr [r14]
+ add rbp, rdx
Empirical testing confirms a measurable performance increase for the
`Randomizer::getBytes()` method:
<?php
$e = new Random\Engine\Xoshiro256StarStar(0);
$r = new Random\Randomizer($e);
var_dump(strlen($r->getBytes(100000000)));
goes from 250ms (before the change) to 220ms (after the change). While
generating 100 MB of random data certainly is not the most common use case, it
confirms the theoretical improvement in practice.
* random: Add `max_offset` local to Randomizer::getBytesFromString()
* random: Use branchless implementation for mask generation in Randomizer::getBytesFromString()
This was benchmarked against clzl with a standalone script with random inputs
and is slightly faster. clzl requires an additional branch to handle the
source_length = 1 / max_offset = 0 case.
* Improve comment for masking in Randomizer::getBytesFromString()
With a single byte we can choose offsets between 0x00 and 0xff, thus 0x100
different offsets. We only need to use the slow path for sources of more than
0x100 bytes.
The previous version was correct with regard to the output expectations, it was
just slower than necessary. Better fix this now while we still can before being
bound by our BC guarantees with regard to emitted sequences.
This also adds a test to verify the behavior: For powers of two we never reject
any values during rejection sampling, we just need to mask off the unneeded
bits. Thus we can specifically verify that the number of calls to the engine
match the expected amount. We also verify that all the possible values are
emitted to make sure the masking does not remove any required bits. For inputs
longer than 0x100 bytes we need trust the `range()` implementation to be
unbiased, but still verify the number of engine calls and perform a basic
output check.
* random: Randomizer::getFloat(): Fix check for empty open intervals
The check for invalid parameters for the IntervalBoundary::OpenOpen variant was
not correct: If two consecutive doubles are passed as parameters, the resulting
interval is empty, resulting in an uint64 underflow in the γ-section
implementation.
Instead of checking whether `$min < $max`, we must check that there is at least
one more double between `$min` and `$max`, i.e. it must hold that:
nextafter($min, $max) != $max
Instead of duplicating the comparatively complicated and expensive `nextafter`
logic for a rare error case we instead return `NAN` from the γ-section
implementation when the parameters result in an empty interval and thus underflow.
This allows us to reliably detect this specific error case *after* the fact,
but without modifying the engine state. It also provides reliable error
reporting for other internal functions that might use the γ-section
implementation.
* random: γ-section: Also check that that min is smaller than max
This extends the empty-interval check in the γ-section implementation with a
check that min is actually the smaller of the two parameters.
* random: Use PHP_FLOAT_EPSILON in getFloat_error.phpt
Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
* random: Add Randomizer::nextFloat()
* random: Check that doubles are IEEE-754 in Randomizer::nextFloat()
* random: Add Randomizer::nextFloat() tests
* random: Add Randomizer::getFloat() implementing the y-section algorithm
The algorithm is published in:
Drawing Random Floating-Point Numbers from an Interval. Frédéric
Goualard, ACM Trans. Model. Comput. Simul., 32:3, 2022.
https://doi.org/10.1145/3503512
* random: Implement getFloat_gamma() optimization
see https://github.com/php/php-src/pull/9679/files#r994668327
* random: Add Random\IntervalBoundary
* random: Split the implementation of γ-section into its own file
* random: Add tests for Randomizer::getFloat()
* random: Fix γ-section for 32-bit systems
* random: Replace check for __STDC_IEC_559__ by compile-time check for DBL_MANT_DIG
* random: Drop nextFloat_spacing.phpt
* random: Optimize Randomizer::getFloat() implementation
* random: Reject non-finite parameters in Randomizer::getFloat()
* random: Add NEWS/UPGRADING for Randomizer’s float functionality
* Fix pre-PHP 8.2 compatibility for php_mt_rand_range() with MT_RAND_PHP
As some left-over comments indicated:
> Legacy mode deliberately not inside php_mt_rand_range()
> to prevent other functions being affected
The broken scaler was only used for `php_mt_rand_common()`, not
`php_mt_rand_range()`. The former is only used for `mt_rand()`, whereas the
latter is used for `array_rand()` and others.
With the refactoring for the introduction of ext/random `php_mt_rand_common()`
and `php_mt_rand_range()` were accidentally unified, thus introducing a
behavioral change that was reported in FakerPHP/Faker#528.
This commit moves the checks for `MT_RAND_PHP` from the general-purpose
`range()` function back into `php_mt_rand_common()` and also into
`Randomizer::getInt()` for drop-in compatibility with `mt_rand()`.
* [ci skip] NEWS for `MT_RAND_PHP` compatibility
* Remove superfluous helper variable in Randomizer::getBytes()
* Reduce the scope of `result` in Randomizer::getBytes()
Co-authored-by: Tim Düsterhus <tim@bastelstu.be>
* Apply `var_dump()` in 02_engine/all_serialize_error.phpt
This ensures that an undetected serialization error is clear identifiable in the output.
* random: Validate that the arrays do not contain extra elements when unserializing
* Emit deprecation warnings when adding dynamic properties to classes during unserialization - this will become an Error in php 9.0.
(Adding dynamic properties in other contexts was already a deprecation warning - the use case of unserialization was overlooked)
* Throw an error when attempting to add a dynamic property to a `readonly` class when unserializing
* Add new serialization methods `__serialize`/`__unserialize` for SplFixedArray to avoid creating deprecated dynamic
properties that would then be added to the backing fixed-size array
* Don't add named dynamic/declared properties (e.g. $obj->foo) of SplFixedArray to the backing array when unserializing
* Update tests to declare properties or to expect the deprecation warning
* Add news entry
Co-authored-by: Tyson Andre <tysonandre775@hotmail.com>
* Verify that the engine doesn't change in construct_twice.phpt
* Clean up the implementation of Randomizer::__construct()
Instead of manually checking whether the constructor was already called, we
rely on the `readonly` modifier of the `$engine` property.
Additionally use `object_init_ex()` instead of manually calling
`->create_object()`.
* Remove exception in Randomizer::shuffleBytes()
The only way that `php_binary_string_shuffle` fails is when the engine itself
fails. With the currently available list of engines we have:
- Mt19937 : Infallible.
- PcgOneseq128XslRr64: Infallible.
- Xoshiro256StarStar : Infallible.
- Secure : Practically infallible on modern systems.
Exception messages were cleaned up in GH-9169.
- User : Error when returning an empty string.
Error when seriously biased (range() fails).
And whatever Throwable the userland developer decides to use.
So the existing engines are either infallible or throw an Exception/Error with
a high quality message themselves, making this exception not a value-add and
possibly confusing.
* Remove exception in Randomizer::shuffleArray()
Same reasoning as in the previous commit applies.
* Remove exception in Randomizer::getInt()
Same reasoning as in the previous commit applies.
* Remove exception in Randomizer::nextInt()
Same reasoning as in the previous commit applies, except that it won't throw on
a seriously biased user engine, as `range()` is not used.
* Remove exception in Randomizer::getBytes()
Same reasoning as in the previous commit applies.
* Remove exception in Mt19937::generate()
This implementation is shared across all native engines. Thus the same
reasoning as the previous commits applies, except that the User engine does not
use this method. Thus is only applicable to the Secure engine, which is the
only fallible native engine.
* [ci skip] Add cleanup of Randomizer exceptions to NEWS
* Unify ext/random unserialize errors with ext/date
- Use `Error` instead of `Exception`.
- Adjust wording.
* Make `zend_read_property` silent in `Randomizer::__unserialize()`
Having:
> Error: Typed property Random\Randomizer::$engine must not be accessed before
> initialization
is not a value-add in this case.
* Insert the actual class name in the unserialization error of Engines
* Revert unserialization failure back to Exception from Error
see https://news-web.php.net/php.internals/118311
Since argument overloading is not safe for reflection, the method needed
to be split appropriately.
Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>
Closes GH-9057.
Whenever ->last_unsafe is set to `true` an exception has been thrown. Thus we
can replace the check for `->last_unsafe` with a check for `EG(exception)`
which is a much more natural way to ommunicate an error up the chain.
When Radomizer::__construct() was called with no arguments, Randomizer\Engine\Secure was implicitly instantiate and memory was leaking.
Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>