Move SimpleXML invalidation code after node checks
This is safe, i.e. the tree hasn't been modified yet, because either we
didn't call a libxml modification function yet, or xmlNewChild is called
with a NULL pointer, which makes it bail out and return NULL.
Closes GH-12947.
* PHP-8.2:
Fix getting the address of an uninitialized property of a SimpleXMLElement resulting in a crash
Fix GH-12962: Double free of init_file in phpdbg_prompt.c
Many methods in SimpleXML reset the iterator when called. This has the
consequence that mixing these operations with loops can cause infinite
loops, or the loss of iteration data.
Some people may however rely on the resetting behaviour. To prevent
unintended breaks in stable branches, let's only apply the fix to master.
This reverts GH-12193, GH-12229, GG-12247 for stable branches while
keeping them on master, adding a note in UPGRADING as well.
* PHP-8.2:
Fix GH-12223: Entity reference produces infinite loop in var_dump/print_r
Fix GH-12192: SimpleXML infinite loop when getName() is called within foreach
Fix GH-12186: segfault copying/cloning a finalized HashContext
* PHP-8.1:
Fix GH-12223: Entity reference produces infinite loop in var_dump/print_r
Fix GH-12192: SimpleXML infinite loop when getName() is called within foreach
Fix GH-12186: segfault copying/cloning a finalized HashContext
This happens because getName() resets the iterator to the start because
it overwrites the iterator data.
We add a version of get_first_node that does not overwrite the iterator
data.
Closes GH-12193.
Fixes GHSA-3qrf-m4j2-pcrr.
To parse a document with libxml2, you first need to create a parsing context.
The parsing context contains parsing options (e.g. XML_NOENT to substitute
entities) that the application (in this case PHP) can set.
Unfortunately, libxml2 also supports providing default set options.
For example, if you call xmlSubstituteEntitiesDefault(1) then the XML_NOENT
option will be added to the parsing options every time you create a parsing
context **even if the application never requested XML_NOENT**.
Third party extensions can override these globals, in particular the
substitute entity global. This causes entity substitution to be
unexpectedly active.
Fix it by setting the parsing options to a sane known value.
For API calls that depend on global state we introduce
PHP_LIBXML_SANITIZE_GLOBALS() and PHP_LIBXML_RESTORE_GLOBALS().
For other APIs that work directly with a context we introduce
php_libxml_sanitize_parse_ctxt_options().
* Implement iteration cache, item cache and length cache for node list iteration
The current implementation follows the spec requirement that the list
must be "live". This means that changes in the document must be
reflected in the existing node lists without requiring the user to
refetch the node list.
The consequence is that getting any item, or the length of the list,
always starts searching from the root element of the node list. This
results in O(n) time to get any item or the length. If there's a for
loop over the node list, this means the iterations will take O(n²) time
in total. This causes real-world performance issues with potential for
downtime (see GH-11308 and its references for details).
We fix this by introducing a caching strategy. We cache the last
iterated object in the iterator, the last requested item in the node
list, and the last length computation. To invalidate the cache, we
simply count the number of modifications made to the containing
document. If the modification number does not match what the number was
during caching, we know the document has been modified and the cache is
invalid. If this ever overflows, we saturate the modification number and
don't do any caching anymore. Note that we don't check for overflow on
64-bit systems because it would take hundreds of years to overflow.
Fixes GH-11308.
This reverts commit 94ee4f9834.
The commit was a bit too late to be included in PHP 8.2 RC1. Given it's a massive ABI break, we decide to postpone the change to PHP 8.3.