Fixes GHSA-3qrf-m4j2-pcrr.
To parse a document with libxml2, you first need to create a parsing context.
The parsing context contains parsing options (e.g. XML_NOENT to substitute
entities) that the application (in this case PHP) can set.
Unfortunately, libxml2 also supports providing default set options.
For example, if you call xmlSubstituteEntitiesDefault(1) then the XML_NOENT
option will be added to the parsing options every time you create a parsing
context **even if the application never requested XML_NOENT**.
Third party extensions can override these globals, in particular the
substitute entity global. This causes entity substitution to be
unexpectedly active.
Fix it by setting the parsing options to a sane known value.
For API calls that depend on global state we introduce
PHP_LIBXML_SANITIZE_GLOBALS() and PHP_LIBXML_RESTORE_GLOBALS().
For other APIs that work directly with a context we introduce
php_libxml_sanitize_parse_ctxt_options().
We must never strip embedded whitespace; we only need to skip values
when that option is set, and make sure that we keep BC regarding the
different behavior for "cdata" and "complete" elements (for the former,
the whole element is skipped; for the latter only the "value" key).
We also fix erroneous `int` types which should actually be `size_t`.
Co-authored-by: Christoph M. Becker <cmbecker69@gmx.de>
Closes GH-7493.
As of PHP 8.0.0, these functions are supposed to return int, so we
cannot return `false`. Since calling the parser recursively is a
programmer error, we throw an `Error` in this case.
Cf. <https://github.com/php/php-src/pull/7363>.
The fix for bug #73151[1] cured the symptoms, but not the root cause,
namely xmlParse() must not be called recursively. Since that bugfix
also messed up the error handling, we basically revert it (but also
simplify the return), and then prevent calling the parser recursively.
[1] <f2a8a8c068>
Co-authored-by: Nikita Popov <nikita.ppv@gmail.com>
Closes GH-7363.
As these hold on to some internal resource, there can't be two
"equal" objects with different identity. Make sure the lack of
public properties doesn't result in these being treated as always
equal.
We must not call `zend_list_delete()` in resource closer functions
exposed to userland, because decreasing the refcount there leads to
use-after-free scenarios. In this case, commit 4a42fbb worked for
typical use-cases where `xml_parser_free()` has been called exactly
once for the resource, because there is an internal zval (`->index`)
referencing the same resource which already increased the refcount by
one. However, when `xml_parser_free()` is called multiple times on the
same XML parser resource, the resource would be freed prematurely.
Instead we forcefully close the resource in `xml_parser_free()`. We
also could decrease the refcount of the resource there, but that would
require to call `xml_parser_free()` which is somewhat uncommon, and
would be particularly bad wrt. PHP 8 where that function is a NOP, and
as such doesn't have to be called. So we do no longer increase the
refcount of the resource when copying it to the internal zval, and let
the usualy refcounting semantics take care of the resource destruction.
[1] <http://git.php.net/?p=php-src.git;a=commit;h=4a42fbbbc73aad7427aef5c89974d1833636e082>
From an engine perspective, named parameters mainly add three
concepts:
* The SEND_* opcodes now accept a CONST op2, which is the
argument name. For now, it is looked up by linear scan and
runtime cached.
* This may leave UNDEF arguments on the stack. To avoid having
to deal with them in other places, a CHECK_UNDEF_ARGS opcode
is used to either replace them with defaults, or error.
* For variadic functions, EX(extra_named_params) are collected
and need to be freed based on ZEND_CALL_HAS_EXTRA_NAMED_PARAMS.
RFC: https://wiki.php.net/rfc/named_params
Closes GH-5357.
While performing resource -> object migrations, we're adding
defensive classes that are final, non-serializable and non-clonable
(unless they are, of course). This path adds a ZEND_ACC_NO_DYNAMIC_PROPERTIES
flag, that also forbids the creation of dynamic properties on these objects.
This is a subset of #3931 and targeted at internal usage only
(though may be extended to userland at some point in the future).
It's already possible to achieve this (what the removed
WeakRef/WeakMap code does), but there's some caveats: First, this
simple approach is only possible if the class has no declared
properties, otherwise it's necessary to special-case those
properties. Second, it's easy to make it overly strict, e.g. by
forbidding isset($obj->prop) as well. And finally, it requires a
lot of boilerplate code for each class.
Closes GH-5572.
The hash is used to check whether the arginfo file needs to be
regenerated. PHP-Parser will only be downloaded if this is actually
necessary.
This ensures that release artifacts will never try to regenerate
stubs and thus fetch PHP-Parser, as long as you do not modify any
files.
Closes GH-5739.
Closes GH-5353. From now on, PHP will have reflection information
about default values of parameters of internal functions.
Co-authored-by: Nikita Popov <nikita.ppv@gmail.com>