Commit graph

359 commits

Author SHA1 Message Date
Niels Dossche
ec79fc9d9c Merge branch 'PHP-8.3'
* PHP-8.3:
  Fix GH-12870: Creating an xmlns attribute results in a DOMException
2023-12-07 22:51:02 +01:00
Niels Dossche
e658f80501 Fix GH-12870: Creating an xmlns attribute results in a DOMException
There were multiple things here since forever, see the GH thread [1]
for discussion.

There were already many fixes to this function previously, and as a
consequence of one of those fixes this started throwing exceptions for a
correct use-case. It turns out that even when reverting to the previous
behaviour there are still bugs. Just fix all of them while we have the
chance.

[1] https://github.com/php/php-src/issues/12870

Closes GH-12888.
2023-12-07 22:42:32 +01:00
Niels Dossche
9c306470fb
Handle libxml2 OOM more consistently (#11927)
This is a continuation of commit c2a58ab07d, in which several OOM error
handling was converted to throwing an INVALID_STATE_ERR DOMException.
Some places were missed and they still returned false without an
exception, or threw a PHP_ERR DOMException.
Convert all of these to INVALID_STATE_ERR DOMExceptions. This also
reduces confusion of users going through documentation [1].

Unfortunately, not all node creations are checked for a NULL pointer.
Some places therefore will not do anything if an OOM occurs (well,
except crash).
On the one hand it's nice to handle these OOM cases.
On the other hand, this adds some complexity and it's very unlikely to
happen in the real world. But then again, "unlikely" situations have
caused trouble before. Ideally all cases should be checked.

[1] https://github.com/php/doc-en/issues/1741
2023-12-04 23:49:25 +01:00
Niels Dossche
ae83d6ab07
Fix issues related to libxml2 2.12.0 (#12802)
* Avoid passing NULL to xmlSwitchToEncoding

This otherwise switches to UTF-8 on libxml2 2.12.0

* Split tests for different error reporting behaviour in libxml2 2.12.0

* Avoid deprecation warnings for libxml2 2.12.0

We can't fully get rid of the parser globals as there are still APIs
that implicitly use them.

* Temporarily disable part of test for libxml 2.12.0 regression

See https://gitlab.gnome.org/GNOME/libxml2/-/issues/634

* Review fixes

* [ci skip] Update test description
2023-11-29 20:46:35 +01:00
Niels Dossche
8a95e616b9 Fix GH-12702: libxml2 2.12.0 issue building from src
Fixes GH-12702.

Co-authored-by: nono303 <github@nono303.net>
2023-11-17 19:46:30 +01:00
Niels Dossche
1492be5286
[RFC] DOM HTML5 parsing and serialization support (#12111) 2023-11-13 20:18:19 +01:00
Niels Dossche
f9a9008fea Merge branch 'PHP-8.3'
* PHP-8.3:
  [ci skip] NEWS
  Fix null pointer dereferences in case of allocation failure
2023-10-24 19:42:59 +02:00
Niels Dossche
a64b48ba92 Merge branch 'PHP-8.2' into PHP-8.3
* PHP-8.2:
  [ci skip] NEWS
  Fix null pointer dereferences in case of allocation failure
2023-10-24 19:42:43 +02:00
icy17
900f0cab9f Fix null pointer dereferences in case of allocation failure
Closes GH-12506.
2023-10-24 19:34:47 +02:00
Niels Dossche
3ff3199dc9 Merge branch 'PHP-8.3'
* PHP-8.3:
  Fix registerNodeClass with abstract class crashing
2023-10-13 19:11:10 +02:00
Niels Dossche
f5d1a194d9 Merge branch 'PHP-8.2' into PHP-8.3
* PHP-8.2:
  Fix registerNodeClass with abstract class crashing
2023-10-13 19:10:51 +02:00
Niels Dossche
d7de0ceca6 Fix registerNodeClass with abstract class crashing
This always results in a segfault when trying to instantiate, so this never
worked. At least throw an error instead of segfaulting to prevent developers
from being confused.

Closes GH-12420.
2023-10-13 19:06:09 +02:00
Niels Dossche
b56141c5dd Merge branch 'PHP-8.3'
* PHP-8.3:
  Fix broken cache invalidation with deallocated and reallocated document node
2023-10-01 17:07:11 +02:00
Niels Dossche
eebc528cbf Fix broken cache invalidation with deallocated and reallocated document node
The original caching implementation had an oversight in combination with
the new lifetime management in DOM for 8.3.
The modification counter is stored on the document object itself, but as
that can get deallocated when all references disappear, stale cache data
can be used. Normally this isn't a problem, unless getElementsByTagName is
called not on the document but on a child node. Fix it by moving caching
data into the ref object, which will outlive all nodes from a document
even if the document object disappears.

Closes GH-12338.
2023-10-01 17:06:02 +02:00
Niels Dossche
f2fede56c8 Deduplicate ParentNode and ChildNode interface implementations using @implementation-alias
The entry points are duplicated: they add bloat and make it easier to forget
to change something. Make maintenance easier by using @implementation-alias.
Also, this has the nice side-effect of slightly reducing the amount of
code and binary size.

Closes GH-12158.
2023-09-10 22:35:58 +02:00
Niels Dossche
48443183af
Use zend_result as return for properties in ext/dom (#12113) 2023-09-03 00:42:49 +02:00
Niels Dossche
807a05ee55 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix memory leak when setting an invalid DOMDocument encoding
2023-08-20 14:07:44 +02:00
Niels Dossche
20ac42e1b0 Fix memory leak when setting an invalid DOMDocument encoding
Because the failure path did not release the string, there was a memory
leak.
As the only valid types for this function are IS_NULL and IS_STRING, we
and IS_NULL is always rejected in practice, solve the issue by not using
a function that increments the refcount in the first place.

Closes GH-12002.
2023-08-20 14:05:26 +02:00
Niels Dossche
d46dc5694c Fix various namespace prefix conflict resolution bugs and namespace shift bugs
There are two linked issues:

- Conflicts couldn't be resolved by changing the prefix name.
- Lacking a prefix would shift the namespace as the default namespace,
  causing elements to suddenly become part of the namespace instead of
  the attributes.

The output could still be improved by removing redundant namespace
declarations, but that's another issue. At least the output is
correct now.

Closes GH-11777.
2023-08-15 20:42:42 +02:00
Niels Dossche
3ad5029442 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-11830: ParentNode methods should perform their checks upfront
  Fix manually calling __construct() on DOM classes
2023-08-07 19:52:04 +02:00
Niels Dossche
08c4db7f36 Fix manually calling __construct() on DOM classes
Closes GH-11894.
2023-08-07 19:37:47 +02:00
Niels Dossche
872bf56fed Remove useless check
This is a remnant from the time the parser could be invoked statically.
2023-08-06 17:15:43 +02:00
Niels Dossche
6f6fedcb46 Handle strict error properly in adoptNode failure, and add a test 2023-08-02 20:40:30 +02:00
Niels Dossche
04df77650d Deduplicate loading code 2023-08-02 20:24:30 +02:00
Niels Dossche
fa397e0217 Respect strict error setting for adoptNode 2023-08-02 20:24:24 +02:00
Derick Rethans
86afbe10e2 Merge branch 'PHP-8.2' 2023-07-31 19:57:02 +01:00
Derick Rethans
0870ebb862 Merge branch 'PHP-8.0' into PHP-8.1 2023-07-31 19:53:43 +01:00
Niels Dossche
c283c3ab0b Sanitize libxml2 globals before parsing
Fixes GHSA-3qrf-m4j2-pcrr.

To parse a document with libxml2, you first need to create a parsing context.
The parsing context contains parsing options (e.g. XML_NOENT to substitute
entities) that the application (in this case PHP) can set.
Unfortunately, libxml2 also supports providing default set options.
For example, if you call xmlSubstituteEntitiesDefault(1) then the XML_NOENT
option will be added to the parsing options every time you create a parsing
context **even if the application never requested XML_NOENT**.

Third party extensions can override these globals, in particular the
substitute entity global. This causes entity substitution to be
unexpectedly active.

Fix it by setting the parsing options to a sane known value.
For API calls that depend on global state we introduce
PHP_LIBXML_SANITIZE_GLOBALS() and PHP_LIBXML_RESTORE_GLOBALS().
For other APIs that work directly with a context we introduce
php_libxml_sanitize_parse_ctxt_options().
2023-07-31 19:47:19 +01:00
Niels Dossche
ae66a0d142 Corrections to return type of loading DOM documents 2023-07-29 14:58:01 +02:00
Niels Dossche
4bee5743e9 Fix GH-11792: LIBXML_NOXMLDECL is not implemented or broken
Fixes GH-11792.
Closes GH-11794.
2023-07-26 18:02:36 +02:00
Niels Dossche
8ac0dcc632 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-11791: Wrong default value of DOMDocument::xmlStandalone
2023-07-26 17:26:14 +02:00
Niels Dossche
bf4e7bd3ed Fix GH-11791: Wrong default value of DOMDocument::xmlStandalone
At one point this was changed from a bool to an int in libxml2, with
negative values meaning it is unspecified. Because it is cast to a bool
this therefore returned true instead of the expected false.

Closes GH-11793.
2023-07-26 17:20:10 +02:00
Niels Dossche
5da08dfff1 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix empty argument cases for DOMParentNode methods
  Fix DOMCharacterData::replaceWith() with itself
  Fix incorrect attribute existence check in DOMElement::setAttributeNodeNS()
  Fix DOMEntity field getter bugs
2023-07-24 19:05:16 +02:00
Niels Dossche
abb1d2e824 Fix empty argument cases for DOMParentNode methods
Closes GH-11768.
2023-07-24 18:58:39 +02:00
Niels Dossche
0f20527768
Simplify configuration getters (#11778)
dom_get_doc_props_read_only() already does a NULL check.
2023-07-24 17:17:12 +02:00
Niels Dossche
a73f38f407
Implement DOMElement::insertAdjacent{Element,Text} (#11700)
* Implement DOMElement::insertAdjacent{Element,Text}

ref: https://dom.spec.whatwg.org/#dom-element-insertadjacentelement
ref: https://dom.spec.whatwg.org/#dom-element-insertadjacenttext

Closes GH-11700.
2023-07-17 17:42:47 +02:00
Niels Dossche
d38cc9b9b6 Implement DOMNode::isConnected and DOMNameSpaceNode::isConnected
ref: https://dom.spec.whatwg.org/#dom-node-isconnected

Closes GH-11677.
2023-07-17 13:14:13 +02:00
Niels Dossche
e8f0bdc7f1 Fix ? 2023-07-14 14:34:29 +02:00
Niels Dossche
6560c9bf8e Implement DOMParentNode::replaceChildren()
ref: https://dom.spec.whatwg.org/#dom-parentnode-replacechildren
2023-07-14 14:34:29 +02:00
Niels Dossche
87e7b61d8f Cleanup macro usage in ext/dom and ext/libxml
With the new lifetime system in place, we can simplify some macros.
2023-07-03 21:32:48 +02:00
Niels Dossche
ed6df1f0ad Implement DOMDocument::adoptNode()
For the past 20 years this threw a "not yet implemented" exception. But
the function was actually there (albeit not documented) and could be called...

Closes GH-11333.
2023-06-27 17:59:58 +02:00
nielsdos
12e4628815 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-11455: Segmentation fault with custom object date properties
  Revert "Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM"
2023-06-19 19:45:24 +02:00
nielsdos
c174ebfce0 Revert "Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM"
This reverts commit 7eb3e9cd17.

Although the fix follows the spec, it causes issues because a lot of old
code assumes the incorrect behaviour PHP had since a long time.
We cannot do this yet, especially not in a stable release.
We revert this for the time being.
See GH-11428.
2023-06-19 19:37:46 +02:00
Niels Dossche
47708765d6 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM
2023-06-17 13:42:10 +02:00
nielsdos
7eb3e9cd17 Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM
The NULL namespace is only correct when there is no default namespace
override. When there is, we need to manually set it to the empty string
namespace.

Closes GH-11428.
2023-06-17 13:36:00 +02:00
nielsdos
6e04050474 Remove redundant assignment on nodep->ns
It's already set by xmlSetNs().
2023-06-07 18:49:22 +02:00
Niels Dossche
a720268214
Use uint32_t for the number of nodes (#11371) 2023-06-05 13:57:41 +01:00
Niels Dossche
e8fb0edc69 Merge branch 'PHP-8.2'
* PHP-8.2:
  Fix bug #77686: Removed elements are still returned by getElementById
  Fix bug #81642: DOMChildNode::replaceWith() bug when replacing a node with itself
  Fix bug #67440: append_node of a DOMDocumentFragment does not reconcile namespaces
2023-06-04 16:34:42 +02:00
Niels Dossche
0e34ac864a Fix bug #77686: Removed elements are still returned by getElementById
From the moment an ID is created, libxml2's behaviour is to cache that element,
even if that element is not yet attached to the document. Similarly, only upon
destruction of the element the ID is actually removed by libxml2.
Since libxml2 has such behaviour deeply ingrained in the library, and uses the
cache for various purposes, it seems like a bad idea and lost cause to fight it.
Instead, we'll simply walk the tree upwards to check if the node is attached to
the document.

Closes GH-11369.
2023-06-04 16:20:34 +02:00
Niels Dossche
c3f0797385
Implement iteration cache, item cache and length cache for node list iteration (#11330)
* Implement iteration cache, item cache and length cache for node list iteration

The current implementation follows the spec requirement that the list
must be "live". This means that changes in the document must be
reflected in the existing node lists without requiring the user to
refetch the node list.
The consequence is that getting any item, or the length of the list,
always starts searching from the root element of the node list. This
results in O(n) time to get any item or the length. If there's a for
loop over the node list, this means the iterations will take O(n²) time
in total. This causes real-world performance issues with potential for
downtime (see GH-11308 and its references for details).

We fix this by introducing a caching strategy. We cache the last
iterated object in the iterator, the last requested item in the node
list, and the last length computation. To invalidate the cache, we
simply count the number of modifications made to the containing
document. If the modification number does not match what the number was
during caching, we know the document has been modified and the cache is
invalid. If this ever overflows, we saturate the modification number and
don't do any caching anymore. Note that we don't check for overflow on
64-bit systems because it would take hundreds of years to overflow.

Fixes GH-11308.
2023-06-03 00:13:14 +02:00