ParentNode::$children returns an HTMLCollection of all direct child
elements of a container.
I had to move some properties around so that the ParentNode property
offsets are always fixed, which simplifies the code.
This also adds the necessary code to deal with GC cycles in
HTMLCollections.
Furthermore, we disable cloning an HTMLCollection, as that never
worked and it also conflicts with the [[SameObject]] WebIDL
requirement of $children.
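As a rough illustration of what "direct child elements" and [[SameObject]] mean here, the sketch below uses a hypothetical `struct node` with `first_child`/`next_sibling` links (not the libxml2 or ext/dom types). Only element nodes exactly one level below the container are counted, so text nodes, comments and grandchildren are skipped. The collection is live (it re-walks the children on every access), and the hypothetical `node_children()` caches and returns the identical object on each call, which is why handing out a clone would contradict [[SameObject]].

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node model for illustration; not the libxml2/ext/dom types. */
enum node_type { NODE_ELEMENT, NODE_TEXT, NODE_COMMENT };

struct collection;

struct node {
    enum node_type type;
    struct node *first_child;
    struct node *next_sibling;
    struct collection *children_cache; /* created lazily, then reused */
};

/* A live view over the direct element children of `root`. */
struct collection {
    struct node *root;
};

/* Count element nodes exactly one level below the root. */
size_t collection_length(const struct collection *c)
{
    size_t n = 0;
    for (const struct node *cur = c->root->first_child; cur; cur = cur->next_sibling) {
        if (cur->type == NODE_ELEMENT) {
            n++;
        }
    }
    return n;
}

/* Return the index-th direct element child, or NULL if out of range. */
struct node *collection_item(const struct collection *c, size_t index)
{
    for (struct node *cur = c->root->first_child; cur; cur = cur->next_sibling) {
        if (cur->type == NODE_ELEMENT) {
            if (index == 0) {
                return cur;
            }
            index--;
        }
    }
    return NULL;
}

/* [[SameObject]]: every call returns the same collection for a given node. */
struct collection *node_children(struct node *parent)
{
    if (parent->children_cache == NULL) {
        struct collection *c = malloc(sizeof *c);
        if (c == NULL) {
            return NULL;
        }
        c->root = parent;
        parent->children_cache = c;
    }
    return parent->children_cache;
}
```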
The fix for GH-17481 introduced a regression that can cause a read of
uninitialized padding data when crossing a chunk boundary while
parsing UTF-8 HTML.
The offset into the input buffer was computed incorrectly: the
length of the error-corrected UTF-8 code point is not necessarily the
same as the length of the input code point.
This went unnoticed because no CI job runs under Valgrind, I don't run
it regularly either, and ASAN doesn't catch reads of uninitialized memory.
We also need to properly handle returning from the "too few bytes"
case. It has to be handled separately because otherwise the while loop
just performs a partial byte copy.
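To make the offset bug concrete, here is a minimal sketch of a one-step UTF-8 decoder that reports input bytes consumed and output bytes written separately. All names (`decode_step`, `decode_chunk`, `STEP_NEED_MORE`) are hypothetical and this is not Lexbor's API. It assumes a simplified validity check (lead byte plus continuation-byte pattern only; overlongs and surrogates are not rejected) and replaces each bad prefix with U+FFFD. A single invalid input byte produces three output bytes, so advancing the input by the output length would skip past real data or run into padding. A truncated sequence at the end of a chunk consumes nothing and asks for more input instead of being partially copied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-step decoder for illustration only. */
enum step_status { STEP_OK, STEP_NEED_MORE };

struct step_result {
    enum step_status status;
    size_t consumed; /* bytes read from the input buffer */
    size_t written;  /* bytes written to the output buffer */
};

/* Expected sequence length for a lead byte (0 = never valid as a lead). */
static size_t expected_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Decode one code point at in[0]; requires avail >= 1. */
struct step_result decode_step(const unsigned char *in, size_t avail, unsigned char *out)
{
    static const unsigned char replacement[3] = { 0xEF, 0xBF, 0xBD }; /* U+FFFD */
    struct step_result r = { STEP_OK, 0, 0 };
    size_t need = expected_len(in[0]);
    size_t i = 1;

    if (need != 0) {
        for (; i < need; i++) {
            if (i >= avail) {
                /* Too few bytes: consume nothing, wait for the next chunk. */
                r.status = STEP_NEED_MORE;
                return r;
            }
            if ((in[i] & 0xC0) != 0x80) {
                break; /* not a continuation byte */
            }
        }
        if (i == need) {
            memcpy(out, in, need);
            r.consumed = need;
            r.written = need;
            return r;
        }
    }
    /* Invalid: consume the bad prefix (i bytes) but emit 3 bytes. */
    memcpy(out, replacement, sizeof replacement);
    r.consumed = i;
    r.written = sizeof replacement;
    return r;
}

/* Decode a chunk; `out` must hold 3 * len bytes. Returns input bytes used;
 * any unused tail is an incomplete sequence for the next chunk. */
size_t decode_chunk(const unsigned char *in, size_t len, unsigned char *out, size_t *out_len)
{
    size_t pos = 0;
    *out_len = 0;
    while (pos < len) {
        struct step_result r = decode_step(in + pos, len - pos, out + *out_len);
        if (r.status == STEP_NEED_MORE) {
            break;
        }
        pos += r.consumed;      /* advance by the INPUT length ... */
        *out_len += r.written;  /* ... which may differ from the OUTPUT length */
    }
    return pos;
}
```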
Closes GH-17489.
* Use a direct call for decoding the UTF-8 buffer
* Add fast path for UTF-8 HTML serialization
This patch adds a fast path to HTML serialization when the output
encoding is UTF-8. Because the DOM internally represents all strings
as UTF-8, we only need to validate the data here.
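The idea of the fast path can be sketched as "validate, then copy verbatim; otherwise fall back". The validator below follows the well-formed UTF-8 byte ranges (rejecting overlongs, surrogates and code points above U+10FFFF). `serialize_utf8` and the `slow_encode_fn` callback are hypothetical stand-ins for the serializer and its general encoding path, not the functions used in ext/dom.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if buf[0..len) is well-formed UTF-8. */
bool utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t need;
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */

        if (b < 0x80) {
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 2;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 3;
            if (b == 0xE0) lo = 0xA0;      /* reject overlong forms */
            else if (b == 0xED) hi = 0x9F; /* reject UTF-16 surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 4;
            if (b == 0xF0) lo = 0x90;      /* reject overlong forms */
            else if (b == 0xF4) hi = 0x8F; /* reject > U+10FFFF */
        } else {
            return false;                  /* invalid lead byte */
        }
        if (len - i < need) return false;  /* truncated sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi) return false;
        for (size_t k = 2; k < need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return false;
        }
        i += need;
    }
    return true;
}

/* Hypothetical general-purpose encoder used when validation fails. */
typedef size_t (*slow_encode_fn)(const unsigned char *in, size_t len, unsigned char *out);

/* Fast path: valid UTF-8 is already in the target encoding, so copy it. */
size_t serialize_utf8(const unsigned char *in, size_t len, unsigned char *out,
                      slow_encode_fn slow)
{
    if (utf8_is_valid(in, len)) {
        memcpy(out, in, len);
        return len;
    }
    return slow(in, len, out);
}
```

Validation is a single linear pass with no per-code-point conversion, which is where a speedup over a generic encoder would be expected to come from.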
Tested on the English Wikipedia home page on an i7-4790:
```
Benchmark 1: ./sapi/cli/php x.php
  Time (mean ± σ):     516.0 ms ±   6.4 ms    [User: 511.2 ms, System: 3.5 ms]
  Range (min … max):   506.0 ms … 527.1 ms    10 runs

Benchmark 2: ./sapi/cli/php_old x.php
  Time (mean ± σ):     682.8 ms ±   6.5 ms    [User: 676.8 ms, System: 3.8 ms]
  Range (min … max):   675.8 ms … 695.6 ms    10 runs

Summary
  ./sapi/cli/php x.php ran
    1.32 ± 0.02 times faster than ./sapi/cli/php_old x.php
```
(If you're interested: it takes over a second on my machine with the old DOMDocument class.)
Future optimizations are certainly possible, but let's start here.
GitHub FYP test case:
```
Benchmark 1: ./sapi/cli/php test.php
  Time (mean ± σ):     502.8 ms ±   6.2 ms    [User: 498.3 ms, System: 3.2 ms]
  Range (min … max):   495.2 ms … 509.8 ms    10 runs

Benchmark 2: ./sapi/cli/php_old test.php
  Time (mean ± σ):     518.4 ms ±   4.3 ms    [User: 513.9 ms, System: 3.2 ms]
  Range (min … max):   511.5 ms … 525.5 ms    10 runs

Summary
  ./sapi/cli/php test.php ran
    1.03 ± 0.02 times faster than ./sapi/cli/php_old test.php
```
English Wikipedia home page test case:
```
Benchmark 1: ./sapi/cli/php test.php
  Time (mean ± σ):     301.1 ms ±   4.2 ms    [User: 295.5 ms, System: 4.8 ms]
  Range (min … max):   296.3 ms … 308.8 ms    10 runs

Benchmark 2: ./sapi/cli/php_old test.php
  Time (mean ± σ):     308.2 ms ±   1.7 ms    [User: 304.6 ms, System: 2.9 ms]
  Range (min … max):   306.9 ms … 312.8 ms    10 runs

Summary
  ./sapi/cli/php test.php ran
    1.02 ± 0.02 times faster than ./sapi/cli/php_old test.php
```
There are three related, subtle issues:
1) The fast path didn't correctly handle the case where the decoder
requests more data. This caused a bogus extra replacement
sequence to be output when encountering an incomplete sequence at
the edge of a buffer.
2) The final decoding step incorrectly assumed that the fast path
cannot be in a state where the last few bytes form an incomplete
sequence, but this is not true, as shown by test 08.
3) The final decoding step could output bytes twice because it called
dom_process_parse_chunk() twice without clearing the decoded
data. However, calling it twice is not even necessary, as the
buffer can never be filled up entirely.
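The state handling behind these three fixes can be sketched in isolation. In this hypothetical model (`struct stream`, `stream_feed` and `stream_finish` are illustrative names, not ext/dom code), an identity copy stands in for the real decoder, so only chunk-boundary and end-of-input handling is modeled; invalid bytes in complete positions are simply passed through. An incomplete sequence at the end of a chunk is stashed rather than replaced (issue 1), leftover bytes at the end of input become exactly one U+FFFD (issue 2), and the pending state is cleared so a repeated finish call cannot emit anything twice (issue 3).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical streaming state: at most 3 bytes of an incomplete
 * sequence are carried over from one chunk to the next. */
struct stream {
    unsigned char pending[3];
    size_t pending_len;
};

/* Length a sequence with lead byte b should have (0 = invalid lead). */
static size_t seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Number of trailing bytes that start a sequence the next chunk could complete. */
static size_t incomplete_tail(const unsigned char *buf, size_t len)
{
    for (size_t back = 1; back <= 3 && back <= len; back++) {
        unsigned char b = buf[len - back];
        if ((b & 0xC0) != 0x80) {         /* first non-continuation byte */
            size_t need = seq_len(b);
            return need > back ? back : 0;
        }
    }
    return 0;
}

/* Feed one chunk. `out` must hold pending_len + len bytes.
 * Returns the number of bytes written to `out`. */
size_t stream_feed(struct stream *s, const unsigned char *in, size_t len, unsigned char *out)
{
    size_t total, tail;
    memcpy(out, s->pending, s->pending_len);
    memcpy(out + s->pending_len, in, len);
    total = s->pending_len + len;
    tail = incomplete_tail(out, total);   /* stash, don't replace (issue 1) */
    memcpy(s->pending, out + total - tail, tail);
    s->pending_len = tail;
    return total - tail;
}

/* End of input: leftovers can never complete, so emit one U+FFFD
 * (issue 2) and clear the state so nothing is emitted twice (issue 3). */
size_t stream_finish(struct stream *s, unsigned char *out)
{
    if (s->pending_len == 0) {
        return 0;
    }
    out[0] = 0xEF;
    out[1] = 0xBF;
    out[2] = 0xBD;
    s->pending_len = 0;
    return 3;
}
```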
Closes GH-16226.
This introduces a new helper, php_dom_create_nullable_object(), that
performs the NULL check and puts NULL in return_value if its argument
is NULL; otherwise it calls php_dom_create_object(). This deduplicates
some code.
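The pattern is a thin nullable wrapper around an existing constructor. The sketch below models it with hypothetical stand-ins (`struct value` for a zval, `create_object` for php_dom_create_object(), `create_nullable_object` for the new helper); the real ext/dom signatures take different parameters and are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for zval and the DOM wrapper (not Zend's API). */
enum value_type { VALUE_NULL, VALUE_OBJECT };

struct value {
    enum value_type type;
    const void *obj;
};

/* Stand-in for php_dom_create_object(): wrap a non-NULL node. */
static void create_object(const void *node, struct value *return_value)
{
    return_value->type = VALUE_OBJECT;
    return_value->obj = node;
}

/* Nullable variant: callers no longer repeat the NULL check themselves. */
static void create_nullable_object(const void *node, struct value *return_value)
{
    if (node == NULL) {
        return_value->type = VALUE_NULL;
        return_value->obj = NULL;
        return;
    }
    create_object(node, return_value);
}
```

Each call site that previously wrote "if (!node) RETURN_NULL(); create(...)" by hand can then become a single call.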
The template element in HTML5 is special in that it does not add its
contents to the DOM tree, but instead keeps them in a separate
document fragment. Operations on the DOM tree therefore cannot reach
the elements in that fragment.
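In the HTML standard these are the template's "template contents", a DocumentFragment hanging off the element rather than ordinary children. The hypothetical `struct tnode` below models that split: a normal tree walk over `first_child`/`next_sibling` never enters `template_contents`, which is only reachable through that dedicated pointer. This is an illustrative sketch, not the libxml2 or Lexbor representation.

```c
#include <stddef.h>

/* Hypothetical model: a <template>'s parsed children live in a separate
 * fragment reachable only via template_contents, not via first_child. */
struct tnode {
    const char *name;
    struct tnode *first_child;
    struct tnode *next_sibling;
    struct tnode *template_contents; /* only set for <template> */
};

/* Count n, its descendants and its following siblings, using only the
 * ordinary child/sibling links (template contents are not followed). */
static size_t count_tree(const struct tnode *n)
{
    size_t total = 0;
    for (; n != NULL; n = n->next_sibling) {
        total += 1 + count_tree(n->first_child);
    }
    return total;
}
```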
Closes GH-14906.
* Include from build dir first
This fixes out-of-tree builds by ensuring that configure artifacts are included
from the build dir.
Previously, out-of-tree builds would preferentially include files from the src dir, as
the include path was defined as follows (ignoring includes from ext/ and sapi/):
-I$(top_builddir)/main
-I$(top_srcdir)
-I$(top_builddir)/TSRM
-I$(top_builddir)/Zend
-I$(top_srcdir)/main
-I$(top_srcdir)/Zend
-I$(top_srcdir)/TSRM
-I$(top_builddir)/
As a result, an out-of-tree build would include configure artifacts such as
`main/php_config.h` from the src dir.
After this change, the include path is defined as follows:
-I$(top_builddir)/main
-I$(top_builddir)
-I$(top_srcdir)/main
-I$(top_srcdir)
-I$(top_builddir)/TSRM
-I$(top_builddir)/Zend
-I$(top_srcdir)/Zend
-I$(top_srcdir)/TSRM
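A toy model of the first-match rule shows why the order matters for a header included as `main/php_config.h`. It assumes "build" and "src" stand in for $(top_builddir) and $(top_srcdir) (trailing slash dropped), and that the source tree still holds a stale `main/php_config.h`, for example from an earlier in-tree configure. With the old order, `-I$(top_srcdir)` comes before `-I$(top_builddir)/`, so the stale copy wins; with the new order the build dir is searched first. `resolve` is a hypothetical helper, not part of the build system.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "build" and "src" stand in for $(top_builddir) and $(top_srcdir). */
static const char *const include_before[] = {
    "build/main", "src", "build/TSRM", "build/Zend",
    "src/main", "src/Zend", "src/TSRM", "build",
};
static const char *const include_after[] = {
    "build/main", "build", "src/main", "src",
    "build/TSRM", "build/Zend", "src/Zend", "src/TSRM",
};

/* Toy -I lookup: the first dir where dir/header exists in `files` wins.
 * Returns the resolved path (stored in buf), or NULL if nothing matches. */
static const char *resolve(const char *const *dirs, size_t ndirs,
                           const char *const *files, size_t nfiles,
                           const char *header, char *buf, size_t bufsize)
{
    for (size_t d = 0; d < ndirs; d++) {
        snprintf(buf, bufsize, "%s/%s", dirs[d], header);
        for (size_t f = 0; f < nfiles; f++) {
            if (strcmp(buf, files[f]) == 0) {
                return buf;
            }
        }
    }
    return NULL;
}
```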
* Fix extension include path for out-of-tree builds
* Include config.h with the brackets form
`#include "config.h"` searches the directory containing the including file
before any other include path. This can pick up the wrong config.h when building
out of tree and a config.h exists in the source tree.
`#include <config.h>` searches only the include path, which gives
priority to the build dir.
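The difference between the two forms can be modeled in a few lines. This is a simplified sketch of GCC/Clang behaviour that ignores `-iquote` and system directories; `lookup` and `file_exists` are hypothetical helpers, and `ext/foo` is a made-up extension whose source dir still contains a stale config.h while the build dir has the fresh one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static bool file_exists(const char *const *files, size_t nfiles, const char *path)
{
    for (size_t f = 0; f < nfiles; f++) {
        if (strcmp(files[f], path) == 0) {
            return true;
        }
    }
    return false;
}

/* quoted == true models #include "x"; false models #include <x>. */
static const char *lookup(bool quoted, const char *includer_dir,
                          const char *const *dirs, size_t ndirs,
                          const char *const *files, size_t nfiles,
                          const char *header, char *buf, size_t bufsize)
{
    if (quoted) {
        /* Quote form: the including file's own directory comes first. */
        snprintf(buf, bufsize, "%s/%s", includer_dir, header);
        if (file_exists(files, nfiles, buf)) {
            return buf;
        }
    }
    /* Both forms then walk the -I directories in order. */
    for (size_t d = 0; d < ndirs; d++) {
        snprintf(buf, bufsize, "%s/%s", dirs[d], header);
        if (file_exists(files, nfiles, buf)) {
            return buf;
        }
    }
    return NULL;
}
```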
Strict error checking is always enabled for classes in "new DOM".
This means that calling `php_dom_throw_error` always throws an error,
so the false return value is never actually possible.
Also change the stub to reflect this.
* Fix inverted NULL and add dictionary
* Avoid useless error processing if no reporting is set
* Avoid double work while processing attributes and use fast text instantiation