The fix for GH-17481 introduced a regression that can cause the read of
uninitialized padding data when going over a chunk boundary during HTML
parsing of UTF-8.
The wrong offset was computed with respect to the input buffer, the
length of the error-corrected UTF-8 code point is not necessarily the
same as the input code point length.
This was not noticed because no CI jobs run with Valgrind nor I do it
regularly, and ASAN doesn't catch uninitialized accesses.
We need to properly handle the case when we return from having too few
bytes, this needs to be handled separately because the while loop
otherwise just performs a partial byte copy.
Closes GH-17489.
There are three connected subtle issues:
1) The fast path didn't correctly handle the case where the decoder
requests more data. This caused a bogus additional replacement
sequence to be outputted when encountering an incomplete sequence at
the edges of a buffer.
2) The finishing of decoding incorrectly assumed that the fast path
cannot be in a state where the last few bytes were an incomplete
sequence, but this is not true as shown by test 08.
3) The finishing of decoding could output bytes twice because it called
into dom_process_parse_chunk() twice without clearing the decoded
data. However, calling twice is not even necessary as the entire
buffer cannot be filled up entirely.
Closes GH-16226.
This introduces a new helper php_dom_create_nullable_object() that does
the NULL check and puts NULL in return_value. Otherwise it runs
php_dom_create_object(). This deduplicates a bit of code.
The template element in HTML 5 is special in the sense that it does not
add its contents into the DOM tree, but instead keeps them in a separate
shadow DOM document fragment. Interacting with the DOM tree cannot touch
the elements in the document fragment.
Closes GH-14906.
* Include from build dir first
This fixes out of tree builds by ensuring that configure artifacts are included
from the build dir.
Before, out of tree builds would preferably include files from the src dir, as
the include path was defined as follows (ignoring includes from ext/ and sapi/) :
-I$(top_builddir)/main
-I$(top_srcdir)
-I$(top_builddir)/TSRM
-I$(top_builddir)/Zend
-I$(top_srcdir)/main
-I$(top_srcdir)/Zend
-I$(top_srcdir)/TSRM
-I$(top_builddir)/
As a result, an out of tree build would include configure artifacts such as
`main/php_config.h` from the src dir.
After this change, the include path is defined as follows:
-I$(top_builddir)/main
-I$(top_builddir)
-I$(top_srcdir)/main
-I$(top_srcdir)
-I$(top_builddir)/TSRM
-I$(top_builddir)/Zend
-I$(top_srcdir)/Zend
-I$(top_srcdir)/TSRM
* Fix extension include path for out of tree builds
* Include config.h with the brackets form
`#include "config.h"` searches in the directory containing the including-file
before any other include path. This can include the wrong config.h when building
out of tree and a config.h exists in the source tree.
Using `#include <config.h>` uses exclusively the include path, and gives
priority to the build dir.
Strict error checking is always true for classes in "new DOM".
This means that we always throw an error when calling
`php_dom_throw_error`, and therefore the false return value is not
actually possible.
Also change the stub to reflect this.
* Fix inverted NULL and add dictionary
* Avoid useless error processing if no reporting is set
* Avoid double work while processing attributes and use fast text instantiation