
39 commits

Author SHA1 Message Date
Gina Peter Banyard
758e1e3192
ext/dom: Fix new MSVC compiler warning
Closes GH-18889
2025-07-02 09:00:09 +09:00
Niels Dossche
81803b9b09
Fix potential read of uninitialized padding data in DOM (#17628)
The fix for GH-17481 introduced a regression that can cause a read of
uninitialized padding data when crossing a chunk boundary while parsing
UTF-8 HTML.
The wrong offset was computed with respect to the input buffer: the
length of the error-corrected UTF-8 code point is not necessarily the
same as the length of the input code point.
This went unnoticed because no CI job runs under Valgrind, I do not run
it regularly either, and ASAN does not catch reads of uninitialized memory.
2025-01-30 11:26:58 -04:00
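The message above notes that an error-corrected code point need not have the same length as the input bytes it replaces. The following minimal C sketch (not the ext/dom code; `utf8_seq_len` and `utf8_sanitize` are hypothetical names) makes that concrete by tracking the input and output positions separately. For simplicity it replaces every offending byte with its own U+FFFD and does not reject overlong forms or surrogates, which differs from the WHATWG rules that collapse a truncated sequence into a single replacement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length (1-4) of the well-formed sequence at s, or 0
 * if the bytes there are invalid or truncated. Simplified: only the
 * lead/continuation structure is checked. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t need;
    if (s[0] < 0x80) need = 1;
    else if ((s[0] & 0xE0) == 0xC0) need = 2;
    else if ((s[0] & 0xF0) == 0xE0) need = 3;
    else if ((s[0] & 0xF8) == 0xF0) need = 4;
    else return 0;
    if (need > avail) return 0;
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
    }
    return need;
}

/* Copies in[0..in_len) to out, replacing each invalid byte with U+FFFD.
 * `out` must hold at least 3 * in_len bytes. One bad input byte becomes
 * three output bytes, so an offset into `out` is not an offset into `in`. */
static size_t utf8_sanitize(const unsigned char *in, size_t in_len, unsigned char *out)
{
    static const unsigned char replacement[3] = {0xEF, 0xBF, 0xBD};
    size_t in_pos = 0, out_pos = 0;
    while (in_pos < in_len) {
        size_t n = utf8_seq_len(in + in_pos, in_len - in_pos);
        if (n == 0) {
            memcpy(out + out_pos, replacement, sizeof replacement);
            out_pos += sizeof replacement;
            in_pos += 1; /* consumed 1 input byte, produced 3 output bytes */
        } else {
            memcpy(out + out_pos, in + in_pos, n);
            out_pos += n;
            in_pos += n;
        }
    }
    assert(in_pos == in_len);
    return out_pos;
}
```

With the input `a`, `0xFF`, `b`, three input bytes become five output bytes, which is exactly the kind of mismatch that makes reusing one offset for both buffers unsafe.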
Niels Dossche
359eb30351
Fix GH-17609: Typo in error message: Dom\NO_DEFAULT_NS instead of Dom\HTML_NO_DEFAULT_NS
2025-01-28 19:30:25 +01:00
Niels Dossche
2952e164a9
Fix GH-17481: UTF-8 corruption in \Dom\HTMLDocument
We need to properly handle the case where we return after having too few
bytes. This case must be handled separately because otherwise the while
loop just performs a partial byte copy.

Closes GH-17489.
2025-01-17 16:25:08 +01:00
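When a multibyte code point is split across two chunks, the trailing bytes of the first chunk have to be held back and joined with the start of the next chunk instead of being copied partially. A minimal C sketch of that idea, assuming a byte-oriented chunk feeder; `utf8_incomplete_tail`, `chunk_joiner` and `chunk_joiner_feed` are hypothetical names, not the ext/dom API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes at the end of buf[0..len) that begin a UTF-8 sequence
 * which is not yet complete (0 if the buffer ends on a boundary). Looks
 * back at most 3 bytes, the longest possible incomplete prefix. */
static size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    for (size_t back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = buf[len - back];
        if ((c & 0xC0) == 0x80) continue; /* continuation byte: keep looking */
        size_t need = (c & 0xE0) == 0xC0 ? 2
                    : (c & 0xF0) == 0xE0 ? 3
                    : (c & 0xF8) == 0xF0 ? 4 : 1;
        return need > back ? back : 0;
    }
    return 0;
}

/* Hypothetical state: bytes of a split code point carried to the next chunk. */
typedef struct {
    unsigned char pending[3];
    size_t pending_len;
} chunk_joiner;

/* Writes the held-back bytes followed by `chunk` into `out` (capacity at
 * least 3 + chunk_len), then holds back any new incomplete tail. Returns the
 * number of bytes in `out` that are safe to hand to a decoder. */
static size_t chunk_joiner_feed(chunk_joiner *j, const unsigned char *chunk,
                                size_t chunk_len, unsigned char *out)
{
    memcpy(out, j->pending, j->pending_len);
    memcpy(out + j->pending_len, chunk, chunk_len);
    size_t total = j->pending_len + chunk_len;
    size_t tail = utf8_incomplete_tail(out, total);
    assert(tail <= sizeof j->pending);
    memcpy(j->pending, out + total - tail, tail);
    j->pending_len = tail;
    return total - tail;
}
```

Feeding U+20AC (`E2 82 AC`) as a two-byte chunk followed by a one-byte chunk yields nothing for the first call and the complete three-byte sequence for the second, so no partial code point ever reaches the decoder.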
Niels Dossche
21c170c75a
Fix GH-17486: Incorrect error line numbers reported in Dom\HTMLDocument::createFromString
Closes GH-17491.
2025-01-17 16:24:28 +01:00
Niels Dossche
1e949d189a
Fix edge-case in DOM parsing decoding
There are three connected subtle issues:
1) The fast path didn't correctly handle the case where the decoder
   requests more data. This caused a bogus additional replacement
   sequence to be emitted when encountering an incomplete sequence at
   the edges of a buffer.
2) The finishing of decoding incorrectly assumed that the fast path
   cannot be in a state where the last few bytes were an incomplete
   sequence, but this is not true as shown by test 08.
3) The finishing of decoding could output bytes twice because it called
   into dom_process_parse_chunk() twice without clearing the decoded
   data. However, calling it twice is not even necessary, as the buffer
   can never be completely filled.

Closes GH-16226.
2024-10-05 18:27:18 +02:00
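Points 2 and 3 concern end-of-input handling: the decoder may still hold an incomplete sequence, and it must be emitted exactly once. A sketch under those assumptions, with hypothetical names (`utf8_stream`, `utf8_stream_finish`): the pending bytes become a single U+FFFD, in line with how the WHATWG UTF-8 decoder reports a truncated sequence at end of stream, and the state is cleared so that a repeated call outputs nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical streaming state: bytes of a code point that was still
 * incomplete when the last chunk ended. */
typedef struct {
    unsigned char pending[3];
    size_t pending_len;
} utf8_stream;

/* Flushes the stream at end of input into `out` (at least 3 bytes).
 * Returns the number of bytes written. */
static size_t utf8_stream_finish(utf8_stream *s, unsigned char *out)
{
    static const unsigned char replacement[3] = {0xEF, 0xBF, 0xBD};
    assert(s->pending_len <= sizeof s->pending);
    if (s->pending_len == 0) {
        return 0; /* input ended on a code point boundary */
    }
    memcpy(out, replacement, sizeof replacement);
    s->pending_len = 0; /* clearing the state prevents a second emission */
    return sizeof replacement;
}
```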
Niels Dossche
88393cfaf7
Fix GH-13988: Storing DOMElement consume 4 times more memory in PHP 8.1 than in PHP 8.0
We avoid creating backing storage by using the feature introduced in
f78d5cfcd2.

Closes GH-15593.
2024-08-27 20:14:25 +02:00
Niels Dossche
d32b97a1c7
Fix NULL pointer dereference with NULL content in legacy nodes in the title getter (#15558)
2024-08-23 19:38:13 +02:00
Gina Peter Bnayard
5853cdb73d
Use "must not" instead of "cannot" wording
2024-08-21 21:12:17 +01:00
Gina Peter Bnayard
6d9a74cde0
ext/dom: Use standard wording for ValueError
2024-08-21 21:12:17 +01:00
Niels Dossche
80a4783d25
Deduplicate NULL checks in ext/dom (#15015)
This introduces a new helper php_dom_create_nullable_object() that does
the NULL check and puts NULL in return_value. Otherwise it runs
php_dom_create_object(). This deduplicates a bit of code.
2024-07-18 21:20:03 +02:00
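The helper's control flow can be sketched in C with stand-in types. `node`, `value`, `create_object` and `create_nullable_object` are hypothetical; the real functions work on libxml2 nodes and `zval`s, and their exact signatures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an xmlNode and a zval. */
typedef struct node { const char *name; } node;
typedef enum { VALUE_NULL, VALUE_OBJECT } value_kind;
typedef struct { value_kind kind; const node *obj; } value;

/* Stand-in for php_dom_create_object(): wraps a node that must not be NULL. */
static void create_object(const node *n, value *return_value)
{
    assert(n != NULL);
    return_value->kind = VALUE_OBJECT;
    return_value->obj = n;
}

/* The deduplicated pattern: one helper performs the NULL check that each
 * caller used to repeat, storing NULL in the return value instead. */
static void create_nullable_object(const node *n, value *return_value)
{
    if (n == NULL) {
        return_value->kind = VALUE_NULL;
        return_value->obj = NULL;
        return;
    }
    create_object(n, return_value);
}
```

Callers such as property getters can then call the nullable variant unconditionally instead of branching on NULL themselves.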
Niels Dossche
6980eba863
Support templated content
The template element in HTML 5 is special in the sense that it does not
add its contents into the DOM tree, but instead keeps them in a separate
document fragment (the template contents). Interacting with the DOM tree
cannot touch the elements in that document fragment.

Closes GH-14906.
2024-07-15 11:10:51 +02:00
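One way to picture the separation is a node type in which a template element reaches its contents through a dedicated field rather than through its children, so ordinary tree walks never visit them. This is a hypothetical C sketch (`tnode`, `count_tree_nodes`), not the data layout used by ext/dom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal tree node. */
typedef struct tnode {
    const char *name;
    struct tnode *first_child;
    struct tnode *next_sibling;
    struct tnode *template_content; /* fragment holding a <template>'s contents */
} tnode;

/* Counts n, its following siblings, and all their descendants. It follows
 * only first_child/next_sibling, never template_content, so template
 * contents are invisible to this ordinary tree traversal. */
static size_t count_tree_nodes(const tnode *n)
{
    size_t count = 0;
    for (; n != NULL; n = n->next_sibling) {
        count += 1 + count_tree_nodes(n->first_child);
    }
    return count;
}
```

A `<body>` containing a `<template>` whose fragment holds a `<p>` counts as two nodes from `body`; the `<p>` is only reachable by starting from `template_content`.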
Niels Dossche
4ef7539144
Split off private data from the ns mapper
2024-07-15 11:02:52 +02:00
Niels Dossche
88da914910
Implement CSS selectors
2024-06-29 13:00:26 -07:00
Niels Dossche
48c9f1e2c3
Implement Dom\HTMLElement class
2024-06-26 12:17:12 -07:00
Niels Dossche
78401ba867
Implement Dom\Document::$title setter
2024-06-26 12:17:12 -07:00
Niels Dossche
04af960397
Implement Dom\Document::$title getter
2024-06-26 12:17:12 -07:00
Niels Dossche
a12db3b656
Implement Dom\Document::$body setter
2024-06-26 12:17:12 -07:00
Niels Dossche
287cf91724
Implement Dom\Document::$head
2024-06-26 12:17:12 -07:00
Niels Dossche
a1485df55a
Implement Dom\Document::$body getter
2024-06-26 12:17:12 -07:00
Arnaud Le Blanc
11accb5cdf
Preferably include from build dir (#13516)
* Include from build dir first

This fixes out of tree builds by ensuring that configure artifacts are included
from the build dir.

Before, out-of-tree builds would preferentially include files from the src dir,
as the include path was defined as follows (ignoring includes from ext/ and sapi/):

    -I$(top_builddir)/main
    -I$(top_srcdir)
    -I$(top_builddir)/TSRM
    -I$(top_builddir)/Zend
    -I$(top_srcdir)/main
    -I$(top_srcdir)/Zend
    -I$(top_srcdir)/TSRM
    -I$(top_builddir)/

As a result, an out of tree build would include configure artifacts such as
`main/php_config.h` from the src dir.

After this change, the include path is defined as follows:

    -I$(top_builddir)/main
    -I$(top_builddir)
    -I$(top_srcdir)/main
    -I$(top_srcdir)
    -I$(top_builddir)/TSRM
    -I$(top_builddir)/Zend
    -I$(top_srcdir)/Zend
    -I$(top_srcdir)/TSRM

* Fix extension include path for out of tree builds

* Include config.h with the brackets form

`#include "config.h"` searches in the directory containing the including-file
before any other include path. This can include the wrong config.h when building
out of tree and a config.h exists in the source tree.

Using `#include <config.h>` searches only the include path, which gives
priority to the build dir.
2024-06-26 00:26:43 +02:00
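The effect of reordering the `-I` flags can be simulated in C: the compiler uses the first include directory, in command-line order, that contains the requested header. The sketch assumes a source tree `src` that still holds a stale `main/php_config.h` and an out-of-tree build dir `build`; the file list and the function names (`file_exists`, `resolve_include`) are hypothetical. It models only the `-I` search used by the bracket form, not the quote form's extra lookup in the including file's directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical file set standing in for the filesystem. */
static const char *const existing_files[] = {
    "src/main/php_config.h",   /* stale artifact left in the source tree */
    "build/main/php_config.h", /* generated by the out-of-tree configure */
};

static int file_exists(const char *path)
{
    for (size_t i = 0; i < sizeof existing_files / sizeof existing_files[0]; i++) {
        if (strcmp(existing_files[i], path) == 0) return 1;
    }
    return 0;
}

/* First -I directory containing `header` wins. Writes the resolved path to
 * `out` and returns 1, or returns 0 if no directory has the header. */
static int resolve_include(const char *const *dirs, size_t ndirs,
                           const char *header, char *out, size_t out_size)
{
    for (size_t i = 0; i < ndirs; i++) {
        snprintf(out, out_size, "%s/%s", dirs[i], header);
        if (file_exists(out)) return 1;
    }
    return 0;
}
```

Resolving `main/php_config.h` with the old order picks `src/main/php_config.h`, because `-I$(top_srcdir)` precedes `-I$(top_builddir)/`; with the new order, `-I$(top_builddir)` comes first and `build/main/php_config.h` wins.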
Peter Kokot
84a0da1574
Sync #if/ifdef/defined (#14508)
This syncs CPP macro conditions:
- _WIN32
- _WIN64
- HAVE_ALLOCA_H
- HAVE_ALPHASORT
- HAVE_ARPA_INET_H
- HAVE_CONFIG_H
- HAVE_DIRENT_H
- HAVE_DLFCN_H
- HAVE_GETTIMEOFDAY
- HAVE_LIBDL
- HAVE_POLL_H
- HAVE_PWD_H
- HAVE_SCANDIR
- HAVE_SYS_FILE_H
- HAVE_SYS_PARAM_H
- HAVE_SYS_SOCKET_H
- HAVE_SYS_TIME_H
- HAVE_SYS_TYPES_H
- HAVE_SYS_WAIT_H
- HAVE_UNISTD_H
- PHP_WIN32
- ZEND_WIN32

These are either undefined or defined to 1 in Autotools and Windows.

Follow-up to GH-5526 (-Wundef).
2024-06-09 14:23:41 +02:00
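Because these macros are either undefined or defined to 1, testing them with `#ifdef` or `#if defined(...)` behaves the same in both cases and never evaluates an undefined identifier, which is what `-Wundef` warns about in a plain `#if HAVE_X`. A small sketch, assuming that is the form the sync settles on; `HAVE_EXAMPLE_H`, `HAVE_MISSING_H` and the two functions are hypothetical.

```c
#include <assert.h>

/* Simulated configure result: a detected feature is defined to 1, a missing
 * one is left undefined. Both macro names are hypothetical. */
#define HAVE_EXAMPLE_H 1
/* HAVE_MISSING_H is intentionally not defined. */

/* #ifdef / defined() only ask whether the macro exists, so neither check
 * triggers -Wundef, unlike `#if HAVE_MISSING_H`. */
static int have_example_h(void)
{
#ifdef HAVE_EXAMPLE_H
    return 1;
#else
    return 0;
#endif
}

static int have_missing_h(void)
{
#if defined(HAVE_MISSING_H)
    return 1;
#else
    return 0;
#endif
}
```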
Niels Dossche
1fdbb0aba6
Get rid of unused declarations
2024-05-13 19:46:51 +02:00
Niels Dossche
e7af2bfd5b
Get rid of reserved name usage
2024-05-13 19:46:51 +02:00
Niels Dossche
44485892df
Factor out all common code for XML serialization and merge common paths
2024-05-11 18:09:39 +02:00
Niels Dossche
6e7adb3c48
Update ext/dom names after policy change (#14171)
2024-05-09 10:40:53 +02:00
Niels Dossche
191d0501a5
Cleanup dom_html_document_encoding_write() (#13788)
2024-03-23 22:17:58 +01:00
Niels Dossche
b955973818
Only register error handling when observable
Closes GH-13702.
2024-03-17 18:24:40 +01:00
Niels Dossche
9fd74cfc9d
Use temporary variables to reduce memory stores
2024-03-17 18:21:59 +01:00
Niels Dossche
cbc421e163
Add fast path for ASCII bytes in UTF-8 validation
2024-03-17 18:21:59 +01:00
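A common way to build such a fast path is to test several bytes at once for the high bit, since every ASCII byte has it clear. The following C sketch (hypothetical name `ascii_prefix_len`, not necessarily how ext/dom implements it) checks eight bytes per iteration and then finishes byte by byte; a validator would run its full multibyte checks only from the returned offset onward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the length of the leading all-ASCII prefix of buf[0..len).
 * A 64-bit word is all-ASCII iff word & 0x8080808080808080 is zero; the mask
 * is the same in every byte, so endianness does not matter. memcpy avoids
 * unaligned loads. */
static size_t ascii_prefix_len(const unsigned char *buf, size_t len)
{
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;
    while (len - i >= 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word);
        if (word & high_bits) break; /* a non-ASCII byte is in this word */
        i += 8;
    }
    while (i < len && buf[i] < 0x80) {
        i++;
    }
    return i;
}
```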
Niels Dossche
cc0260e014
Change return type of DOM\HTMLDocument::saveHTML() (#13701)
Strict error checking is always true for classes in "new DOM".
This means that we always throw an error when calling
`php_dom_throw_error`, and therefore the false return value is not
actually possible.
Also change the stub to reflect this.
2024-03-13 21:49:40 +01:00
Niels Dossche
539d8d9259
Use common helper macro for getting the node in property handlers
2024-03-10 11:08:46 +01:00
Niels Dossche
d57e7a920b
Use BAD_CAST consistently
2024-03-10 11:08:46 +01:00
Niels Dossche
6c55513e33
Use true instead of 1 with php_dom_throw_error
2024-03-10 11:08:46 +01:00
Niels Dossche
14b6c981c3
[RFC] Add a way to opt-in ext/dom spec compliance (#13031)
RFC: https://wiki.php.net/rfc/opt_in_dom_spec_compliance
2024-03-09 16:56:00 +01:00
Niels Dossche
2f1fe3209c
Use a direct statically-known call for decoding in the fast path
2024-02-07 18:02:42 +01:00
Niels Dossche
89ea24f63e
Give anonymous dom structs a name (#13135)
2024-01-13 11:34:40 +01:00
Niels Dossche
a9064816db
Optimizations for HTML 5 loading (#12896)
* Fix inverted NULL and add dictionary

* Avoid useless error processing if no reporting is set

* Avoid double work while processing attributes and use fast text instantiation
2023-12-08 18:45:01 +01:00
Niels Dossche
1492be5286
[RFC] DOM HTML5 parsing and serialization support (#12111)
2023-11-13 20:18:19 +01:00