Commit graph

109 commits

Author SHA1 Message Date
Niels Dossche
3cb7d1bd8a
Remove custom UTF-8 check function from ext/libxml
This was originally introduced as a workaround for a libxml2 bug [1].
This bug has been fixed for more than a decade [2], and we can use the
libxml2 API again. We bumped our version requirement for libxml2 beyond
that in 7.4 [3].

[1] 7e53511ec8
[2] 3ffe90ea1c
[3] 74235ca5f3

Closes GH-18706.
2025-05-30 10:40:23 +02:00
Gina Peter Banyard
c5aa03c8b9 ext/libxml: Use bool type instead of int type 2025-04-26 13:57:17 +01:00
Gina Peter Banyard
1f1cd5c4bc ext/libxml: Add some const qualifiers 2025-04-26 13:57:17 +01:00
Niels Dossche
6366da48ec
Use unsigned int for the reference count APIs in ext/libxml (#16706)
Also removes impossible conditions.
2024-11-06 17:47:35 +01:00
Niels Dossche
a0c29f0889
Use unsigned int instead of int for refcount for libxml objects (#15247) 2024-08-05 22:04:24 +02:00
Niels Dossche
6980eba863
Support templated content
The template element in HTML 5 is special in the sense that it does not
add its contents into the DOM tree, but instead keeps them in a separate
shadow DOM document fragment. Interacting with the DOM tree cannot touch
the elements in the document fragment.

Closes GH-14906.
2024-07-15 11:10:51 +02:00
Niels Dossche
8825235348
Reapply "Stop using reserved names in dom"
This reverts commit dda96768ec.
2024-07-08 17:27:39 +02:00
Niels Dossche
dda96768ec
Revert "Stop using reserved names in dom"
This reverts commit 013bc53f0c.

This somehow breaks the Windows build. Will investigate later.
2024-07-08 16:07:32 +02:00
Niels Dossche
013bc53f0c Stop using reserved names in dom 2024-07-08 06:09:04 -07:00
Niels Dossche
f9844f0348
Merge branch 'PHP-8.3'
* PHP-8.3:
  [ci skip] NEWS
  Backport libxml2 2.13.2 fixes (#14816)
2024-07-04 15:41:15 +02:00
Niels Dossche
ecf0bb0fd1
Merge branch 'PHP-8.2' into PHP-8.3
* PHP-8.2:
  [ci skip] NEWS
  Backport libxml2 2.13.2 fixes (#14816)
2024-07-04 15:37:35 +02:00
Niels Dossche
4fe821311c
Backport libxml2 2.13.2 fixes (#14816)
Backported from https://github.com/php/php-src/pull/14789
2024-07-04 15:29:50 +02:00
Niels Dossche
85705eda71 Fix compilation on libxml2 2.13 2024-07-03 10:34:46 -07:00
Niels Dossche
fc09f4b2bc
Implement Dom\TokenList (#13664)
Part of RFC: https://wiki.php.net/rfc/dom_additions_84

Closes GH-11688.
2024-07-02 21:34:23 +02:00
Niels Dossche
768900b180 Implement Dom $innerHTML property 2024-07-02 11:15:38 -07:00
Niels Dossche
88da914910 Implement CSS selectors 2024-06-29 13:00:26 -07:00
Niels Dossche
dfde0d4cef Handle dumping node to file 2024-05-11 18:09:39 +02:00
Niels Dossche
0c490ade0d Handle dumping document to file 2024-05-11 18:09:39 +02:00
Niels Dossche
44485892df Factor out all common code for XML serialization and merge common paths 2024-05-11 18:09:39 +02:00
Niels Dossche
fae25ca2df Move dom_attr_value() into ext/libxml 2024-05-05 10:14:40 +02:00
Niels Dossche
30885f3b5f
Implement request #71571: XSLT processor should provide option to change maxDepth (#13731)
There are two depth limiting parameters for XSLT templates.
1) maxTemplateDepth
   This corresponds to the recursion depth of a template. For very
   complicated templates this can be hit.
2) maxTemplateVars
   This is the total number of live variables. When using recursive
   templates with lots of parameters you can hit this limit.

This patch introduces two new properties to XSLTProcessor that
correspond to the above variables.
2024-03-31 21:21:23 +02:00
Niels Dossche
b955973818 Only register error handling when observable
Closes GH-13702.
2024-03-17 18:24:40 +01:00
Niels Dossche
14b6c981c3
[RFC] Add a way to opt-in ext/dom spec compliance (#13031)
RFC: https://wiki.php.net/rfc/opt_in_dom_spec_compliance
2024-03-09 16:56:00 +01:00
Niels Dossche
03547f6832
Remove properties field from php_libxml_node_object (#13062)
This shrinks the struct from 80 bytes to 72 bytes.
This was unused internally, and I did not find any external users via
GitHub search.
The intention for this was that it could be used for attaching extra
data as a 3rd party to a node. However, there are better mechanisms for
that like using actual objects.
2024-01-03 20:03:56 +01:00
Niels Dossche
f3ee902c3d Merge branch 'PHP-8.2' into PHP-8.3
* PHP-8.2:
  Backport deprecation warning ignores to unbreak CI
2023-12-06 22:18:12 +01:00
Niels Dossche
e2d97314ab Backport deprecation warning ignores to unbreak CI
In master I use ZEND_DIAGNOSTIC_IGNORED_START, but that doesn't exist on
8.2 or 8.3 (8.3 has a similar macro though).
So to unbreak CI I just made a variation of this directly in the
php_libxml.h header.

See 683e787860 (commitcomment-134301083)

Closes GH-12887.
2023-12-06 22:17:27 +01:00
Niels Dossche
ae83d6ab07
Fix issues related to libxml2 2.12.0 (#12802)
* Avoid passing NULL to xmlSwitchToEncoding

This otherwise switches to UTF-8 on libxml2 2.12.0

* Split tests for different error reporting behaviour in libxml2 2.12.0

* Avoid deprecation warnings for libxml2 2.12.0

We can't fully get rid of the parser globals as there are still APIs
that implicitly use them.

* Temporarily disable part of test for libxml 2.12.0 regression

See https://gitlab.gnome.org/GNOME/libxml2/-/issues/634

* Review fixes

* [ci skip] Update test description
2023-11-29 20:46:35 +01:00
Niels Dossche
6f215e0727 Merge branch 'PHP-8.3'
* PHP-8.3:
  Fix GH-12616: DOM: Removing XMLNS namespace node results in invalid default: prefix
  Fix GH-12702: libxml2 2.12.0 issue building from src
2023-11-17 19:58:57 +01:00
Niels Dossche
2b42b73c0b Merge branch 'PHP-8.2' into PHP-8.3
* PHP-8.2:
  Fix GH-12616: DOM: Removing XMLNS namespace node results in invalid default: prefix
  Fix GH-12702: libxml2 2.12.0 issue building from src
2023-11-17 19:58:31 +01:00
Niels Dossche
8a95e616b9 Fix GH-12702: libxml2 2.12.0 issue building from src
Fixes GH-12702.

Co-authored-by: nono303 <github@nono303.net>
2023-11-17 19:46:30 +01:00
Niels Dossche
1492be5286
[RFC] DOM HTML5 parsing and serialization support (#12111) 2023-11-13 20:18:19 +01:00
Niels Dossche
0cab865275 Fix compile error when php_libxml.h is included in C++
See https://github.com/php/pecl-xml-xmldiff/issues/1
2023-10-15 11:48:14 +02:00
Niels Dossche
eebc528cbf Fix broken cache invalidation with deallocated and reallocated document node
The original caching implementation had an oversight in combination with
the new lifetime management in DOM for 8.3.
The modification counter is stored on the document object itself, but as
that can get deallocated when all references disappear, stale cache data
can be used. Normally this isn't a problem, unless getElementsByTagName is
called not on the document but on a child node. Fix it by moving caching
data into the ref object, which will outlive all nodes from a document
even if the document object disappears.

Closes GH-12338.
2023-10-01 17:06:02 +02:00
Niels Dossche
bb092ab4c6 Fix #80927: Removing documentElement after creating attribute node: possible use-after-free
Closes GH-11892.
2023-08-12 18:49:12 +02:00
Derick Rethans
86afbe10e2 Merge branch 'PHP-8.2' 2023-07-31 19:57:02 +01:00
Derick Rethans
deddf4692a Merge branch 'PHP-8.1' into PHP-8.2 2023-07-31 19:54:44 +01:00
Derick Rethans
0870ebb862 Merge branch 'PHP-8.0' into PHP-8.1 2023-07-31 19:53:43 +01:00
Niels Dossche
c283c3ab0b Sanitize libxml2 globals before parsing
Fixes GHSA-3qrf-m4j2-pcrr.

To parse a document with libxml2, you first need to create a parsing context.
The parsing context contains parsing options (e.g. XML_NOENT to substitute
entities) that the application (in this case PHP) can set.
Unfortunately, libxml2 also supports providing default set options.
For example, if you call xmlSubstituteEntitiesDefault(1) then the XML_NOENT
option will be added to the parsing options every time you create a parsing
context **even if the application never requested XML_NOENT**.

Third party extensions can override these globals, in particular the
substitute entity global. This causes entity substitution to be
unexpectedly active.

Fix it by setting the parsing options to a sane known value.
For API calls that depend on global state we introduce
PHP_LIBXML_SANITIZE_GLOBALS() and PHP_LIBXML_RESTORE_GLOBALS().
For other APIs that work directly with a context we introduce
php_libxml_sanitize_parse_ctxt_options().
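
The save, reset, and restore pattern described above can be sketched in plain C. This is only an illustration under stated assumptions: the global, the option flag, the context type, and every function name below are hypothetical stand-ins, not the libxml2 API and not the actual PHP macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library-wide default that any code in the
 * process (e.g. a third-party extension) may have switched on. */
static int g_substitute_entities_default = 0;

/* Hypothetical option flag, playing the role of XML_NOENT. */
#define OPT_NOENT (1 << 1)

/* Hypothetical parsing context carrying its own option bits. */
typedef struct {
    int options;
} parse_ctxt;

/* Mimics the problematic behaviour: a new context silently inherits the
 * global default even if the caller never asked for that option. */
static parse_ctxt ctxt_new(void)
{
    parse_ctxt ctxt = { g_substitute_entities_default ? OPT_NOENT : 0 };
    return ctxt;
}

/* Global-state APIs: save the current default and reset it to a known value. */
static int sanitize_globals(void)
{
    int saved = g_substitute_entities_default;
    g_substitute_entities_default = 0;
    return saved;
}

/* Put the saved default back so other users of the library are unaffected. */
static void restore_globals(int saved)
{
    g_substitute_entities_default = saved;
}

/* Context-based APIs: overwrite the options on the context directly. */
static void sanitize_ctxt_options(parse_ctxt *ctxt)
{
    assert(ctxt != NULL);
    ctxt->options = 0;
}

/* Creates a context between sanitize and restore and returns its options. */
static int parse_with_sanitized_globals(void)
{
    int saved = sanitize_globals();
    parse_ctxt ctxt = ctxt_new();
    restore_globals(saved);
    return ctxt.options;
}
```

In this sketch the global is restored right after the context is created, so code elsewhere in the process still sees the value it set.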
2023-07-31 19:47:19 +01:00
Remi Collet
fde4386648
cast _private to avoid [-fpermissive] error 2023-07-20 07:55:01 +02:00
Niels Dossche
c3f0797385
Implement iteration cache, item cache and length cache for node list iteration (#11330)
* Implement iteration cache, item cache and length cache for node list iteration

The current implementation follows the spec requirement that the list
must be "live". This means that changes in the document must be
reflected in the existing node lists without requiring the user to
refetch the node list.
The consequence is that getting any item, or the length of the list,
always starts searching from the root element of the node list. This
results in O(n) time to get any item or the length. If there's a for
loop over the node list, this means the iterations will take O(n²) time
in total. This causes real-world performance issues with potential for
downtime (see GH-11308 and its references for details).

We fix this by introducing a caching strategy. We cache the last
iterated object in the iterator, the last requested item in the node
list, and the last length computation. To invalidate the cache, we
simply count the number of modifications made to the containing
document. If the modification number does not match what the number was
during caching, we know the document has been modified and the cache is
invalid. If this ever overflows, we saturate the modification number and
don't do any caching anymore. Note that we don't check for overflow on
64-bit systems because it would take hundreds of years to overflow.

Fixes GH-11308.
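
A minimal sketch of the invalidation idea described above, assuming a singly linked list as the "document" and caching only the last requested item. All type and function names are hypothetical and do not mirror the php-src structures; the counter is 64-bit, so the saturation path mentioned above is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node and document types, for illustration only. */
typedef struct node {
    struct node *next;
    int value;
} node;

typedef struct {
    node *head;
    uint64_t modification_nr; /* bumped on every mutation of the document */
} document;

/* A "live" list view that remembers the last item it handed out. */
typedef struct {
    document *doc;
    bool valid;
    uint64_t cached_modification_nr;
    size_t cached_index;
    node *cached_node;
} node_list;

/* Every mutation increments the counter, which implicitly invalidates
 * every cache that recorded an older counter value. */
static void doc_prepend(document *doc, node *n)
{
    n->next = doc->head;
    doc->head = n;
    doc->modification_nr++;
}

static node *list_item(node_list *list, size_t index)
{
    assert(list != NULL && list->doc != NULL);

    node *cur = list->doc->head;
    size_t i = 0;

    /* Resume from the cached position only if the document is unchanged
     * since caching and the target is not before the cached item. */
    if (list->valid
        && list->cached_modification_nr == list->doc->modification_nr
        && index >= list->cached_index) {
        cur = list->cached_node;
        i = list->cached_index;
    }

    while (cur != NULL && i < index) {
        cur = cur->next;
        i++;
    }

    if (cur != NULL) {
        list->valid = true;
        list->cached_modification_nr = list->doc->modification_nr;
        list->cached_index = index;
        list->cached_node = cur;
    }
    return cur;
}
```

With the cache in place, a forward loop calling list_item(list, 0), list_item(list, 1), and so on advances one step per call instead of rescanning from the head each time.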
2023-06-03 00:13:14 +02:00
Niels Dossche
b8840115ff
Shrink libxml_doc_props struct (#11326)
These values are only ever bools, store them as bools.
Reduces the size from 40 bytes to 16 bytes on my system.
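
To illustrate why the change shrinks the struct, here is a sketch with two hypothetical layouts: seven flags plus one pointer, first stored as int and then as bool. The field names and layout are assumptions for illustration, not copied from the header, and the exact sizes depend on the platform ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative layout only: seven flags stored as int plus one pointer. */
typedef struct {
    int format_output;
    int validate_on_parse;
    int resolve_externals;
    int preserve_whitespace;
    int substitute_entities;
    int strict_error;
    int recover;
    void *class_map;
} props_wide;

/* Same fields, but the flags are bool and the pointer comes first so the
 * one-byte flags pack together after it. */
typedef struct {
    void *class_map;
    bool format_output;
    bool validate_on_parse;
    bool resolve_externals;
    bool preserve_whitespace;
    bool substitute_entities;
    bool strict_error;
    bool recover;
} props_compact;

/* Number of bytes saved per struct instance on the current platform. */
static size_t bytes_saved(void)
{
    assert(sizeof(props_compact) <= sizeof(props_wide));
    return sizeof(props_wide) - sizeof(props_compact);
}
```

On a typical 64-bit ABI with 4-byte int and 8-byte pointers these two layouts would come out at 40 and 16 bytes respectively, which matches the figures quoted above.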
2023-05-29 11:41:42 +02:00
George Peter Banyard
fb114bf45b Only use FCC for libxml entity loader callback 2022-11-02 14:52:54 +00:00
Tim Starling
11796229f2
Add libxml_get_external_entity_loader()
Add libxml_get_external_entity_loader(), which returns the currently
installed external entity loader, i.e. the value which was passed to
libxml_set_external_entity_loader() or null if no loader was installed
and the default entity loader will be used.

This allows libraries to save and restore the loader, controlling entity
expansion without interfering with the rest of the application.

Add macro Z_PARAM_FUNC_OR_NULL_WITH_ZVAL(). This allows us to get the
zval for a callable parameter without duplicating callable argument
parsing.

The saved zval keeps the object needed for fcc/fci alive, simplifying
memory management.

Fixes #76763.
2022-08-28 12:47:20 +01:00
KsaR
01b3fc03c3
Update http->https in license (#6945)
1. Update http://www.php.net/license/3_01.txt to https, since the server already redirects to https via the "Location:" header.
2. Update a few license 3.0 references to 3.01, as 3.0 states "php 5.1.1, 4.1.1, and earlier".
3. Some license comments read "at through the world-wide-web" while most omit "at", so delete it.
4. Fix indentation before | in some files.
2021-05-06 12:16:35 +02:00
Nikita Popov
3e01f5afb1 Replace zend_bool uses with bool
We're starting to see a mix between uses of zend_bool and bool.
Replace all usages with the standard bool type everywhere.

Of course, zend_bool is retained as an alias.
2021-01-15 12:33:06 +01:00
George Peter Banyard
35e0a91db7 Fix [-Wundef] warnings in libxml extension 2020-05-16 15:31:23 +02:00
Gabriel Caruso
5d6e923d46
Remove mention of PHP major version in Copyright headers
Closes GH-4732.
2019-09-25 14:51:43 +02:00
Peter Kokot
92ac598aab Remove local variables
This patch removes the so called local variables defined per
file basis for certain editors to properly show tab width, and
similar settings. These are mainly used by Vim and Emacs editors
yet with recent changes the once working definitions don't work
anymore in Vim without custom plugins or additional configuration.
Neither are these settings synced across the PHP code base.

A simpler and better approach is EditorConfig and fixing code
using some code style fixing tools in the future instead.

This patch also removes the so-called modelines for Vim. Modelines
let Vim specifically pick up editor configuration such as syntax
highlighting, indentation style and tab width from the first line or
the last 5 lines of each file. Since the php test files already have
syntax highlighting set properly in most editors and EditorConfig takes
care of the indentation settings, this patch removes these as well for
Vim 6.0 and newer versions.

With the removal of local variables for certain editors such as
Emacs and Vim, the footer is also probably not needed anymore when
creating extensions using ext_skel.php script.

Additionally, Vim modelines for setting php syntax and some editor
settings have been removed from some *.phpt files. These are mostly
not relevant for phpt files, nor do they work properly in the middle
of the file.
2019-02-03 21:03:00 +01:00
Zeev Suraski
0cf7de1c70 Remove yearly range from copyright notice 2019-01-30 11:03:12 +02:00
Peter Kokot
8d3f8ca12a Remove unused Git attributes ident
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.

In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.

This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
2018-07-25 00:53:25 +02:00