php-src/ext/dom
Niels Dossche 935fef29bd
Optimize DOM HTML serialization for UTF-8 (#16376)
* Use a direct call for decoding the UTF-8 buffer

* Add fast path for UTF-8 HTML serialization

This patch adds a fast path to HTML serialization for the case where the
output encoding is UTF-8. Because the DOM internally represents all strings
in UTF-8, no transcoding is needed there; we only need to validate.

Tested on the English Wikipedia home page on an i7-4790:
```
Benchmark 1: ./sapi/cli/php x.php
  Time (mean ± σ):     516.0 ms ±   6.4 ms    [User: 511.2 ms, System: 3.5 ms]
  Range (min … max):   506.0 ms … 527.1 ms    10 runs

Benchmark 2: ./sapi/cli/php_old x.php
  Time (mean ± σ):     682.8 ms ±   6.5 ms    [User: 676.8 ms, System: 3.8 ms]
  Range (min … max):   675.8 ms … 695.6 ms    10 runs

Summary
  ./sapi/cli/php x.php ran
    1.32 ± 0.02 times faster than ./sapi/cli/php_old x.php
```

(And if you're interested: it takes over a second on my machine using the old DOMDocument class.)

Future optimizations are certainly possible, but let's start here.
2024-10-22 07:18:36 +02:00
lexbor Update Lexbor (#16288) 2024-10-08 19:15:45 +02:00
parentnode Merge branch 'PHP-8.3' into PHP-8.4 2024-09-25 19:39:49 +02:00
tests Merge branch 'PHP-8.4' 2024-10-22 00:17:44 +02:00
attr.c [ci skip] Fix typo 2024-09-23 22:19:15 +02:00
cdatasection.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
characterdata.c Minor cleanup in dom_character_data_append_data (#15173) 2024-07-30 23:05:12 +02:00
comment.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
config.m4 Rename inner_html_mixin.c to inner_outer_html_mixin.c 2024-10-05 23:26:33 +02:00
config.w32 Rename inner_html_mixin.c to inner_outer_html_mixin.c 2024-10-05 23:26:33 +02:00
CREDITS Add myself to ext-dom credits (#14718) 2024-06-29 15:18:34 +01:00
document.c ext/[cd]*: fix a bunch of typos (#16298) 2024-10-09 17:40:42 +02:00
documentfragment.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
documenttype.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
dom_ce.h Implement PHP-specific extensions to Dom (#14754) 2024-07-04 13:50:19 +02:00
dom_iterators.c Merge branch 'PHP-8.3' 2024-08-31 11:56:34 +02:00
dom_properties.h ext/[cd]*: fix a bunch of typos (#16298) 2024-10-09 17:40:42 +02:00
domexception.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
domexception.h Implement CSS selectors 2024-06-29 13:00:26 -07:00
domimplementation.c Use "must not" instead of "cannot" wording 2024-08-21 21:12:17 +01:00
element.c Merge branch 'PHP-8.4' 2024-10-17 21:21:56 +02:00
entity.c Suppress deprecation notices when ext/dom properties are accessed by the get_debug_info handler (#15530) 2024-08-23 10:39:11 +02:00
entityreference.c Deduplicate NULL checks in ext/dom (#15015) 2024-07-18 21:20:03 +02:00
html5_parser.c Support templated content 2024-07-15 11:10:51 +02:00
html5_parser.h Support templated content 2024-07-15 11:10:51 +02:00
html5_serializer.c Fix GH-15570: Segmentation fault (access null pointer) in ext/dom/html5_serializer.c 2024-08-25 15:09:30 +02:00
html5_serializer.h Support templated content 2024-07-15 11:10:51 +02:00
html_collection.c Merge branch 'PHP-8.3' 2024-08-31 11:56:34 +02:00
html_collection.h Support named items in dimension handling for HTMLCollection 2024-04-14 14:46:04 +02:00
html_document.c Optimize DOM HTML serialization for UTF-8 (#16376) 2024-10-22 07:18:36 +02:00
infra.c ext/[cd]*: fix a bunch of typos (#16298) 2024-10-09 17:40:42 +02:00
infra.h Implement Dom\Document::$title getter 2024-06-26 12:17:12 -07:00
inner_outer_html_mixin.c Fix GH-16356: Segmentation fault with $outerHTML and next node (#16364) 2024-10-11 20:44:50 +02:00
internal_helpers.h Support templated content 2024-07-15 11:10:51 +02:00
namednodemap.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
namespace_compat.c Support templated content 2024-07-15 11:10:51 +02:00
namespace_compat.h Support templated content 2024-07-15 11:10:51 +02:00
node.c Merge branch 'PHP-8.3' into PHP-8.4 2024-10-21 20:57:42 +02:00
nodelist.c Merge branch 'PHP-8.3' 2024-08-31 11:56:34 +02:00
nodelist.h Get rid of reserved name usage 2024-05-13 19:46:51 +02:00
notation.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
php_dom.c Merge branch 'PHP-8.4' 2024-10-16 22:55:29 +02:00
php_dom.h Merge branch 'PHP-8.3' into PHP-8.4 2024-10-17 21:21:49 +02:00
php_dom.stub.php Merge branch 'PHP-8.4' 2024-10-17 23:28:59 +02:00
php_dom_arginfo.h Merge branch 'PHP-8.4' 2024-10-17 23:28:59 +02:00
private_data.c Support templated content 2024-07-15 11:10:51 +02:00
private_data.h Support templated content 2024-07-15 11:10:51 +02:00
processinginstruction.c Preferably include from build dir (#13516) 2024-06-26 00:26:43 +02:00
serialize_common.h Small optimization in dom_local_name_compare_ex() (#15950) 2024-09-20 08:11:13 +02:00
text.c Fix GH-15137: Unexpected null pointer in Zend/zend_smart_str.h (#15138) 2024-07-28 13:53:30 +02:00
token_list.c Implement Dom\TokenList (#13664) 2024-07-02 21:34:23 +02:00
token_list.h Implement Dom\TokenList (#13664) 2024-07-02 21:34:23 +02:00
xml_common.h Merge branch 'PHP-8.3' into PHP-8.4 2024-10-08 20:58:20 +02:00
xml_document.c Support templated content 2024-07-15 11:10:51 +02:00
xml_serializer.c Fix XML serializer errata: xmlns="" serialization should be allowed 2024-09-15 21:30:53 +02:00
xml_serializer.h Support templated content 2024-07-15 11:10:51 +02:00
xpath.c Merge branch 'PHP-8.4' 2024-10-10 19:29:22 +02:00
xpath_callbacks.c ext/[cd]*: fix a bunch of typos (#16298) 2024-10-09 17:40:42 +02:00
xpath_callbacks.h Fix includes for separate builds for ext/dom (#14752) 2024-07-01 20:22:58 +02:00