206 commits

Author SHA1 Message Date
Gustavo André dos Santos Lopes
f4a896c209 - PHP uses a big-endian representation when it converts the
code unit sequences to integers so as to store the entity
  maps. Code in traverse_for_entities assumed little-endian
  order. Fixed.
  (In practice, since there are no Unicode and entity
  mappings for multi-byte encodings other than UTF-8, this
  doesn't matter, so the relevant code was commented out for
  performance reasons.)
2010-10-11 22:26:10 +00:00
Gustavo André dos Santos Lopes
7aa43a8d83 - Revamp of the decoding portion of html.c.
- Dramatic improvements in the performance of html_entity_decode and htmlspecialchars_decode, as the
  string is now traversed only once. Speedups of 20 to 25 times with Windows release builds and a
  ~250-character string (for 2nd and subsequent calls).
- Consistent behavior in html_entity_decode. For instance, the entity for "<" following a stray "&"
  would be decoded, but not the entity for "é" in the same position. Not anymore. The code path for
  "basic" and non-basic entities is now mostly shared.
- Code of html_entity_decode and htmlspecialchars_decode is now shared.
- [DOC] More consistent behavior of htmlspecialchars_decode. Instead of translating only a fixed set
  of entities for <, >, &, " and ', other numeric forms of the entities for " and ' are now also decoded.
- [DOC] Previous translation of Unicode code points in numerical entities was seriously broken. When
  the code points for some character were not the same in Unicode and the target encoding, the
  behavior could be an erroneous translation (e.g. 0x80-0xA0 in win-1252) or no translation at all.
  Added Unicode translation tables for all single-byte encodings. Entities are not translated for
  multi-byte encodings, except for ASCII characters whose code points are shared. We could add
  the huge translation tables (several thousand elements) for those encodings in the future.
- Fixed numerical entities being accepted when the text after # was anything strtol would accept.
- Much more commented and well-structured code...
- Tests for get_html_translation_table() are broken. I started fixing the tests, but then I realized
  it was completely hopeless because get_html_translation_table() is broken by not handling
  multi-byte characters correctly.
2010-10-10 19:04:59 +00:00
Gustavo André dos Santos Lopes
dd5d1b2b66 - Fixed a typo in rev #304208 (24 instead of 34/'"').
- Improved the test bug53021.phpt to reflect other fixes in rev #304208.
- Updated NEWS to reflect other fixes in rev #304208.
2010-10-08 17:27:19 +00:00
Gustavo André dos Santos Lopes
df42830468 - Fixed bug #53021 (In html_entity_decode, failure to convert numeric entities with ENT_NOQUOTES and ISO-8859-1). 2010-10-08 16:19:58 +00:00
Kalle Sommer Nielsen
cb50011016 Fixed compiler warnings in the standard library 2010-09-23 03:45:36 +00:00
Rasmus Lerdorf
906dd4eac5 Switch default_charset, if not specified, from ISO-8859-1 to UTF-8
I have been wanting to make this change for years, but there is a small
chance of BC issues, so it shouldn't go into a minor release.
2010-03-23 18:08:06 +00:00
Moriyoshi Koizumi
73ba495674 - Forgot to commit this patch. Sorry. 2010-03-12 16:19:25 +00:00
Sebastian Bergmann
9ba1e81665 sed -i "s#1997-2009#1997-2010#g" **/*.c **/*.h **/*.php 2010-01-03 09:23:27 +00:00
Moriyoshi Koizumi
7d9a7dbad6 - Fix bug #46478 (htmlentities() uses obsolete mapping table for character
entity references)
2009-12-22 05:50:34 +00:00
Moriyoshi Koizumi
413196c574 - Take account of surrogate pairs. 2009-12-07 15:41:43 +00:00
Moriyoshi Koizumi
20737bac6a - Bug #49785: take 5. What the hell happened to me... 2009-10-13 05:18:37 +00:00
Moriyoshi Koizumi
884cf3f1c0 - Bug #49785: take 4 - typo. This flaw is harmless since the return value of get_next_char() is only used when UTF-8 is specified as the third argument. 2009-10-12 14:29:45 +00:00
Moriyoshi Koizumi
1835a63dfd - A couple more fixes for my previous fix.
(One of the fixes is by Arnaud Le Blanc. Thanks!)
2009-10-11 23:52:33 +00:00
Moriyoshi Koizumi
9d19866476 - Fixed bug #49785 (insufficient input string validation of htmlspecialchars()). 2009-10-09 10:02:38 +00:00
Sebastian Bergmann
08659c2dcd MFH: Bump copyright year, 3 of 3. 2008-12-31 11:15:49 +00:00
Arnaud Le Blanc
18794addbd MFH: Added ENT_IGNORE as a compatibility flag for htmlentities() and
htmlspecialchars() to skip invalid multibyte sequences instead of returning an
empty string (like iconv's //IGNORE). These functions will still never
return an invalid or incomplete multibyte sequence.
Fixes #43896
2008-11-26 03:00:06 +00:00
Arnaud Le Blanc
a05edaf2bd MFB 5.2 2008-11-26 02:43:16 +00:00
Arnaud Le Blanc
d69dfa4b9f MFH: initialize optional vars 2008-10-21 22:08:38 +00:00
Moriyoshi Koizumi
0699894884 - MFH: beware of signedness 2008-08-18 03:26:21 +00:00
Arnaud Le Blanc
71e50de4fc MFH: Fixed bug #45581 (htmlspecialchars() double encoding &#x hex items) 2008-08-10 13:26:13 +00:00
Felipe Pena
fce4f9600e MFB: Fixed bug #44703 (htmlspecialchars() does not detect bad character set argument) 2008-04-11 19:06:12 +00:00
Stanislav Malyshev
223a53fdeb rm cruft 2008-01-29 22:03:01 +00:00
Antony Dovgal
37a607c7f8 fix #43927 (koi8r is missing from html_entity_decode())
patch by andy at demos dot su
2008-01-28 23:07:12 +00:00
Scott MacVicar
23e3baf62d Fix html_entity_decode when converting numeric HTML entities: the numeric values for the extended characters don't correspond to those of windows-1251 and cp866. 2008-01-25 18:10:45 +00:00
Sebastian Bergmann
d1dded8751 MFH: Bump copyright year, 2 of 2. 2007-12-31 07:17:19 +00:00
Jani Taskinen
14ca778ed9 MFH:- Revert previous patch; it was correct to do this, as the error is logged if logging is enabled 2007-12-11 12:26:43 +00:00
Jani Taskinen
b984960e81 MFH: fix error displaying 2007-12-11 11:29:09 +00:00
Jani Taskinen
aa3eee1dce MFH:- Moved the old regex functions to their own extension: ereg 2007-10-05 15:00:09 +00:00
Stanislav Malyshev
6e1dfff1ed MFB do not accept partial multibyte sequences in html* functions 2007-10-03 05:05:08 +00:00
Nuno Lopes
2c5368c013 fix handling of && by htmlentities 'no-double-encode'
expand the test cases
2007-05-27 15:57:11 +00:00
Nuno Lopes
452524fe3a fix the new 'no-double-encoding' feature of htmlspecialchars() (the length for the char search was wrong; this could lead to crashes) 2007-05-27 15:45:18 +00:00
Hannes Magnusson
df03be1a3b Allow skipping hint_charset (fixes ext/standard/tests/strings/htmlentities18.phpt) 2007-05-25 14:09:02 +00:00
Hannes Magnusson
cdd37424a8 Update proto&arginfo for double_encode in htmlspecialchars()&htmlentities() 2007-05-22 15:38:27 +00:00
Ilia Alshanetsky
c98cbb6020 [DOC] Added a 4th parameter flag to htmlspecialchars() and htmlentities()
that makes the function not encode existing HTML entities. The feature is
disabled by default and can be activated by passing FALSE as the 4th param.
2007-05-22 12:37:00 +00:00
Ilia Alshanetsky
efad70c2cc snprintf() -> slprintf() 2007-02-27 03:28:17 +00:00
Ilia Alshanetsky
27c6f40783 Eliminate strncpy() and simplify code 2007-02-24 17:18:24 +00:00
Ilia Alshanetsky
5ecffe6eb5 Use strlcpy() rather than strcpy() 2007-02-21 03:59:05 +00:00
Antony Dovgal
84a827e0d4 MFH 2007-01-18 16:21:32 +00:00
Sebastian Bergmann
4223aa4d5e MFH: Bump year. 2007-01-01 09:36:18 +00:00
Antony Dovgal
6aec52bde7 MFH 2006-12-21 01:18:28 +00:00
Ilia Alshanetsky
3a533934c7 Added missing boundary checks. 2006-11-01 01:55:11 +00:00
Hannes Magnusson
39219cf7fe protos 2006-10-02 07:58:13 +00:00
Antony Dovgal
f3c1722b0c MFH: don't try to use "auto", "none" and "pass" charsets returned from mbstring 2006-08-15 15:09:38 +00:00
Rasmus Lerdorf
8fe5bc7010 MFH - binary safety patch from Moriyoshi 2006-02-25 21:32:11 +00:00
foobar
5bd93221a8 bump year and license version 2006-01-01 12:51:34 +00:00
foobar
23e671a51e - Bump up year 2005-08-03 14:08:58 +00:00
foobar
6cea418c31 Netware also uses autoconf based config now 2005-06-30 14:11:13 +00:00
Joe Orton
5815b03511 Mark pointers in entity tables as const. 2005-05-11 12:54:29 +00:00
Joe Orton
bd2e99ee50 - Fixed bug #29119 (html_decode_entities handling of U+0152-U+0192 range)
(merge error from 4.3)
2005-05-11 12:43:07 +00:00
Ilia Alshanetsky
8209835e5a Fixed bug #32608 (html_entity_decode() converts single quotes even if
ENT_NOQUOTES is given).
2005-05-01 19:48:55 +00:00