From e4a006cd3e17338677ec269a8cdb1354f38e0cad Mon Sep 17 00:00:00 2001
From: "Christoph M. Becker"
Date: Fri, 19 Aug 2016 19:05:33 +0200
Subject: [PATCH] Fix #65732: grapheme_*() is not Unicode compliant on CR LF
 sequence

According to the Unicode specification (at least as of 5.1), CRLF sequences
are considered to be a single grapheme. We cater to that special case by
letting grapheme_ascii_check() fail. While it would be trivial to fix
grapheme_ascii_check() wrt. grapheme_strlen(), grapheme_substr() and
grapheme_strrpos() would be much harder to handle, so we accept the slight
performance penalty if CRLF is involved.
---
 NEWS                              |  4 ++++
 ext/intl/grapheme/grapheme_util.c |  2 +-
 ext/intl/tests/bug65732.phpt      | 19 +++++++++++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 ext/intl/tests/bug65732.phpt

diff --git a/NEWS b/NEWS
index 013d85f84b1..e1d7f044162 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,10 @@ PHP NEWS
 - IMAP:
   . Fixed bug #72852 (imap_mail null dereference). (Anatol)
 
+- Intl:
+  . Fixed bug #65732 (grapheme_*() is not Unicode compliant on CR LF
+    sequence). (cmb)
+
 - JSON:
   . Fixed bug #72787 (json_decode reads out of bounds). (Jakub Zelenka)
 
diff --git a/ext/intl/grapheme/grapheme_util.c b/ext/intl/grapheme/grapheme_util.c
index c752b02372e..350ba662558 100644
--- a/ext/intl/grapheme/grapheme_util.c
+++ b/ext/intl/grapheme/grapheme_util.c
@@ -221,7 +221,7 @@ int grapheme_ascii_check(const unsigned char *day, int32_t len)
 {
 	int ret_len = len;
 	while ( len-- ) {
-		if ( *day++ > 0x7f )
+		if ( *day++ > 0x7f || (*day == '\n' && *(day - 1) == '\r') )
 			return -1;
 	}
 
diff --git a/ext/intl/tests/bug65732.phpt b/ext/intl/tests/bug65732.phpt
new file mode 100644
index 00000000000..b49f884ee42
--- /dev/null
+++ b/ext/intl/tests/bug65732.phpt
@@ -0,0 +1,19 @@
+--TEST--
+Bug #65732 (grapheme_*() is not Unicode compliant on CR LF sequence)
+--SKIPIF--
+
+--FILE--
+
+==DONE==
+--EXPECT--
+int(1)
+string(7) "ef
+ghi"
+int(2)
+==DONE==
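
Review note on the grapheme_ascii_check() hunk: the patched condition reads *day after the post-increment, so on the final iteration it inspects the byte at index len. That is presumably harmless when the buffer is NUL-terminated, which is an assumption here and not something the patch states. The following is a minimal sketch of the same check with the lookahead kept inside the buffer. The name ascii_check_sketch is hypothetical and does not exist in ext/intl; it only illustrates the idea of letting the ASCII fast path fail on CR LF.

```c
#include <stdint.h>

/*
 * Hypothetical, bounds-explicit variant of the ASCII fast-path check.
 * Returns len when the buffer is pure ASCII and contains no CR LF pair;
 * returns -1 otherwise, signalling that the caller should fall back to
 * the full grapheme-aware code path.
 */
int32_t ascii_check_sketch(const unsigned char *s, int32_t len)
{
	for (int32_t i = 0; i < len; i++) {
		if (s[i] > 0x7f) {
			return -1; /* non-ASCII byte */
		}
		/* Look ahead only when the next byte is still inside the buffer. */
		if (s[i] == '\r' && i + 1 < len && s[i + 1] == '\n') {
			return -1; /* CR LF forms a single grapheme cluster */
		}
	}
	return len;
}
```

With this sketch, the 4-byte input "a\r\nb" yields -1 and would take the slower grapheme-aware path, while "abc" yields 3 and stays on the fast path. A lone CR or a lone LF does not trigger the fallback.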