Albeit CSV is still a widespread data exchange format, it has never been
officially standardized. There exists, however, the “informational” RFC
4180[1] which has no notion of escape characters, but rather defines
`escaped` as strings enclosed in double-quotes where contained
double-quotes have to be doubled. While this concept is supported by
PHP's implementation (`$enclosure`), the `$escape` sometimes interferes,
so that `fgetcsv()` is unable to correctly parse externally generated
CSV, and `fputcsv()` is sometimes generating non-compliant CSV. Since
PHP's `$escape` concept is availble for many years, we cannot drop it
for BC reasons (even though many consider it as bug). Instead we allow
to pass an empty string as `$escape` parameter to the respective
functions, which results in ignoring/omitting any escaping, and as such
is more inline with RFC 4180. It is noteworthy that this is almost no
userland BC break, since formerly most functions did not accept an empty
string, and failed in this case. The only exception was `str_getcsv()`
which did accept an empty string, and used a backslash as escape
character then (which appears to be unintended behavior, anyway).
The changed functions are `fputcsv()`, `fgetcsv()` and `str_getcsv()`,
and also the `::setCsvControl()`, `::getCsvControl()`, `::fputcsv()`,
and `::fgetcsv()` methods of `SplFileObject`.
The implementation also changes the type of the escape parameter of the
PHP_APIs `php_fgetcsv()` and `php_fputcsv()` from `char` to `int`, where
`PHP_CSV_NO_ESCAPE` means to ignore/omit escaping. The parameter
accepts the same values as `isalpha()` and friends, i.e. “the value of
which shall be representable as an `unsigned char` or shall equal the
value of the macro `EOF`. If the argument has any other value, the
behavior is undefined.” This is a subtle BC break, since the character
`chr(128)` has the value `-1` if `char` is signed, and so likely would
be confused with `EOF` when converted to `int`. We consider this BC
break to be acceptable, since it's rather unlikely that anybody uses
`chr(128)` as escape character, and it easily can be fixed by casting
all `escape` arguments to `unsigned char`.
This patch implements the feature requests 38301[2] and 51496[3].
[1] <https://tools.ietf.org/html/rfc4180>
[2] <https://bugs.php.net/bug.php?id=38301>
[3] <https://bugs.php.net/bug.php?id=51496>
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.
In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.
This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
If the longest common substring is the leftmost common substring, there
is no need to check the string prefixes for further common substrings,
since there are none.
* 'master' of git.php.net:/php-src: (37 commits)
Avoid conditions inside loop
Improve loop vectorization
Improve loop vectorization
Remove unused function
Fixed bug #75938
Remove unused files
Fixed bug #75940 Unnecessary compile wrapper with PHP_THREAD_SAFETY=yes
typo
Update README.GIT-RULES
Fix SKIPIF section
Fixes bug #75871 Use pkg-config for libxml2 if available
Fixed bug #49876 lib path on 64bit distros
Refactor testing READMEs
Fixed bug #65414
Fixed bug #65414
Fixed bug #74519 strange behavior of AppendIterator
fix#74519 strange behavior of AppendIterator
Use bool instead of boolean
Remove space between function name and open parentheses
Fix some misspellings
...