Commit graph

805 commits

Author SHA1 Message Date
Gustavo André dos Santos Lopes
0a7ae87e91 Added IntlCodePointBreakIterator.
Objects of this class can be instantiated with

IntlBreakIterator::createCodePointInstance()

The method does not take a locale, as it would not make sense in this
context.

This class has one additional method:

long IntlCodePointBreakIterator::getLastCodePoint()

which returns either -1 or the last code point we moved over, if any
(and discounting any movement before the last call to
IntlBreakIterator::first() or IntlBreakIterator::last()).
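
The following is a minimal Python model of the getLastCodePoint() semantics above. It is for illustration only: the class and method names are hypothetical, and it is not the extension's C++ code. The model makes these assumptions:
- The input is valid UTF-8.
- Every code point start counts as a boundary.
- The value -1 means both "no code point yet" and the DONE sentinel.
- Moving backwards also records the code point moved over.

```python
DONE = -1  # assumed "no more boundaries" sentinel, as in ICU's BreakIterator::DONE


class CodePointIteratorModel:
    """Hypothetical model: every code point start in a UTF-8 byte string is a boundary."""

    def __init__(self, text: bytes) -> None:
        self.text = text
        self.pos = 0
        self.last_cp = -1  # -1 until we move over a code point

    def _is_continuation(self, index: int) -> bool:
        # UTF-8 continuation bytes have the form 0b10xxxxxx
        return (self.text[index] & 0xC0) == 0x80

    def first(self) -> int:
        self.pos, self.last_cp = 0, -1  # movement before first() is discounted
        return self.pos

    def last(self) -> int:
        self.pos, self.last_cp = len(self.text), -1  # likewise for last()
        return self.pos

    def next(self) -> int:
        if self.pos >= len(self.text):
            return DONE
        end = self.pos + 1
        while end < len(self.text) and self._is_continuation(end):
            end += 1
        self.last_cp = ord(self.text[self.pos:end].decode("utf-8"))
        self.pos = end
        return self.pos

    def previous(self) -> int:
        if self.pos == 0:
            return DONE
        start = self.pos - 1
        while start > 0 and self._is_continuation(start):
            start -= 1
        self.last_cp = ord(self.text[start:self.pos].decode("utf-8"))
        self.pos = start
        return self.pos

    def get_last_code_point(self) -> int:
        return self.last_cp
```

For example, over "aé€" encoded as UTF-8 (6 bytes), three calls to next() stop at offsets 1, 3 and 6. They leave 0x61, 0xE9 and 0x20AC as the last code point, respectively.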
2012-06-22 18:19:54 +02:00
Gustavo André dos Santos Lopes
cee31091a9 Add Intl prefix to BreakIterator/RuleBasedBI 2012-06-10 22:42:38 +02:00
Gustavo André dos Santos Lopes
87dd0269ba Remove trailing space 2012-06-10 13:26:28 +02:00
Gustavo André dos Santos Lopes
a4925fae9b Replaced zend_parse_method_params with plain zpp 2012-06-10 00:23:09 +02:00
Gustavo André dos Santos Lopes
afed66bb9e BreakIter: Removed getAvailableLocales/getHashCode 2012-06-10 00:05:00 +02:00
Gustavo André dos Santos Lopes
4ec75539db Change in BreakIterator::getPartsIterator()
BreakIterator::getPartsIterator() now returns an IntlIterator subclass
with a special method, getBreakIterator(), that returns the
associated BreakIterator.

Any call to getRuleStatus() is forwarded to the BreakIterator.
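
The delegation described above can be sketched as follows. Both classes are hypothetical Python stand-ins. The only point is that the parts iterator holds a reference to the break iterator rather than a copy, and asks it for the rule status on every call.

```python
class StubBreakIterator:
    """Hypothetical stand-in for a break iterator; only get_rule_status() is modeled."""

    def __init__(self, rule_status: int) -> None:
        self.rule_status = rule_status

    def get_rule_status(self) -> int:
        return self.rule_status


class PartsIteratorSketch:
    """Hypothetical parts iterator that shares, rather than copies, its break iterator."""

    def __init__(self, break_iterator: StubBreakIterator) -> None:
        self._break_iterator = break_iterator

    def get_break_iterator(self) -> StubBreakIterator:
        # Returns the very same object, so changes made through it are visible here
        return self._break_iterator

    def get_rule_status(self) -> int:
        # Forwarded on every call instead of being cached
        return self._break_iterator.get_rule_status()
```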
2012-06-10 00:04:53 +02:00
Xinchen Hui
07d0eab204 Merge branch 'PHP-5.4'
By Gustavo André dos Santos Lopes (4) and others
via Felipe Pena (2) and Xinchen Hui (2)
* PHP-5.4:
  Remove unused codes
  Based on Microsoft's description, the direct conversion from a FILETIME struct to __int64 is unsafe.
  merge 5.3 entries
  restore NEWS
  Fix ext/intl build on ICU < 4.8
  Optimization in ext/intl/msgformat
  Fixed tests in ext/intl
  Changed XFAILed collator_get_sort_key.phpt
2012-06-07 14:42:35 +08:00
Xinchen Hui
83542dcf3b Merge branch 'PHP-5.3' into PHP-5.4
By Gustavo André dos Santos Lopes (4) and others
via Felipe Pena (1) and Xinchen Hui (1)
* PHP-5.3:
  Remove unused codes
  Based on Microsoft's description, the direct conversion from a FILETIME struct to __int64 is unsafe.
  Fix ext/intl build on ICU < 4.8
  Optimization in ext/intl/msgformat
  Fixed tests in ext/intl
  Changed XFAILed collator_get_sort_key.phpt
2012-06-07 14:32:47 +08:00
Gustavo André dos Santos Lopes
c1ac325228 Fix ext/intl build on ICU < 4.8 2012-06-06 12:10:00 +02:00
Gustavo André dos Santos Lopes
52d541a314 Optimization in ext/intl/msgformat
Don't transform the string to make it apostrophe-friendly in ICU 4.8+,
as that is now the default.
2012-06-06 11:36:00 +02:00
Gustavo André dos Santos Lopes
45b3fa4dee Fixed tests in ext/intl
21 is not a valid value for UNUM_PADDING_POSITION. Changed the test to
use 2 instead.
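
For context, UNUM_PADDING_POSITION takes a value from ICU's UNumberFormatPadPosition enum, which has only four members numbered from 0. The Python mirror below shows why 21 is rejected while 2 (UNUM_PAD_BEFORE_SUFFIX) is accepted. The class and helper names are hypothetical.

```python
from enum import IntEnum


class PadPosition(IntEnum):
    """Mirrors ICU's UNumberFormatPadPosition values (consecutive, starting at 0)."""
    BEFORE_PREFIX = 0  # UNUM_PAD_BEFORE_PREFIX
    AFTER_PREFIX = 1   # UNUM_PAD_AFTER_PREFIX
    BEFORE_SUFFIX = 2  # UNUM_PAD_BEFORE_SUFFIX
    AFTER_SUFFIX = 3   # UNUM_PAD_AFTER_SUFFIX


def is_valid_padding_position(value: int) -> bool:
    # Only the four enum values are meaningful for UNUM_PADDING_POSITION
    return value in {p.value for p in PadPosition}
```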

Removed the test for ICU 4.2 and earlier. No one cares.
2012-06-05 16:47:00 +02:00
Gustavo André dos Santos Lopes
c6593a0e9b BreakIterator: add rules status constants 2012-06-04 23:09:10 +02:00
Gustavo André dos Santos Lopes
036b1eb291 Tests for (RuleBased)BreakIterator. 2012-06-04 22:25:08 +02:00
Gustavo André dos Santos Lopes
f5b421621d BreakIterator and RuleBasedBreakIterator added
This commit adds wrappers for the classes BreakIterator and
RuleBasedBreakIterator. The C++ ICU classes are described here:
<http://icu-project.org/apiref/icu4c/classBreakIterator.html>
<http://icu-project.org/apiref/icu4c/classRuleBasedBreakIterator.html>

Additionally, a tutorial is available at:
<http://userguide.icu-project.org/boundaryanalysis>

This implementation wraps UTF-8 text in a UText. The text is
iterated without any copying or conversion to UTF-16. There is
also no validation that the input is actually UTF-8; where there
are malformed sequences, the UText will simply yield U+FFFD.
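
Python's "replace" error handler gives a comparable picture of this substitution: malformed bytes become U+FFFD instead of raising an error. This is an analogy, not the UText code path. ICU may emit a different number of U+FFFD characters than Python for some longer malformed sequences.

```python
def code_points_with_replacement(raw: bytes) -> list[int]:
    """Decode UTF-8, substituting U+FFFD for malformed sequences instead of failing."""
    return [ord(ch) for ch in raw.decode("utf-8", errors="replace")]


print(code_points_with_replacement(b"a\xffb"))  # [97, 65533, 98]
```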

The class BreakIterator cannot be instantiated directly (it has a
private constructor). It provides the interface exposed by the ICU
abstract class with the same name. The PHP class is not abstract
because we may use it to wrap native subclasses of BreakIterator
that we don't know how to wrap. This class includes methods to
move the iterator position to the beginning (first()), to the
end (last()), forward (next()), backwards (previous()), to the
boundary preceding a certain position (preceding()) and following
a certain position (following()) and to obtain the current position
(current()). next() can also be used to advance or recede an
arbitrary number of positions.
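
The navigation methods listed above can be modeled over a precomputed, sorted list of boundary offsets. The Python class below is a hypothetical sketch, not the extension's implementation. A real BreakIterator computes boundaries from its rules. On failure, this model simply leaves the position unchanged, which may not match ICU exactly.

```python
import bisect

DONE = -1  # assumed "no such boundary" sentinel


class BoundaryListModel:
    """Hypothetical model of BreakIterator navigation over precomputed boundaries."""

    def __init__(self, boundaries: list[int]) -> None:
        self.bounds = sorted(set(boundaries))  # should include 0 and the text length
        self.index = 0  # current position is self.bounds[self.index]

    def current(self) -> int:
        return self.bounds[self.index]

    def first(self) -> int:
        self.index = 0
        return self.current()

    def last(self) -> int:
        self.index = len(self.bounds) - 1
        return self.current()

    def next(self, n: int = 1) -> int:
        # Move n boundaries forward (or backward if n < 0)
        target = self.index + n
        if not 0 <= target < len(self.bounds):
            return DONE  # position left unchanged in this sketch
        self.index = target
        return self.current()

    def previous(self) -> int:
        return self.next(-1)

    def following(self, offset: int) -> int:
        # First boundary strictly after offset
        target = bisect.bisect_right(self.bounds, offset)
        if target >= len(self.bounds):
            return DONE
        self.index = target
        return self.current()

    def preceding(self, offset: int) -> int:
        # Last boundary strictly before offset
        target = bisect.bisect_left(self.bounds, offset) - 1
        if target < 0:
            return DONE
        self.index = target
        return self.current()
```

For "Hello world", hand-written word boundaries would be [0, 5, 6, 11]. With these, following(5) yields 6 and preceding(6) yields 5.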

BreakIterator also exposes other native methods:
getAvailableLocales(), getLocale() and factory methods to build
several predefined types of BreakIterators: createWordInstance()
for word boundaries, createCharacterInstance() for locale
dependent notions of "characters", createSentenceInstance() for
sentences, createLineInstance() for line breaks and
createTitleInstance() for title casing breaks. These factories
currently return RuleBasedBreakIterators where the names of the
rule sets are found in the ICU data, observing the passed locale
(although the locale is taken into consideration, there are very
few exceptions to the root rules).

The clone and compare_object PHP object handlers are also
implemented, though the comparison does not yield meaningful results
when used with >, <, >= and <=.

Note that BreakIterator is an iterator only in the sense of the
first 'Iterator' in 'IteratorIterator', i.e., it does not
implement the Iterator interface. The reason is that there is
no sensible implementation for Iterator::key(). Using it for
an ordinal of the current boundary is not feasible because
we are allowed to move to any boundary at any time. If we were
to determine the current ordinal when last() is called we'd
have to traverse the whole input text to find out how many
breaks there were before. Therefore, BreakIterator implements
only Traversable. It can be wrapped in an IteratorIterator,
but the usual warnings apply.

Finally, I added a convenience method to BreakIterator:
getPartsIterator(). This provides an IntlIterator, backed
by the BreakIterator PHP object (i.e. moving the pointer or
changing the text in BreakIterator affects the iterator
and also moving the iterator affects the backing BreakIterator),
which allows traversing the text between each boundary.
This iterator uses the original text to retrieve the text
between two positions, not the code points returned by the
wrapping UText. Therefore, if the text includes invalid code
unit sequences, these invalid sequences will be in the output
of this iterator, not U+FFFD code points.
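
The difference can be sketched by slicing the original bytes at boundary offsets. In this hypothetical Python helper, the boundaries are supplied by hand as byte offsets, as a break iterator over UTF-8 text would report them. Because the slices come from the untouched input, an invalid byte such as 0xFF survives in the output.

```python
def parts_between_boundaries(raw: bytes, boundaries: list[int]) -> list[bytes]:
    """Slice the original bytes between consecutive boundary offsets.

    Since slicing uses the untouched input, invalid byte sequences are
    returned as-is rather than as U+FFFD.
    """
    return [raw[start:end] for start, end in zip(boundaries, boundaries[1:])]
```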

The class RuleBasedBreakIterator exposes a constructor that allows
building an iterator from arbitrary compiled or non-compiled
rules. The form of these rules is described in the tutorial linked
above. The rest of the methods allow retrieving the rules
(getRules() and getCompiledRules()), a hash code of the rule set
(hashCode()) and the rule statuses (getRuleStatus() and
getRuleStatusVec()).

Because the RuleBasedBreakIterator constructor may return parse
errors, I reused the function that converts a UParseError to text,
which was in the transliterator files. Therefore, I moved that
function to intl_error.c.

common_enum.cpp was also changed, mainly to expose previously
static functions. This avoided code duplication when implementing
the BreakIterator iterator and the IntlIterator returned by
BreakIterator::getPartsIterator().
2012-06-04 22:25:07 +02:00
Gustavo André dos Santos Lopes
9b233b7e5e Changed XFAILed collator_get_sort_key.phpt
Resurrected and limited to ICU 4.8 in the hope that the sort keys
will remain stable in more recent ICU versions. I have only tested
with ICU 4.8 so far.
2012-06-04 10:18:24 +02:00
Gustavo André dos Santos Lopes
758f0686d4 Added and fixed tests given eb346ef 2012-06-04 00:02:35 +02:00
Gustavo André dos Santos Lopes
eb346ef0f4 DateFormat plays nice with Calendar, TimeZone
The following changes were made:

* The IntlDateFormatter constructor now accepts the usual values
  for its $timezone argument. This includes timezone identifiers,
  IntlTimeZone objects, DateTimeZone objects and NULL. An empty
  string is not accepted. An invalid time zone is no longer accepted
  (it used to use UTC in this case).
* When NULL is passed to IntlDateFormatter, the time zone specified in
  date.timezone is used instead of the ICU default.
* The IntlDateFormatter $calendar argument now also accepts an
  IntlCalendar. In this case, IntlDateFormatter::getCalendar() will
  return false.
* The time zone passed to the IntlDateFormatter is ignored if it is
  NULL and the calendar passed is an IntlCalendar object; in this
  case, the IntlCalendar time zone is used instead. Otherwise, the
  time zone specified in the $timezone argument is used.
* Added IntlDateFormatter::getCalendarObject(), which always returns
  the IntlCalendar object that backs the DateFormat, even if a
  constant was passed to the constructor, i.e., if an IntlCalendar
  was not passed to the constructor.
* Added IntlDateFormatter::setTimeZone(). It accepts the usual values
  for time zone arguments. If NULL is passed, the time zone of the
  IntlDateFormatter WILL be overridden with the default time zone,
  even if an IntlCalendar object was passed to the constructor.
* Added IntlDateFormatter::getTimeZone(), which returns the time zone
  that's associated with the DateFormat.
* Deprecated IntlDateFormatter::setTimeZoneId() and made it an alias
  for IntlDateFormatter::setTimeZone(), as the new ::setTimeZone()
  also accepts plain identifiers, besides other types.
  IntlDateFormatter::getTimeZoneId() is not deprecated, however.
* IntlDateFormatter::setCalendar() with a constant passed should now
  work correctly. This requires saving the locale requested in the
  constructor.
* Centralized the hacks required to avoid compilation disasters on
  Windows due to some headers being included inside and outside of
  extern "C" blocks.
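
The time zone rules in the list above can be condensed into a small decision function. This is a hypothetical Python sketch with these simplifications:
- Time zones are plain identifier strings.
- Validation of identifiers is omitted.
- CalendarObject stands in for an IntlCalendar, and any other value stands in for a calendar constant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CalendarObject:
    """Hypothetical stand-in for an IntlCalendar passed as $calendar."""
    time_zone: str


def constructor_time_zone(timezone: Optional[str], calendar: object, default_tz: str) -> str:
    """Which zone the constructor ends up using, per the rules listed above."""
    if timezone == "":
        raise ValueError("an empty time zone string is not accepted")
    if timezone is None:
        if isinstance(calendar, CalendarObject):
            return calendar.time_zone  # the IntlCalendar's own zone wins
        return default_tz  # date.timezone, not the ICU default
    return timezone  # an explicit $timezone is used as given


def set_time_zone(timezone: Optional[str], default_tz: str) -> str:
    """setTimeZone(NULL) always resets to the default zone, whatever the calendar."""
    return default_tz if timezone is None else timezone
```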
2012-06-04 00:01:48 +02:00
Gustavo André dos Santos Lopes
72beff0d41 Added private constructor to IntlTimeZone. 2012-06-03 23:39:34 +02:00
Gustavo André dos Santos Lopes
f3802db7a0 Fixed write in constant memory.
clang did not forgive.
2012-06-03 23:39:27 +02:00
Stanislav Malyshev
ec2029a894 Merge branch 'PHP-5.4'
* PHP-5.4:
  fix test
  fix test
2012-05-29 23:53:01 -07:00
Stanislav Malyshev
9b98cf7865 fix test 2012-05-29 23:52:47 -07:00
Gustavo André dos Santos Lopes
a1e97bada8 Fixed problem in IntlCalendar debug handler
*is_temp was not being set.

Also deleted a redundant assignment to *is_temp in IntlTimeZone.
2012-05-25 13:29:19 +02:00
Gustavo André dos Santos Lopes
457a57d653 Merge branch '5.4' 2012-05-24 14:33:42 +02:00
Gustavo André dos Santos Lopes
04fd0b1098 Merge branch '5.3' into 5.4 2012-05-24 14:33:24 +02:00
Gustavo André dos Santos Lopes
85c777d2f1 Fixed bug #55610: ResourceBundle and Traversable 2012-05-24 14:33:05 +02:00
Gustavo André dos Santos Lopes
888e77ff73 Fixed last commit on 5.4
There's no change from the intended behavior. If INTL_G(default_locale)
is NULL, the default ICU locale, as given by locale_get_default() in
master, will still be used by ures_open().
2012-05-24 14:17:52 +02:00
Gustavo André dos Santos Lopes
a03f2e3814 Merge branch '5.4'
Conflicts:
	UPGRADING
2012-05-24 13:52:06 +02:00
Gustavo André dos Santos Lopes
92039fed22 Changed ResourceBundle constructor behavior
null is now accepted for the first two (mandatory) arguments.

Passing null as the package name causes NULL to be passed to ICU and
the default ICU data to be loaded.

Passing null as the locale name causes the default locale to be used.
2012-05-24 13:50:59 +02:00
Gustavo André dos Santos Lopes
d4fd95e292 Merge branch '5.4' 2012-05-24 11:09:18 +02:00
Gustavo André dos Santos Lopes
e8009e2dca Merge branch '5.3' into 5.4 2012-05-24 11:08:55 +02:00
Gustavo André dos Santos Lopes
2da2de46a8 Fixed bug #60785
Memory leak in IntlDateFormatter constructor.

udat_setCalendar() clones the calendar before it adopts it,
so we were leaking the original calendar.

Also we now validate the calendar type.
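
The ownership problem can be modeled with explicitly closed objects. The Python names below are hypothetical. TrackedCalendar.live counts instances that were created but never closed, mimicking manually managed ICU objects in C. The setter clones its argument, just as the message describes for udat_setCalendar().

```python
class TrackedCalendar:
    """Hypothetical calendar whose instances must be closed explicitly, like ICU objects in C."""

    live = 0  # number of instances created but not yet closed

    def __init__(self) -> None:
        TrackedCalendar.live += 1

    def clone(self) -> "TrackedCalendar":
        return TrackedCalendar()

    def close(self) -> None:
        TrackedCalendar.live -= 1


def set_calendar_leaky(formatter: dict, cal: TrackedCalendar) -> None:
    # The setter adopts a clone, so the caller's original is never closed
    formatter["calendar"] = cal.clone()


def set_calendar_fixed(formatter: dict, cal: TrackedCalendar) -> None:
    # Same cloning setter, but the caller releases the original it still owns
    formatter["calendar"] = cal.clone()
    cal.close()
```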
2012-05-24 11:06:21 +02:00
Gustavo André dos Santos Lopes
ca515e8073 Merge branch '5.4' 2012-05-23 15:52:47 +02:00
Gustavo André dos Santos Lopes
0838a2b7c5 Merge branch '5.3' into 5.4 2012-05-23 15:52:32 +02:00
Gustavo André dos Santos Lopes
e08566c613 Fixed bug #62017
IntlDateFormatter constructor would fail to release some resources
under certain error conditions.
2012-05-23 15:52:19 +02:00
Gustavo André dos Santos Lopes
70e3e627fe Fixed several ext/intl tests 2012-05-23 14:49:01 +02:00
Gustavo André dos Santos Lopes
2eb069aa48 Merge branch '5.4' 2012-05-23 13:27:54 +02:00
Gustavo André dos Santos Lopes
8ee8ccda19 Merge branch '5.3' into 5.4
Conflicts:
	sapi/fpm/fpm/fpm_main.c
2012-05-23 13:27:21 +02:00
Gustavo André dos Santos Lopes
1eff3b01b8 Fixed bug #6208: memory leak in grapheme_extract() 2012-05-23 13:25:45 +02:00
Gustavo André dos Santos Lopes
86ea921291 Fixed bug #62082
This was a buffer overflow in internal function
get_icu_disp_value_src_php().
2012-05-23 13:25:42 +02:00
Gustavo André dos Santos Lopes
07c0d714a5 Fixed bug #62081
Constructor of IntlDateFormatter would leak if called twice.

Made calling it more than once error out before it starts
using resources.
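
The fix boils down to checking for a previous successful construction before acquiring anything. The following is a hypothetical Python sketch, where construct() plays the role of an explicit second call to the constructor.

```python
class FormatterModel:
    """Hypothetical object whose constructor may be invoked again explicitly."""

    def __init__(self) -> None:
        self.initialized = False
        self.resources: list[str] = []

    def construct(self, pattern: str) -> None:
        # Refuse a second call before touching any resource, so nothing can leak
        if self.initialized:
            raise RuntimeError("constructor called more than once")
        self.resources.append(f"formatter for {pattern!r}")
        self.initialized = True
```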
2012-05-23 13:25:37 +02:00
Gustavo André dos Santos Lopes
51286bd8e5 Fixed bug #62070
Collator::getSortKey() was returning an unterminated string
due to the length given to RETURN_STRINGL being off by one.
2012-05-23 13:25:32 +02:00
Felipe Pena
82740ef31e - Fixed build using g++ (which complains about a jump that crosses initialization) 2012-05-20 16:17:17 -03:00
Gustavo André dos Santos Lopes
3a81f90ebc Added IntlCalendar::toDateTime() 2012-05-17 23:18:51 +02:00
Gustavo André dos Santos Lopes
49b1f58194 Fixed a couple of memory leaks 2012-05-17 23:17:00 +02:00
Gustavo André dos Santos Lopes
887744f6b4 Fixed bad DateTime construction state check 2012-05-17 18:16:54 +02:00
Gustavo André dos Santos Lopes
ec23c3e540 MessageFormatter accepts IntlCalendar arguments
Now MessageFormatter::format() accepts IntlCalendar objects to be used in
arguments of type Format::kDate.
2012-05-17 17:57:37 +02:00
Gustavo André dos Santos Lopes
e9351b89a9 Bug #58756: w.r.t MessageFormatter (partial fix)
I don't think the current ICU API allows this bug to be completely fixed.

Right now, the code cannot control the time zone used in date/time formats
that appear inside complex subformats. See the comment inside
umsg_set_timezone().
2012-05-17 17:57:01 +02:00
Gustavo André dos Santos Lopes
30bf2fbb9d Handle bogus string in intl_charFromString(). 2012-05-17 17:23:53 +02:00
Gustavo André dos Santos Lopes
81d8f4079c Whitespace. 2012-05-17 17:23:52 +02:00
Gustavo André dos Santos Lopes
4cfd9995da Added IntlTimeZone::fromDateTimeZone() and ::toDateTimeZone().
IntlTimeZone::fromDateTimeZone(DateTimeZone $dtz) converts from an
ext/date DateTimeZone to an IntlTimeZone. The conversion is done by feeding
the time zone name (essentially what would be given by
DateTimeZone::getName()) to ICU's TimeZone::createTimeZone except if it's
an offset time zone. In that case, the offset is read from the ext/date
time zone object structure and an appropriate id (of the form
GMT<+|-><HH:MM>) is given to ICU's TimeZone::createTimeZone. Not all
ext/date time zones are recognized by ICU. For instance, WEST is not.
Note that this kind of abbreviation, as far as I can tell, can only be
created via ext/date DateTime, not directly through DateTimeZone's
constructor.

For IntlTimeZone::toDateTimeZone(), the behavior is symmetrical.
We instantiate a DateTimeZone and then call its constructor if we don't
have an offset time zone; otherwise, we mess with its structure. If the
time zone is not valid for ext/date, then we let the exception thrown by
the DateTimeZone constructor propagate.
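
The offset id mapping can be sketched in Python as follows, under these assumptions:
- The helper names are hypothetical.
- The offset is given in seconds east of UTC. ext/date's internal representation and sign conventions are not modeled.
- Any leftover seconds below a whole minute are dropped.

```python
import re
from typing import Optional

_OFFSET_ID = re.compile(r"GMT([+-])(\d{2}):(\d{2})")


def offset_to_icu_id(offset_seconds: int) -> str:
    """Build a zone id of the form GMT<+|-><HH:MM> from a UTC offset in seconds."""
    sign = "+" if offset_seconds >= 0 else "-"
    hours, minutes = divmod(abs(offset_seconds) // 60, 60)
    return f"GMT{sign}{hours:02d}:{minutes:02d}"


def icu_id_to_offset(zone_id: str) -> Optional[int]:
    """Inverse direction: the offset in seconds, or None if zone_id is not an offset id."""
    match = _OFFSET_ID.fullmatch(zone_id)
    if match is None:
        return None
    sign = 1 if match.group(1) == "+" else -1
    return sign * (int(match.group(2)) * 3600 + int(match.group(3)) * 60)
```

Non-offset names such as "Europe/Lisbon" fall through to None. In that case, the name itself would be handed over instead.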
2012-05-17 17:23:51 +02:00