649 commits

Author SHA1 Message Date
mame
7eb625425c * re.c: fix SEGV by Regexp.allocate.names, Match.allocate.names, etc.
* test/ruby/test_regexp.rb: add tests for above.

* io.c: fix SEGV by IO.allocate.print, etc.

* test/ruby/test_io.rb: add tests for above.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16757 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-06-02 12:45:42 +00:00
nobu
075530a685 * suppress warnings with -Wwrite-string.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16716 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-31 09:28:20 +00:00
matz
44cd8e457b * regparse.c (PINC): use optimized enclen() instead of
  ONIGENC_MBC_ENC_LEN().

* regparse.c (PFETCH): ditto.

* regparse.c (PFETCH): small optimization.

* regexec.c (slow_search): single byte encoding optimization.

* regenc.h (enclen): avoid calling function when encoding's
  min_len == max_len.

* re.c (rb_reg_regsub): rb_enc_ascget() optimization for single
  byte encoding.

* re.c (rb_reg_search): avoid allocating new re_registers if we
  already have MatchData.

* re.c (match_init_copy): avoid unnecessary onig_region_free()
  before onig_region_copy. 

* encoding.c (rb_enc_get_index): remove implicit enc_capable check
  each time.

* encoding.c (rb_enc_set_index): ditto.

* encoding.c (enc_compatible_p): small refactoring.

* include/ruby/encoding.h (rb_enc_dummy_p): inline
  rb_enc_dummy_p() and export related code.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16477 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-19 08:25:03 +00:00
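The `regenc.h (enclen)` item above avoids calling the per-character length function when an encoding's `min_len == max_len`. The sketch below only illustrates that fast path. `sketch_encoding`, its fields, and `sketch_utf8_len` are hypothetical stand-ins, not Oniguruma's real `OnigEncodingType` layout or the actual `enclen` macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding descriptor; field names are illustrative only. */
typedef struct {
    int min_len;                                /* shortest character, bytes */
    int max_len;                                /* longest character, bytes  */
    int (*mbc_enc_len)(const unsigned char *p); /* per-character lookup      */
} sketch_encoding;

/* Character length at p.  The function call is skipped when every
 * character in the encoding has the same width. */
static int sketch_enclen(const sketch_encoding *enc, const unsigned char *p)
{
    if (enc->min_len == enc->max_len)
        return enc->min_len;
    assert(enc->mbc_enc_len != NULL);
    return enc->mbc_enc_len(p);
}

/* UTF-8 length from a lead byte; assumes p points at a valid lead byte. */
static int sketch_utf8_len(const unsigned char *p)
{
    if (*p < 0x80) return 1;
    if ((*p & 0xE0) == 0xC0) return 2;
    if ((*p & 0xF0) == 0xE0) return 3;
    return 4;
}
```

A fixed-width descriptor can even leave the function pointer `NULL`, since the fast path never reaches it.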
matz
c39e8c6e85 * array.c (rb_ary_sort_bang): stop memory leak. [ruby-dev:34726]
* re.c (rb_reg_search): need to free allocated buffer in re_register.

* regexec.c (onig_region_new): more pedantic malloc check.

* regexec.c (onig_region_resize): ditto.

* regexec.c (STATE_CHECK_BUFF_INIT): ditto.

* regexec.c (onig_region_copy): use onig_region_resize.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16437 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-16 18:27:01 +00:00
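The `onig_region_resize` item mentions a more pedantic malloc check. One common pattern for this is to keep the old pointer until `realloc` has succeeded, so a failed allocation leaves the region usable. This is a hypothetical sketch: `sketch_region` and `sketch_region_resize` are illustrative names, not the regexec.c code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical region layout: parallel begin/end offset arrays. */
typedef struct {
    int allocated;   /* capacity of beg/end */
    int num_regs;    /* registers in use    */
    int *beg;
    int *end;
} sketch_region;

/* Make room for n registers.  Returns 0 on success, -1 on allocation
 * failure; on failure the existing arrays are still valid and no larger
 * than r->allocated claims. */
static int sketch_region_resize(sketch_region *r, int n)
{
    assert(n >= 0);
    if (n > r->allocated) {
        int *beg = realloc(r->beg, (size_t)n * sizeof(int));
        if (beg == NULL)
            return -1;               /* r->beg untouched */
        r->beg = beg;
        int *end = realloc(r->end, (size_t)n * sizeof(int));
        if (end == NULL)
            return -1;               /* r->end untouched */
        r->end = end;
        r->allocated = n;
    }
    r->num_regs = n;
    return 0;
}
```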
matz
880a96c795 * re.c (rb_reg_prepare_enc): error condition was updated for non
  ASCII compatible strings.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16423 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-15 10:45:58 +00:00
matz
ab24f2b077 * re.c (rb_reg_prepare_re): made non static with small refactoring.
* ext/strscan/strscan.c (strscan_do_scan): should adjust encoding
  before regex searching.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16387 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-12 06:09:53 +00:00
matz
f34a75657d * re.c (Init_Regexp): remove MatchData#select. [ruby-dev:34563]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16264 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-02 04:57:19 +00:00
nobu
cc88283bad * re.c (rb_reg_search): use local variable. a patch from wanabe
<s.wanabe AT gmail.com> in [ruby-dev:34537].  [ruby-dev:34492]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-04-30 08:47:23 +00:00
nobu
89c408704b * enumerator.c (enumerator_each, enumerator_with_index): suppress
  warnings.

* pack.c (pack_unpack): ditto.

* process.c (rb_syswait): ditto.

* re.c (rb_reg_prepare_enc, rb_reg_prepare_re,
  rb_reg_adjust_startpos): ditto.

* regparse.c (onig_name_to_group_numbers): ditto.

* missing/vsnprintf.c (BSD_vfprintf): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16156 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-04-22 13:49:43 +00:00
matz
fee4ed204f * re.c (rb_reg_search): make search reentrant. [ruby-dev:34223]
* test/ruby/test_parse.rb (TestParse::test_global_variable):
  should preserve $& variable.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-04-14 15:47:51 +00:00
matz
1dcbd6921e * re.c (rb_reg_quote): should always copy the quoting string.
  [ruby-core:16235]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15925 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-04-08 01:53:35 +00:00
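`rb_reg_quote` is the routine behind `Regexp.escape`. The simplified C sketch below shows the behavior the entry asks for: a fresh copy is returned even when nothing needs escaping. The name `sketch_reg_quote` is hypothetical, and the metacharacter set is an assumption modeled on `Regexp.escape`; it omits cases such as `\f`, `\v`, and `\r`, and it ignores multibyte encodings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, backslash-escaped copy of src (caller frees).
 * A fresh copy is returned even when nothing needed escaping, so the
 * result never aliases the input.  Returns NULL if allocation fails. */
static char *sketch_reg_quote(const char *src)
{
    size_t len;
    char *out, *d;

    assert(src != NULL);
    len = strlen(src);
    out = malloc(2 * len + 1);           /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;
    d = out;
    for (const char *s = src; *s; s++) {
        switch (*s) {
        case '[': case ']': case '{': case '}': case '(': case ')':
        case '|': case '-': case '*': case '.': case '\\': case '?':
        case '+': case '^': case '$': case '#': case ' ':
            *d++ = '\\';
            *d++ = *s;
            break;
        case '\n': *d++ = '\\'; *d++ = 'n'; break;
        case '\t': *d++ = '\\'; *d++ = 't'; break;
        default:   *d++ = *s; break;
        }
    }
    *d = '\0';
    return out;
}
```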
naruse
3467a1754c * re.c (rb_memsearch_qs): wrong boundary condition.
* re.c (rb_memsearch_qs_utf8): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15903 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-04-04 14:26:19 +00:00
matz
2b8af7d624 * re.c (rb_memsearch_qs): wrong boundary condition. a patch from
  wanabe <s.wanabe AT gmail.com> in [ruby-dev:34248].

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15902 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-04-04 05:13:06 +00:00
naruse
e58adeae0f * re.c (rb_memsearch_ss): simple shift search.
* re.c (rb_memsearch_qs): quick search.

* re.c (rb_memsearch_qs_utf8): quick search for UTF-8 string.

* re.c (rb_memsearch_qs_utf8_hash): hash functions for above.

* re.c (rb_memsearch): use above functions.

* string.c (rb_str_index): give enc to rb_memsearch.

* include/ruby/intern.h (rb_memsearch): move to encoding.h.

* include/ruby/encoding.h (rb_memsearch): move from intern.h.

* common.mk (PREP): add dependency.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15792 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-03-17 19:04:29 +00:00
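The e58adeae0f entry introduces a simple shift search and a quick search for `rb_memsearch`. The sketch below shows one way each idea can work on plain bytes. The shift search packs a pattern no longer than an `unsigned long` into an integer and shifts one text byte into a rolling window per step. The quick search is Sunday's Quick Search, which shifts by a table entry for the byte just past the current window. Both function names are hypothetical. This is not the re.c code, and it does not cover the UTF-8 variant or its hash functions. The explicit end-of-text guard shows the kind of boundary condition that the later `rb_memsearch_qs` fixes in this log concern.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Simple shift search: compare the packed pattern with a packed window of
 * the text, shifting one new byte in per step.  Returns the offset of the
 * first match, or -1. */
static long sketch_shift_search(const unsigned char *x, long m,
                                const unsigned char *y, long n)
{
    unsigned long hx = 0, hy = 0, mask;
    long i;

    assert(0 < m && (size_t)m <= sizeof(unsigned long) && m <= n);
    mask = (size_t)m == sizeof(unsigned long)
         ? ~0UL : (1UL << (m * CHAR_BIT)) - 1;
    for (i = 0; i < m; i++) {            /* pack pattern and first window */
        hx = (hx << CHAR_BIT) | x[i];
        hy = (hy << CHAR_BIT) | y[i];
    }
    for (i = m; hx != hy; i++) {         /* slide the window by one byte */
        if (i == n)
            return -1;
        hy = ((hy << CHAR_BIT) | y[i]) & mask;
    }
    return i - m;
}

/* Quick Search (Sunday): after a mismatch, shift by the table entry for the
 * byte just past the current window. */
static long sketch_quick_search(const unsigned char *x, long m,
                                const unsigned char *y, long n)
{
    long shift[UCHAR_MAX + 1];
    long i, j;

    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m + 1;                /* byte absent from pattern */
    for (i = 0; i < m; i++)
        shift[x[i]] = m - i;             /* rightmost occurrence wins */
    for (j = 0; j + m <= n; j += shift[y[j + m]]) {
        if (memcmp(x, y + j, (size_t)m) == 0)
            return j;
        if (j + m == n)                  /* y[j + m] would be out of bounds */
            break;
    }
    return -1;
}
```

The shift search replaces a per-position byte comparison with one integer comparison. That is why it only applies to patterns that fit in a machine word.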
akr
861219ce4a fix doc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15734 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-03-09 01:04:46 +00:00
matz
39787ea14d * numeric.c (fix_to_s): avoid rb_scan_args() when no argument
  given.
* bignum.c (rb_big_to_s): ditto.
* enum.c (enum_first): ditto.
* eval_jump.c (rb_f_catch): ditto.
* io.c (rb_obj_display): ditto.
* class.c (rb_obj_singleton_methods): ditto.
* object.c (rb_class_initialize): ditto.
* random.c (rb_f_srand): ditto.
* range.c (range_step): ditto.
* re.c (rb_reg_s_last_match): ditto.
* string.c (rb_str_to_i): ditto.
* string.c (rb_str_each_line): ditto.
* string.c (rb_str_chomp_bang): ditto.
* string.c (rb_str_sum): ditto.

* string.c (str_modifiable): declare inline.
* string.c (str_independent): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-03-05 05:22:17 +00:00
matz
bbc2f80a32 * re.c (rb_reg_regsub): remove too strict encoding check.
  [ruby-dev:33966]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15673 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-03-03 08:22:18 +00:00
matz
daa622aed0 * time.c (time_strftime): format should be ascii compatible.
* parse.y (rb_intern3): non ASCII compatible symbols.

* re.c (rb_reg_regsub): add encoding check.

* string.c (rb_str_chomp_bang): ditto.

* test/ruby/test_utf16.rb (TestUTF16::test_chomp): raises exception.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15640 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-29 09:19:15 +00:00
akr
d77ddf33ae add tests for sub/gsub with hash.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15535 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-18 03:51:34 +00:00
akr
1783b7aacc typo fix.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15534 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-18 03:43:11 +00:00
akr
a74c11cd4a * re.c (re_warn): defined to restore warnings for /[a-c-e]/, etc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15532 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-18 02:52:10 +00:00
akr
583a4b1774 * re.c (rb_reg_regsub): don't repeat repl twice with
"X".sub!(/./, sprintf("\\%c", 255)).


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15527 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-17 15:35:09 +00:00
akr
b8fd2fabbe * re.c (rb_reg_prepare_re): add enable_warning parameter.
  (rb_reg_adjust_startpos): disable warning by rb_reg_prepare_re.
  (rb_reg_search): follow rb_reg_prepare_re parameter change.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15524 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-17 12:54:17 +00:00
akr
0f4199fb56 * re.c (rb_reg_quote): return US-ASCII string consistently.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15515 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-17 02:00:05 +00:00
akr
71c5e48598 * include/ruby/re.h (struct rmatch_offset): new struct for character
  offsets.
  (struct rmatch): new struct.
  (struct RMatch): reference struct rmatch.
  (RMATCH_REGS): new macro.

* re.c (match_alloc): initialize struct rmatch.
  (pair_byte_cmp): new function.
  (update_char_offset): update character offsets.
  (match_init_copy): copy regexp and character offsets.
  (match_sublen): removed.
  (match_offset): use update_char_offset.
  (match_begin): ditto.
  (match_end): ditto.
  (rb_reg_search): make character offset updated flag false.
  (match_size): use RMATCH_REGS.
  (match_backref_number): ditto.
  (rb_reg_nth_defined): ditto.
  (rb_reg_nth_match): ditto.
  (rb_reg_match_pre): ditto.
  (rb_reg_match_post): ditto.
  (rb_reg_match_last): ditto.
  (match_array): ditto.
  (match_aref): ditto.
  (match_values_at): ditto.
  (match_inspect): ditto.

* string.c (rb_str_subpat_set): use RMATCH_REGS.
  (rb_str_sub_bang): ditto.
  (str_gsub): ditto.
  (rb_str_split_m): ditto.
  (scan_once): ditto.

* gc.c (obj_free): free character offsets.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15513 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-16 20:08:35 +00:00
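The 71c5e48598 entry adds character offsets alongside the byte offsets in match data (`update_char_offset`, used by `match_offset`, `match_begin` and `match_end`). The minimal sketch below converts byte offsets to character offsets, assuming UTF-8 input. `sketch_offset` and `sketch_update_char_offset` are hypothetical names. The sketch omits the "updated" flag that `rb_reg_search` resets and the handling of encodings other than UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte and character offsets for one match group. */
typedef struct {
    long byte_beg, byte_end;
    long char_beg, char_end;
} sketch_offset;

/* Number of UTF-8 characters in the first len bytes of s: each byte that
 * is not a continuation byte (10xxxxxx) starts a new character. */
static long sketch_utf8_char_count(const unsigned char *s, long len)
{
    long count = 0;
    for (long i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* Fill in character offsets from the byte offsets of a match in s. */
static void sketch_update_char_offset(sketch_offset *o, const unsigned char *s)
{
    assert(0 <= o->byte_beg && o->byte_beg <= o->byte_end);
    o->char_beg = sketch_utf8_char_count(s, o->byte_beg);
    o->char_end = o->char_beg
                + sketch_utf8_char_count(s + o->byte_beg,
                                         o->byte_end - o->byte_beg);
}
```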
akr
60fa63b819 * re.c (match_inspect): avoid SEGV with MatchData.allocate.inspect.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15509 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-16 11:13:47 +00:00
nobu
17fb1248af * re.c (rb_reg_quote): set US-ASCII for ASCII-only string.
  [ruby-dev:33785]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15481 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-15 01:35:56 +00:00
akr
ec4756f633 * re.c (rb_reg_preprocess_dregexp): use non-preprocessed regexp source
  for result.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15465 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-02-14 03:34:12 +00:00
akr
d5c8ad5359 * insns.def (toregexp): generate a regexp from strings instead of one
  string.

* re.c (rb_reg_new_ary): defined for toregexp.  it concatenates
  strings after each string is preprocessed. 

* compile.c (compile_dstr_fragments): split from compile_dstr.
  (compile_dstr): call compile_dstr_fragments.
  (compile_dregx): defined for dynamic regexp.
  (iseq_compile_each): use compile_dregx for dynamic regexp.

  [ruby-dev:33400]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15311 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-29 08:03:51 +00:00
naruse
3c6969ec11 * string.c, parse.y, re.c: use rb_ascii8bit_encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15292 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-28 09:03:09 +00:00
akr
fc208c1bd5 * include/ruby/oniguruma.h: precise mbclen API redesigned to avoid
  inline functions.
  (onigenc_mbclen_charfound): removed.
  (onigenc_mbclen_needmore): removed.
  (onigenc_mbclen_recover): removed.
  (ONIGENC_MBCLEN_CHARFOUND): removed.
  (ONIGENC_MBCLEN_CHARFOUND_P): defined.
  (ONIGENC_MBCLEN_CHARFOUND_LEN): defined.
  (ONIGENC_MBCLEN_INVALID): removed.
  (ONIGENC_MBCLEN_INVALID_P): defined.
  (ONIGENC_MBCLEN_NEEDMORE): removed.
  (ONIGENC_MBCLEN_NEEDMORE_P): defined.
  (ONIGENC_MBCLEN_NEEDMORE_LEN): defined.
  (ONIGENC_MBC_ENC_LEN): use onigenc_mbclen_approximate.

* regenc.c (onigenc_mbclen_approximate): defined.

* include/ruby/encoding.h (MBCLEN_CHARFOUND): removed.
  (MBCLEN_INVALID): removed.
  (MBCLEN_NEEDMORE): removed.
  (MBCLEN_CHARFOUND_P): defined.
  (MBCLEN_INVALID_P): defined.
  (MBCLEN_NEEDMORE_P): defined.
  (MBCLEN_CHARFOUND_LEN): defined.
  (MBCLEN_NEEDMORE_LEN): defined.

* encoding.c: use new API.

* re.c: ditto.

* string.c: ditto.

* parse.y: ditto.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15280 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-27 14:27:07 +00:00
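The oniguruma.h entry replaces the mbclen inline functions with `_P` predicate and `_LEN` macros over a single result value. The sketch below shows one way such a result can be packed into an `int` and consumed. The `SK_` macro names, the exact packing (positive length for a complete character, `-1` for invalid, `-1-n` for "need n more bytes"), and the simplified UTF-8 classifier are all assumptions for illustration, not the actual header.

```c
#include <assert.h>

/* Hypothetical single-int encoding of a precise mbclen result. */
#define SK_CHARFOUND(n)       (n)
#define SK_INVALID()          (-1)
#define SK_NEEDMORE(n)        (-1 - (n))
#define SK_CHARFOUND_P(r)     (0 < (r))
#define SK_CHARFOUND_LEN(r)   (r)
#define SK_INVALID_P(r)       ((r) == -1)
#define SK_NEEDMORE_P(r)      ((r) < -1)
#define SK_NEEDMORE_LEN(r)    (-1 - (r))

/* Classify the bytes [p, e) as a UTF-8 character.  Simplified: checks the
 * lead and continuation byte patterns only, not overlongs or surrogates. */
static int sketch_utf8_precise_mbclen(const unsigned char *p,
                                      const unsigned char *e)
{
    int len, i;

    assert(p < e);
    if (p[0] < 0x80) return SK_CHARFOUND(1);
    else if ((p[0] & 0xE0) == 0xC0) len = 2;
    else if ((p[0] & 0xF0) == 0xE0) len = 3;
    else if ((p[0] & 0xF8) == 0xF0) len = 4;
    else return SK_INVALID();
    for (i = 1; i < len; i++) {
        if (p + i == e)
            return SK_NEEDMORE(len - i);   /* truncated character */
        if ((p[i] & 0xC0) != 0x80)
            return SK_INVALID();           /* bad continuation byte */
    }
    return SK_CHARFOUND(len);
}
```

Macros over a plain `int` need no inline-function support from the compiler, which matches the entry's stated goal of avoiding inline functions.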
naruse
f3fe101d55 * re.c (rb_reg_source): set encoding as regexp encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15265 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-27 07:26:51 +00:00
akr
b9c18bdcdd * re.c (rb_reg_preprocess): force fixed encoding when ASCII
  incompatible source string.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15260 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-26 21:01:52 +00:00
akr
1e41069754 * include/ruby/intern.h (rb_str_buf_cat_ascii): declared.
* string.c (rb_str_buf_cat_ascii): defined.

* re.c (rb_reg_s_union): use rb_str_buf_cat_ascii to support ASCII
  incompatible encoding.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-25 07:35:27 +00:00
usa
b1257d4d20 * re.c (rb_reg_fixed_encoding_p): no need to treat ASCII-8BIT specially.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-24 09:15:03 +00:00
usa
fbe52683e6 * re.c (rb_reg_initialize): 7bit clean regexp should be US-ASCII.
  [ruby-dev:33346]



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-24 07:56:12 +00:00
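The fbe52683e6 entry gives a regexp whose source is 7-bit clean the US-ASCII encoding. The following is a hypothetical sketch of that decision only. The function names and the fallback to the source encoding are assumptions. The real `rb_reg_initialize` also has to account for fixed-encoding and `/n` cases, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every byte is below 0x80, i.e. the source is 7-bit clean. */
static int sketch_is_7bit_clean(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0x80)
            return 0;
    return 1;
}

/* Hypothetical choice of encoding name for a regexp built from src. */
static const char *sketch_regexp_encoding(const unsigned char *src, size_t len,
                                          const char *source_encoding)
{
    assert(source_encoding != NULL);
    return sketch_is_7bit_clean(src, len) ? "US-ASCII" : source_encoding;
}
```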
akr
3766eac339 * re.c (rb_reg_prepare_re): fix SEGV by
  /a/ =~ "aa".force_encoding("utf-16be").


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15178 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-23 04:40:43 +00:00
usa
61fd7dbf6d * re.c (rb_char_to_option_kcode): Regexp switch `s' should mean
  Windows-31J, as well as `-Ks'.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15101 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-18 00:44:15 +00:00
nobu
a0029e3adc * re.c (rb_char_to_option_kcode): fixed typo.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-17 12:48:23 +00:00
matz
d9ff499bf3 * re.c (rb_char_to_option_kcode): use rb_enc_find_index() instead
  of using fixed index value.

* enc/Makefile.in (encsrcdir): make US-ASCII built-in.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-14 13:49:29 +00:00
akr
a31e2da12c * re.c (rb_reg_prepare_re): initialize error message buffer.
  (rb_reg_search): ditto.
  (rb_reg_check_preprocess): ditto.
  (rb_reg_new_str): ditto.
  (rb_enc_reg_new): ditto.
  (rb_reg_compile): ditto.
  (rb_reg_initialize_m): ditto.
  (rb_reg_s_union_m): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15034 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-14 04:51:10 +00:00
akr
238c59842c * re.c (rb_reg_preprocess): fix fixed_enc condition.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14924 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-07 04:55:26 +00:00
akr
063beac343 * encoding.c (rb_enc_internal_get_index): extracted from
  rb_enc_get_index.
  (rb_enc_internal_set_index): extracted from rb_enc_associate_index.

* include/ruby/encoding.h (ENCODING_SET): work over ENCODING_INLINE_MAX.
  (ENCODING_GET): ditto.
  (ENCODING_IS_ASCII8BIT): defined.
  (ENCODING_CODERANGE_SET): defined.

* re.c (rb_reg_fixed_encoding_p): use ENCODING_IS_ASCII8BIT.

* string.c (rb_enc_str_buf_cat): use ENCODING_IS_ASCII8BIT.

* parse.y (reg_fragment_setenc_gen): use ENCODING_IS_ASCII8BIT.

* marshal.c (has_ivars): use ENCODING_IS_ASCII8BIT.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14922 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-07 02:49:01 +00:00
akr
f38cc001a7 * re.c (rb_reg_initialize_str): forbid raw non ASCII character
  for ASCII-8BIT regexp in non ASCII-8BIT script.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14911 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-06 12:15:48 +00:00
akr
8987b97ca9 * include/ruby/encoding.h (rb_enc_str_buf_cat): declared.
* string.c (coderange_scan): extracted from rb_enc_str_coderange.
  (rb_enc_str_coderange): use coderange_scan.
  (rb_str_shared_replace): copy encoding and coderange.
  (rb_enc_str_buf_cat): new function for linear complexity string
  accumulation with encoding.
  (rb_str_sub_bang): don't conflict substituted part and replacement.
  (str_gsub): use rb_enc_str_buf_cat.
  (rb_str_clear): clear coderange.

* re.c (rb_reg_regsub): use rb_enc_str_buf_cat.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14910 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-06 09:25:09 +00:00
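The 8987b97ca9 entry describes `rb_enc_str_buf_cat` as a function for linear complexity string accumulation, which `rb_reg_regsub` then uses. A standard way to get amortized linear cost is geometric capacity growth. The sketch below uses a hypothetical `sketch_buf` type to illustrate it. The log does not state the growth factor the real function uses, and the encoding and coderange bookkeeping it performs is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. */
typedef struct {
    char *ptr;
    size_t len, capa;
} sketch_buf;

/* Append n bytes.  Doubling the capacity when it runs out keeps the total
 * copying over many appends proportional to the final length.  Returns 0
 * on success, -1 on allocation failure (buffer left unchanged). */
static int sketch_buf_cat(sketch_buf *b, const char *src, size_t n)
{
    assert(src != NULL);
    if (b->len + n > b->capa) {
        size_t capa = b->capa ? b->capa : 16;
        while (capa < b->len + n)
            capa *= 2;
        char *p = realloc(b->ptr, capa);
        if (p == NULL)
            return -1;
        b->ptr = p;
        b->capa = capa;
    }
    memcpy(b->ptr + b->len, src, n);
    b->len += n;
    return 0;
}
```

Growing by a fixed amount per append would instead copy the whole buffer on most appends, which makes repeated concatenation quadratic.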
akr
da42c102c1 * re.c (rb_reg_initialize_str): /\x80/n is not an error even if script
  encoding is EUC-JP.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14899 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-05 16:39:38 +00:00
nobu
8638ee26e7 * include/ruby/intern.h, re.c (rb_reg_new): keep interface same as
  1.8.  [ruby-core:14583]

* include/ruby/intern.h, re.c (rb_reg_new_str): renamed, and defines
  HAVE_RB_REG_NEW_STR macro to tell if it is available.

* include/ruby/encoding.h (rb_enc_reg_new): added.

* insns.def (toregexp), marshal.c (r_object0): use rb_reg_new_str().

* re.c (rb_reg_regcomp, rb_reg_s_union): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14884 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 16:30:33 +00:00
akr
f780cdec75 * re.c (rb_reg_prepare_re): check string encoding. Oniguruma doesn't
  support invalid encoding.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14880 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 05:01:58 +00:00
akr
7d98c90ef2 unused variable removed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14879 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 03:13:53 +00:00
matz
22e7258275 * re.c (rb_reg_search): avoid inner loop for reverse search.
* regexec.c: unset USE_MATCH_RANGE_MUST_BE_INSIDE_OF_SPECIFIED_RANGE
  which is turned on since oniguruma 5.9.1.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14878 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 01:24:12 +00:00