archive/ruby

mirror of https://github.com/ruby/ruby.git synced 2025-08-23 13:04:13 +02:00

Author	SHA1	Message	Date
Nobuyoshi Nakada	b49cd84311	Remove `REG_LITERAL` flag All `Regexp` literals are frozen now.	2023-02-09 19:21:24 +09:00
Jeremy Evans	eccfc978fd	Fix parsing of regexps that toggle extended mode on/off inside regexp This was broken in `ec3542229b`. That commit didn't handle cases where extended mode was turned on/off inside the regexp. There are two ways to turn extended mode on/off: ``` /(?-x:#y)#z /x =~ '#y' /(?-x)#y(?x)#z /x =~ '#y' ``` These can be nested inside the same regexp: ``` /(?-x:(?x)#x (?-x)#y)#z /x =~ '#y' ``` As you can probably imagine, this makes handling these regexps somewhat complex. Due to the nesting inside portions of regexps, the unassign_nonascii function needs to be recursive. In recursive mode, it needs to track both opening and closing parentheses, similar to how it already tracked opening and closing brackets for character classes. When scanning the regexp and coming to `(?` not followed by `#`, scan for options, and use `x` and `i` to determine whether to turn on or off extended mode. For `:`, indicating only the current regexp section should have the extended mode switched, recurse with the extended mode set or unset. For `)`, indicating the remainder of the regexp (or current regexp portion if already recursing) should turn extended mode on or off, just change the extended mode flag and keep scanning. While testing this, I noticed that `a`, `d`, and `u` are accepted as options, in addition to `i`, `m`, and `x`, but I can't see where those options are documented. I'm not sure whether or not handling `a`, `d`, and `u` as options is a bug. Fixes [Bug #19379]	2023-01-30 08:51:12 -08:00
Burdette Lamar	30bd2a32fa	[DOC] Correction to RDoc for Regexp.new (#7130 ) Correction to RDoc for Regexp.new	2023-01-16 11:02:23 -06:00
Jeremy Evans	7e8fa06022	Always issue deprecation warning when calling Regexp.new with 3rd positional argument Previously, only certain values of the 3rd argument triggered a deprecation warning. First step for fix for bug #18797. Support for the 3rd argument will be removed after the release of Ruby 3.2. Fix minor fallout discovered by the tests. Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2022-12-22 11:50:26 -08:00
Nobuyoshi Nakada	e61e4ae60b	Refactor `reg_extract_args` to return regexp if given	2022-12-22 19:27:27 +09:00
Nobuyoshi Nakada	454c00723a	Share argument parsing in `Regexp#initialize` and `Regexp.linear_time?`	2022-12-22 15:51:00 +09:00
卜部昌平	34d43ed9f5	typo in doc [ci skip]	2022-12-19 11:20:55 +09:00
卜部昌平	47a6e7b518	Note about Regexp.linear_time? [ci skip]	2022-12-19 11:05:55 +09:00
TSUYUSATO Kitsune	fbedadb61f	Add `Regexp.linear_time?` (#6901 )	2022-12-14 12:57:14 +09:00
S-H-GAMELINKS	1a64d45c67	Introduce encoding check macro	2022-12-02 01:31:27 +09:00
Yusuke Endoh	ab4c7077cc	Prevent segfault in String#scan with ObjectSpace.each_object Calling `String#scan` without a block creates an incomplete MatchData object whose `RMATCH(match)->str` is Qfalse. Usually this object is not leaked, but it was possible to pull it by using ObjectSpace.each_object. This change hides the internal MatchData object by using rb_obj_hide. Fixes [Bug #19159]	2022-12-01 02:38:51 +09:00
S-H-GAMELINKS	1f4f6c9832	Using UNDEF_P macro	2022-11-16 18:58:33 +09:00
Nobuyoshi Nakada	001606097b	Suppress false warning by a bug of gcc GCC [Bug 99578] seems triggered by calling `rb_reg_last_match` before `match_check(match)`, probably by `NIL_P(match)` in `rb_reg_nth_match`. [Bug 99578]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578	2022-11-08 16:13:30 +09:00
Yusuke Endoh	67ed70da61	Refactor timeout-setting code to a function	2022-10-24 18:21:30 +09:00
Yusuke Endoh	ef01482f64	Refactor timeout-related code in re.c a little	2022-10-24 18:13:26 +09:00
Yusuke Endoh	b51b22513f	Fix per-instance Regexp timeout (#6621 ) Fix per-instance Regexp timeout This makes it follow what was decided in [Bug #19055]: * `Regexp.new(str, timeout: nil)` should respect the global timeout * `Regexp.new(str, timeout: huge_val)` should use the maximum value that can be represented in the internal representation * `Regexp.new(str, timeout: 0 or negative value)` should raise an error	2022-10-24 18:03:26 +09:00
S-H-GAMELINKS	c4089e6524	Fix argument & Remove enum	2022-10-23 17:38:59 +09:00
S-H-GAMELINKS	1e06ef1328	Introduce rb_memsearch_with_char_size function	2022-10-23 17:38:59 +09:00
git	2dd1a037de	* expand tabs. [ci skip] Tabs were expanded because the file did not have any tab indentation in unedited lines. Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.	2022-10-10 13:22:15 +09:00
Nobuyoshi Nakada	0a98dd1cff	Should use dedicated function `Check_Type`	2022-10-10 13:21:57 +09:00
Vladimir Dementyev	4954c9fc0f	Add MatchData#deconstruct/deconstruct_keys	2022-10-10 12:41:13 +09:00
Nobuyoshi Nakada	c53667691a	[DOC] `offset` argument of Regexp#match	2022-08-18 23:25:05 +09:00
Aaron Patterson	e4e054e3ce	Speed up setting the backref match object This patch speeds up setting the backref match object by avoiding some memcopies. Take the following code for example: ```ruby "hello world" =~ /hello/ p $~ ``` When the RE matches the string, we have to set the Match object in the backref global. So we would allocate a match object[^1] and use `rb_reg_region_copy`[^2] to make a deep copy of the stack allocated `re_registers` struct[^3] in to the newly created Ruby object. This could possibly trigger GC[^4], and would allocate new memory. This patch makes a shallow copy of the `re_registers` struct on to the Match object allowing the match object to manage the `re_registers` pointer and also avoiding some calls to `xmalloc` and some manual memcopy. Benchmark looks like this: ```ruby require "benchmark/ips" def test_re thing thing =~ /hello/ end Benchmark.ips do |x| x.report("re hit") do test_re "hello world" end x.report("re miss") do test_re "world" end end ``` Before this patch: ``` $ ruby -v test.rb ruby 3.2.0dev (2022-07-27T22:29:00Z master `4ad69899b7`) [arm64-darwin21] Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16 Warming up -------------------------------------- re hit 345.401k i/100ms re miss 673.584k i/100ms Calculating ------------------------------------- re hit 3.452M (± 0.5%) i/s - 17.270M in 5.002535s re miss 6.736M (± 0.4%) i/s - 34.353M in 5.099593s ``` After this patch: ``` $ ./ruby -v test.rb ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21] Warming up -------------------------------------- re hit 419.578k i/100ms re miss 673.251k i/100ms Calculating ------------------------------------- re hit 4.201M (± 0.7%) i/s - 21.398M in 5.093593s re miss 6.716M (± 0.4%) i/s - 33.663M in 5.012756s ``` Matches get faster and misses maintain the same speed [^1]: `24204d54ab/re.c (L1737)` [^2]: `24204d54ab/re.c (L1738)` [^3]: `24204d54ab/re.c (L1686)` [^4]: `24204d54ab/re.c (L981)`	2022-08-02 09:04:04 -07:00
Takashi Kokubun	5b21e94beb	Expand tabs [ci skip] [Misc #18891]	2022-07-21 09:42:04 -07:00
Kazuhiro NISHIYAMA	846a6bb60f	[DOC] Fix a typo [ci skip]	2022-06-26 14:17:14 +09:00
Jeremy Evans	596f4b0d3a	Document that Regexp#source does not retain lexer escapes Related to [Feature #18838]	2022-06-20 15:56:28 -07:00
Nobuyoshi Nakada	4a6facc2d6	[Feature #18788 ] [DOC] String options to `Regexp.new` Co-Authored-By: Janosch Müller <janosch.mueller@betterplace.org>	2022-06-20 19:35:12 +09:00
Nobuyoshi Nakada	1e9939dae2	[Feature #18788 ] Support options as `String` to `Regexp.new` `Regexp.new` now supports passing the regexp flags not only as an `Integer`, but also as a `String`. Unknown flags raise errors.	2022-06-20 19:35:12 +09:00
Nobuyoshi Nakada	ab2a43265c	Warn suspicious flag to `Regexp.new` Now the second argument should be `true`, `false`, `nil` or an Integer. This flag is sometimes confused with the third argument.	2022-06-20 19:35:12 +09:00
Nobuyoshi Nakada	7f8a915715	[DOC] Refine Regexp.new argument descriptions	2022-06-20 18:39:50 +09:00
Nobuyoshi Nakada	914c26eab3	[DOC] Regexp timeout is float or nil	2022-06-20 17:47:44 +09:00
Nobuyoshi Nakada	cd3a5cd0e3	[DOC] Fixed omissions in Regexp.new arguments	2022-06-20 09:26:11 +09:00
Jeremy Evans	ec3542229b	Ignore invalid escapes in regexp comments Invalid escapes are handled at multiple levels. The first level is in parse.y, so skip invalid unicode escape checks for regexps in parse.y. Make rb_reg_preprocess and unescape_nonascii accept the regexp options. In unescape_nonascii, if the regexp is an extended regexp, when "#" is encountered, ignore all characters until the end of line or end of regexp. Unfortunately, in extended regexps, you can use "#" as a non-comment character inside a character class, so also parse "[" and "]" specially for extended regexps, and only skip comments if "#" is not inside a character class. Handle nested character classes as well. This issue doesn't just affect extended regexps, it also affects "(?#" comments inside all regexps. So for those comments, scan until trailing ")" and ignore content inside. I'm not sure if there are other corner cases not handled. A better fix would be to redesign the regexp parser so that it unescaped during parsing instead of before parsing, so you already know the current parsing state. Fixes [Bug #18294] Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2022-06-06 13:50:03 -07:00
Burdette Lamar	b41de3a1e8	[DOC] Enhanced RDoc for MatchData (#5822 ) Treats: #to_s #named_captures #string #inspect #hash #==	2022-04-18 18:19:10 -05:00
Burdette Lamar	6db3f7c405	Enhanced RDoc for MatchData (#5821 ) Treats: #[] #values_at	2022-04-18 15:52:07 -05:00
Burdette Lamar	86e23529ad	Enhanced RDoc for MatchData (#5820 ) Treats: #pre_match #post_match #to_a #captures	2022-04-18 14:34:40 -05:00
Burdette Lamar	b074bc3d61	[DOC] Enhanced RDoc for MatchData (#5819 ) Treats: #begin #end #match #match_length	2022-04-18 13:02:35 -05:00
Burdette Lamar	9d1dd7a9ed	[DOC] Enhanced RDoc for MatchData (#5818 ) Treats: #regexp #names #size #offset	2022-04-18 11:31:30 -05:00
Burdette Lamar	51ea67698e	[DOC] Enhanced RDoc for Regexp (#5815 ) Treats: ::new ::escape ::try_convert ::union ::last_match	2022-04-18 10:45:29 -05:00
Burdette Lamar	2b4b513ef0	[DOC] Enhanced RDoc for Regexp (#5812 ) Treats: #fixed_encoding? #hash #== #=~ #match #match? Also, in regexp.rdoc: Changes heading from 'Special Global Variables' to 'Regexp Global Variables'. Add tiny section 'Regexp Interpolation'.	2022-04-16 15:20:03 -05:00
Burdette Lamar	e021754db0	[DOC] Enhanced RDoc for Regexp (#5807 ) Treats: #source #inspect #to_s #casefold? #options #names #named_captures	2022-04-15 13:31:15 -05:00
Nobuyoshi Nakada	d8189ed23f	Return only captured range in `MatchData` [Bug #18670 ]	2022-03-31 18:01:15 +09:00
Yusuke Endoh	c499a4c28a	re.c: stop a wrong warning of "flags ignored" on Regexp.new(//) [Bug #18669]	2022-03-31 10:07:09 +09:00
Yusuke Endoh	5df2589b64	internal/ractor.h: Added Currently it has only one function prototype.	2022-03-30 16:50:46 +09:00
Yusuke Endoh	2ade40276b	re.c: raise Regexp::TimeoutError instead of RuntimeError	2022-03-30 16:50:46 +09:00
Yusuke Endoh	ce87bb8bd6	re.c: Add `timeout` keyword for Regexp.new and Regexp#timeout	2022-03-30 16:50:46 +09:00
Yusuke Endoh	ffc3b37f96	re.c: Add Regexp.timeout= and Regexp.timeout [Feature #17837]	2022-03-30 16:50:46 +09:00
Shugo Maeda	c8817d6a3e	Add String#byteindex, String#byterindex, and MatchData#byteoffset (#5518 ) * Add String#byteindex, String#byterindex, and MatchData#byteoffset [Feature #13110] Co-authored-by: NARUSE, Yui <naruse@airemix.jp>	2022-02-19 19:10:00 +09:00
Shugo Maeda	cda5aee74e	LONG2NUM() should be used for rmatch_offset::{beg,end} https://github.com/ruby/ruby/pull/5518#discussion_r809645406	2022-02-18 22:13:45 +09:00
Nobuyoshi Nakada	16fdc1ff46	[DOC] Fix broken links to literals.rdoc	2022-02-08 01:27:52 +09:00
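
Several of the commits listed above change behavior that is visible from Ruby code. The following is a minimal sketch of three of them (string flags from 1e9939dae2, per-instance timeouts from b51b22513f, and `MatchData` pattern matching from 4954c9fc0f). It assumes Ruby 3.2 or later; the variable names are illustrative only, and because the commit messages do not name the exception class raised, the sketch rescues `StandardError` rather than guessing.

```ruby
# Sketch only; assumes Ruby 3.2 or later, where these features are available.

# 1e9939dae2: regexp flags may be passed as a String as well as an Integer.
ci = Regexp.new("hello", "i")

# Unknown flag characters are rejected. The commit message only says
# "raise errors", so StandardError is rescued instead of a specific class.
unknown_flag_rejected =
  begin
    Regexp.new("hello", "z")
    false
  rescue StandardError
    true
  end

# b51b22513f: `timeout: nil` defers to the global Regexp.timeout, and
# zero or a negative value is rejected.
with_timeout = Regexp.new("a+", timeout: 2.0)
no_timeout   = Regexp.new("a+", timeout: nil)
zero_rejected =
  begin
    Regexp.new("a+", timeout: 0)
    false
  rescue StandardError
    true
  end

# 4954c9fc0f: MatchData#deconstruct_keys lets named captures be
# destructured with pattern matching.
date = /(?<year>\d+)-(?<month>\d+)/.match("2022-10")
case date
in { year:, month: }
  destructured = [year, month]
end
```

Here `with_timeout.timeout` reports the per-instance value, while `no_timeout.timeout` is `nil` because no per-instance timeout was set.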


658 commits