Commit graph

56 commits

Author SHA1 Message Date
Hiroshi SHIBATA
267d8a04b3
[ruby/strscan] Support Ractor#value
(https://github.com/ruby/strscan/pull/157)

This is the same as https://github.com/ruby/stringio/pull/134

---------

141f9cf9b6

Co-authored-by: Koichi Sasada <ko1@atdot.net>
2025-06-03 18:13:15 +09:00
Koichi Sasada
ef2bb61018 Ractor::Port
* Added `Ractor::Port`
  * `Ractor::Port#receive` (support multi-threads)
  * `Ractor::Port#close`
  * `Ractor::Port#closed?`
* Added some methods
  * `Ractor#join`
  * `Ractor#value`
  * `Ractor#monitor`
  * `Ractor#unmonitor`
* Removed some methods
  * `Ractor#take`
  * `Ractor.yield`
* Change the spec
  * `Ractor.select`

You can wait for multiple sequences of messages with `Ractor::Port`.

```ruby
ports = 3.times.map{ Ractor::Port.new }
ports.map.with_index do |port, ri|
  Ractor.new port,ri do |port, ri|
    3.times{|i| port << "r#{ri}-#{i}"}
  end
end

p ports.each{|port| pp 3.times.map{port.receive}}

```

In this example, we use 3 ports, and 3 Ractors send messages to them respectively.
We can receive a series of messages from each port.

You can use `Ractor#value` to get the last value of a Ractor's block:

```ruby
result = Ractor.new do
  heavy_task()
end.value
```

You can wait for the termination of a Ractor with `Ractor#join` like this:

```ruby
Ractor.new do
  some_task()
end.join
```

`#value` and `#join` are similar to `Thread#value` and `Thread#join`.

To implement `#join`, `Ractor#monitor` (and `Ractor#unmonitor`) is introduced.

This commit changes `Ractor.select()` method.
It now only accepts ports or Ractors, and returns when a port receives a message or a Ractor terminates.

We remove `Ractor.yield` and `Ractor#take` because:
* `Ractor::Port` supports most of similar use cases in a simpler manner.
* Removing them significantly simplifies the code.

We also change the internal thread scheduler code (thread_pthread.c):
* During barrier synchronization, we keep the `ractor_sched` lock to avoid deadlocks.
  This lock is released by `rb_ractor_sched_barrier_end()`
  which is called at the end of operations that require the barrier.
* Fix potential deadlock issues by checking interrupts just before setting UBF.

https://bugs.ruby-lang.org/issues/21262
2025-05-31 04:01:33 +09:00
Charles Oliver Nutter
8685a81e6a
[ruby/strscan] jruby: Check if len++ walked off the end
(https://github.com/ruby/strscan/pull/153)

Fix https://github.com/ruby/strscan/pull/152

CRuby can walk off the end because there's always a null byte. In JRuby,
the byte array is often (usually?) the exact size of the string. So we
need to check if len++ walked off the end.

This code was ported from a version by @byroot in
https://github.com/ruby/strscan/pull/127 but I missed adding this check
due to a lack of tests. A test is included for both "-" and "+" parsing.

1abe4ca556
2025-05-08 18:03:04 +09:00
Charles Oliver Nutter
5a0306f9c1
[ruby/strscan] jruby: Pass end index to byteListToInum
(https://github.com/ruby/strscan/pull/150)

These parse methods take begin and end indices, not begin and length. A
test is included.

Fixes https://github.com/jruby/jruby/issues/8823

9690e39e73
2025-05-08 18:03:04 +09:00
Sutou Kouhei
af6d6b64ea [ruby/strscan] named_captures: fix incompatibility with
MatchData#named_captures
(https://github.com/ruby/strscan/pull/146)

Fix https://github.com/ruby/strscan/pull/145

`MatchData#named_captures` uses the last matched value for each name.

Reported by Linus Sellberg. Thanks!!!

a6086ea322
2025-05-02 09:52:38 +09:00
Andrii Konchyn
ea8b0017b2 [ruby/strscan] Enable tests passing on TruffleRuby
(https://github.com/ruby/strscan/pull/144)

Changes:
- enabled tests passing on TruffleRuby
- removed `truffleruby` and keep only `truffleruby-head` in CI

4aadfc8408
2025-02-25 15:36:46 +09:00
NAITOH Jun
018943ba05 [ruby/strscan] Fix inconsistent IndexError vs nil behavior for
unknown capture group
(https://github.com/ruby/strscan/pull/143)

Fix https://github.com/ruby/strscan/pull/139

Reported by Benoit Daloze. Thanks!!!

bc8a0d2623
2025-02-25 15:36:46 +09:00
NAITOH Jun
36ab247e4d [ruby/strscan] Fix a bug that scanning methods that don't use Regexp
don't clear named capture groups
(https://github.com/ruby/strscan/pull/142)

Fix https://github.com/ruby/strscan/pull/135

b957443e20
2025-02-25 15:36:46 +09:00
Jean Boussier
bf6c106d54 [ruby/strscan] scan_integer(base: 16) ignore x suffix if not
followed by hexadecimal
(https://github.com/ruby/strscan/pull/141)

Fix: https://github.com/ruby/strscan/issues/140

`0x<EOF>`, `0xZZZ` should be parsed as `0` instead of not matching at
all.

c4e4795ed2
2025-02-21 11:31:36 +09:00
NAITOH Jun
eee9bd1aa4 [ruby/strscan] Fix a bug that scan_until behaves differently with
Regexp and String patterns
(https://github.com/ruby/strscan/pull/138)

Fix https://github.com/ruby/strscan/pull/131

e1cec2e726
2025-02-17 11:04:32 +09:00
Jean Boussier
51004c3641
[ruby/strscan] Fix a bug that scan_integer doesn't update matched
data
(https://github.com/ruby/strscan/pull/133)

Fix https://github.com/ruby/strscan/pull/130

Reported by Andrii Konchyn. Thanks!!!

4e5f17f87a
2025-02-14 16:13:26 +09:00
Sutou Kouhei
9a7f050eda [ruby/strscan] test: don't omit "(...)" for method calls that have at least one argument
dddae9c99a
2024-12-02 10:50:34 +09:00
Jean Boussier
79cc3d26ed StringScanner#scan_integer support base 16 integers (#116)
Followup: https://github.com/ruby/strscan/pull/115

`scan_integer` is now implemented in Ruby so as to efficiently handle
keyword arguments without allocating a Hash. Given that the goal of
`scan_integer` is to parse integers more efficiently without having to
allocate an intermediary object, using `rb_scan_args` would defeat the
purpose.

Additionally, the C implementation now uses `rb_isdigit` and
`rb_isxdigit`, because on Windows `isdigit` is locale dependent.
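
As a minimal sketch of the call described above: `leading_integer` is a hypothetical helper, not part of strscan, and the `method_defined?` guard is there on the assumption that older strscan releases lack `scan_integer`.

```ruby
require "strscan"

# Hypothetical helper: parse an integer at the start of +text+ and report
# where the scanner stopped. Returns nil when scan_integer is unavailable.
def leading_integer(text, base: 10)
  return nil unless StringScanner.method_defined?(:scan_integer)

  scanner = StringScanner.new(text)
  value = scanner.scan_integer(base: base) # no intermediate substring needed
  [value, scanner.pos]
end

p leading_integer("42 apples")        # base 10 is the default
p leading_integer("0x1F;", base: 16)  # base 16 accepts an optional "0x" prefix
```

The helper returns the parsed Integer together with the byte position, which shows that the cursor advances past the digits only.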
2024-12-02 10:50:34 +09:00
Yusuke Endoh
5514485e13 [ruby/strscan] Prevent a warning "ambiguous first argument" during a
test
(https://github.com/ruby/strscan/pull/118)

20241128T153002Z.log.html.gz
```
/home/chkbuild/chkbuild/tmp/build/20241128T153002Z/ruby/test/strscan/test_stringscanner.rb:908: warning: ambiguous first argument; put parentheses or a space even after `-` operator
```

af3fd2f045
2024-12-02 10:50:34 +09:00
Jean Boussier
d5de1a5789 [ruby/strscan] Implement #scan_integer to efficiently parse Integer
(https://github.com/ruby/strscan/pull/115)

Fix: https://github.com/ruby/strscan/issues/113

This allows parsing an Integer directly from a String without first
allocating a substring.

Notes:

The implementation is limited by design. It's meant as a first step:
only the most straightforward case, base 10 integers, is supported.

6a3c74b4c8
2024-11-27 09:24:07 +09:00
NAITOH Jun
e61bb75a86 [ruby/strscan] [JRuby] Optimize scan(): Remove duplicate `if
(restLen() < patternsize()) return context.nil;` checks in
`!headonly`.
(https://github.com/ruby/strscan/pull/110)

- before: #109

## Why?

d31274f41b/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java (L371-L373)

This means the following :

`if (str.size() - curr < pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within
`!headonly`.

be7815ec02/core/src/main/java/org/jruby/util/StringSupport.java (L1706-L1720)

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following :
`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return
-1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```

843e931d13
2024-10-26 18:44:15 +09:00
NAITOH Jun
d81b0588bb
[ruby/strscan] Accept String as a pattern at non head
(https://github.com/ruby/strscan/pull/106)

It supports non-head match cases such as StringScanner#scan_until.

If we use a String as a pattern, we can improve match performance.
Here are the results of the included benchmark.

## CRuby

It shows String as a pattern is 1.18x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     9.403M i/s -      9.548M times in 1.015459s (106.35ns/i)
          regexp_var     9.162M i/s -      9.248M times in 1.009479s (109.15ns/i)
              string     8.966M i/s -      9.274M times in 1.034343s (111.54ns/i)
          string_var    11.051M i/s -     11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
              regexp    10.319M i/s -     28.209M times in 2.733707s (96.91ns/i)
          regexp_var    10.032M i/s -     27.485M times in 2.739807s (99.68ns/i)
              string     9.681M i/s -     26.897M times in 2.778397s (103.30ns/i)
          string_var    12.162M i/s -     33.154M times in 2.726046s (82.22ns/i)

Comparison:
          string_var:  12161920.6 i/s
              regexp:  10318949.7 i/s - 1.18x  slower
          regexp_var:  10031617.6 i/s - 1.21x  slower
              string:   9680843.7 i/s - 1.26x  slower
```

## JRuby

It shows String as a pattern is 2.11x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.591M i/s -      7.544M times in 0.993780s (131.74ns/i)
          regexp_var     6.143M i/s -      6.125M times in 0.997038s (162.77ns/i)
              string    14.135M i/s -     14.079M times in 0.996067s (70.75ns/i)
          string_var    14.079M i/s -     14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
              regexp     9.409M i/s -     22.773M times in 2.420268s (106.28ns/i)
          regexp_var    10.116M i/s -     18.430M times in 1.821820s (98.85ns/i)
              string    21.389M i/s -     42.404M times in 1.982519s (46.75ns/i)
          string_var    20.897M i/s -     42.237M times in 2.021187s (47.85ns/i)

Comparison:
              string:  21389191.1 i/s
          string_var:  20897327.5 i/s - 1.02x  slower
          regexp_var:  10116464.7 i/s - 2.11x  slower
              regexp:   9409222.3 i/s - 2.27x  slower
```

See:
be7815ec02/core/src/main/java/org/jruby/util/StringSupport.java (L1706-L1736)

---------

f9d96c446a

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-09-17 15:12:25 +09:00
Andrii Konchyn
8fa6c36492 [ruby/strscan] Omit tests for #scan_byte and #peek_byte on
TruffleRuby temporarily
(https://github.com/ruby/strscan/pull/91)

The methods were added in #89 but they aren't implemented in TruffleRuby
yet. So let's omit them for now to have CI green.

844d963b56
2024-03-27 12:17:01 +09:00
Aaron Patterson
164e464b04 [ruby/strscan] Add a method for peeking and reading bytes as
integers
(https://github.com/ruby/strscan/pull/89)

This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the
current byte, return it as an integer, and advance the cursor.
`peek_byte` will return the current byte as an integer without advancing
the cursor.

Currently `StringScanner#get_byte` returns a string, but I want to get
the current byte without allocating a string. I think this will help
with writing high performance lexers.
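
A small sketch of how the two methods fit together in a byte-level loop. `bytes_via_scanner` is a hypothetical helper, and the fallback to `String#bytes` assumes that older strscan releases do not define `scan_byte`.

```ruby
require "strscan"

# Hypothetical helper: collect the bytes of +text+ as Integers without
# allocating a one-byte String per step.
def bytes_via_scanner(text)
  # Assumption: scan_byte is missing on older strscan, so fall back.
  return text.bytes unless StringScanner.method_defined?(:scan_byte)

  scanner = StringScanner.new(text)
  bytes = []
  until scanner.eos?
    peeked = scanner.peek_byte  # current byte, cursor stays put
    bytes << scanner.scan_byte  # same byte, cursor advances by one
    raise "peek/scan mismatch" unless peeked == bytes.last
  end
  bytes
end

p bytes_via_scanner("abc")
```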

---------

873aba2e5d

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-02-26 15:54:54 +09:00
Charles Oliver Nutter
39f2e37ff1
[ruby/strscan] Don't add begin to length for new string slice
(https://github.com/ruby/strscan/pull/87)

Fixes https://github.com/ruby/strscan/pull/86

c17b015c00
2024-02-08 14:43:56 +09:00
NAITOH Jun
91f3530580
[ruby/strscan] Add test to check encoding for empty string
(https://github.com/ruby/strscan/pull/80)

See: https://github.com/ruby/strscan/issues/78#issuecomment-1890849891

d0508518a9
2024-01-19 10:49:12 +09:00
NAITOH Jun
338eb0065b [ruby/strscan] StringScanner#captures: Return nil not "" for
unmatched capture
(https://github.com/ruby/strscan/pull/72)

fix https://github.com/ruby/strscan/issues/70
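When no substring matches a group (`s[3]` below), `s[3]` returns nil but the
corresponding element of `captures` is "", so the two disagree.

The corresponding element of `captures` should be nil as well. The first
example below shows the behavior before this change, the second the behavior
after it.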

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/  #=> "foobar"
s[0]           #=> "foobar"
s[1]           #=> "foo"
s[2]           #=> "bar"
s[3]           #=> nil
s.captures #=> ["foo", "bar", ""]
s.captures.compact #=> ["foo", "bar", ""]
```

```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/  #=> "foobar"
s[0]           #=> "foobar"
s[1]           #=> "foo"
s[2]           #=> "bar"
s[3]           #=> nil
s.captures #=> ["foo", "bar", nil]
s.captures.compact #=> ["foo", "bar"]
```

https://docs.ruby-lang.org/ja/latest/method/MatchData/i/captures.html
```
/(foo)(bar)(BAZ)?/ =~ "foobarbaz" #=> 0
$~.to_a        #=> ["foobar", "foo", "bar", nil]
$~.captures #=> ["foo", "bar", nil]
$~.captures.compact #=> ["foo", "bar"]
```

* StringScanner#captures is not yet documented.
https://docs.ruby-lang.org/ja/latest/class/StringScanner.html

1fbfdd3c6f
2024-01-14 22:27:24 +09:00
Peter Zhu
e27eab2f85 [ruby/strscan] Sync missed commit
Syncs commit ruby/strscan@76b377a5d8.
2023-07-27 09:42:42 -04:00
Charles Oliver Nutter
4c7726516c [ruby/strscan] Mask out this test on JRuby/Windows
See https://github.com/jruby/jruby/issues/7644 for the root issue,
which will require fixes to JRuby's regular expression engine,
JOni.

29a65abff2
2023-02-21 19:31:39 +09:00
Sutou Kouhei
76a4cdfb02 [ruby/strscan] test: Run test more with fixed anchor mode
(https://github.com/ruby/strscan/pull/60)

fix https://github.com/ruby/strscan/pull/56
2023-02-21 19:31:38 +09:00
OKURA Masafumi
260bc7cdfa [ruby/strscan] Add test case to test_string
(https://github.com/ruby/strscan/pull/58)

`string` returns the original string after `scan` is called. The existing
test didn't check this behavior; now it is covered.
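
The behavior under test can be sketched like this (the sample string is chosen only for illustration):

```ruby
require "strscan"

scanner = StringScanner.new("test string")
scanner.scan(/test/)  # advances the cursor past "test"

p scanner.string  # the whole original string, regardless of the cursor
p scanner.rest    # only the part that has not been scanned yet
```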
2023-02-21 19:31:38 +09:00
Hiroshi SHIBATA
4e31fea77d Merge strscan-3.0.5 2022-12-09 16:36:22 +09:00
Kenichi Kamiya
564ccd095a [ruby/strscan] Fix segmentation fault of StringScanner#charpos when String#byteslice returns non string value [Bug #17756] (#20)
92961cde2b
2021-05-06 16:20:38 +09:00
Hiroshi SHIBATA
822eb94563
Import from https://github.com/ruby/strscan/pull/19
* Use Gemfile instead of Gem::Specification#add_development_dependency.

* Use pend instead of skip for test-unit.
2021-05-06 16:18:58 +09:00
Kenta Murata
985f0af257
[strscan] Make strscan Ractor safe (#17)
* Make strscan Ractor safe

* Add test-unit in the development dependencies

3c93c2bebe
2020-12-18 14:25:41 +09:00
Jeremy Evans
ffd0820ab3 Deprecate taint/trust and related methods, and make the methods no-ops
This removes the related tests, and puts the related specs behind
version guards.  This affects all code in lib, including some
libraries that may want to support older versions of Ruby.
2019-11-18 01:00:25 +02:00
Sutou Kouhei
95c420c4a6
Import StringScanner 1.0.3 (#2553) 2019-10-14 12:40:50 +09:00
nobu
256c88861b strscan.c: add MatchData-like methods
* ext/strscan/strscan.c: added `size`, `captures` and `values_at`
  to StringScanner, shorthands of accessing the matched data.
  based on the patch by apeiros (Stefan Rusterholz) at
  [ruby-core:20412].  [Feature #836]
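
A short sketch of the three shorthands, using an illustrative date string:

```ruby
require "strscan"

scanner = StringScanner.new("Fri Dec 12 1975 14:39")
scanner.scan(/(\w+) (\w+) (\d+)/)

p scanner.size                 # number of groups, counting the whole match
p scanner.captures             # groups 1..n only, without the whole match
p scanner.values_at(0, 2, -1)  # groups picked by index; negative counts from the end
```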

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60929 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-11-29 07:57:48 +00:00
nobu
90311f37e4 strscan.c: fix segfault in aref
* ext/strscan/strscan.c (strscan_aref): fix segfault after
  get_byte or getch which do not apply regexp.
  [ruby-core:82116] [Bug #13759]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59384 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-07-21 13:30:46 +00:00
kazu
a4fde3b60c {ext,test}/strscan: Specify frozen_string_literal: true.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-02-06 13:23:39 +00:00
naruse
3e92b635fb Add frozen_string_literal: false for all files
When you change this to true, you may need to add more tests.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53141 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2015-12-16 05:07:31 +00:00
nobu
4713ace44c strscan.c: encoding in messages
* ext/strscan/strscan.c (strscan_aref): preserve argument encoding
  in error messages.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-08-03 01:56:31 +00:00
usa
5543a55b52 * test: get rid of warnings.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45313 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-03-11 04:22:34 +00:00
naruse
c43dd625b6 * ext/strscan/strscan.c (strscan_aref): raise error if given
name reference is not found.
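
A minimal sketch of the error case, assuming a pattern that defines named groups but not the one being asked for:

```ruby
require "strscan"

scanner = StringScanner.new("2013-05")
scanner.scan(/(?<year>\d+)-(?<month>\d+)/)

begin
  scanner[:day]  # the pattern defines no group named "day"
rescue IndexError => e
  puts "IndexError: #{e.message}"
end
```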

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2013-05-24 07:32:45 +00:00
naruse
ecd5bbe82a * ext/strscan/strscan.c (strscan_aref): support named captures.
patched by Konstantin Haase [ruby-core:54664] [Feature #8343]
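
For illustration, named groups can then be read by Symbol or String, alongside the existing numeric access:

```ruby
require "strscan"

scanner = StringScanner.new("Fri Dec 12 1975")
scanner.scan(/(?<wday>\w+) (?<month>\w+) (?<day>\d+)/)

p scanner[:wday]    # lookup by Symbol
p scanner["month"]  # lookup by String
p scanner[3]        # numeric lookup still works for the same groups
```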

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40881 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2013-05-21 13:48:57 +00:00
ryan
0700a9113f Added #charpos for multibyte string position.
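
The difference from `pos` shows up as soon as the scanned text contains a multibyte character. A minimal sketch with a UTF-8 string:

```ruby
require "strscan"

# "h\u00E9llo" is "héllo"; the "é" takes two bytes in UTF-8.
scanner = StringScanner.new("h\u00E9llo")
scanner.scan(/h\u00E9/)

p scanner.pos      # byte offset of the cursor
p scanner.charpos  # character offset of the cursor
```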
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37916 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-11-28 00:17:33 +00:00
akr
47b8a0e7e4 avoid method redefinition.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26663 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2010-02-14 03:09:53 +00:00
nobu
2ef382231f * ext/strscan/strscan.c (strscan_set_string): the set string should not be
dupped or frozen, because freezing it makes the #concat method fail,
  and dupping is unnecessary without freezing.  a patch from Aaron
  Patterson at [ruby-core:25145].
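
A sketch of the resulting behavior (variable names are illustrative): the scanner holds the very object passed to `string=`, so appending through the scanner works and is visible through the original reference.

```ruby
require "strscan"

buffer = String.new("abc")  # an unfrozen, mutable string
scanner = StringScanner.new("")
scanner.string = buffer     # the scanner keeps this object, not a copy
scanner << "def"            # concat appends to the string being scanned

p scanner.string
p scanner.string.equal?(buffer)  # same object, so buffer sees the append too
```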


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24679 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-08-26 23:16:40 +00:00
matz
d121a3fb79 * ext/strscan/strscan.c (Init_strscan): remove obsolete
matchedsize method, use matched_size instead.  [ruby-dev:38591]
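
For reference, a minimal sketch of the replacement method:

```ruby
require "strscan"

scanner = StringScanner.new("test string")
scanner.scan(/\w+/)

p scanner.matched_size                # size in bytes of the last match
p scanner.respond_to?(:matchedsize)   # the obsolete spelling is gone
```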

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@23721 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2009-06-17 04:57:11 +00:00
nobu
00b4a3f9c4 * test: assert_raises has been deprecated for a long time.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@19536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-09-24 17:44:39 +00:00
mame
437af4f46f * test/stringio/test_stringio.rb: add tests to achieve over 95% test
coverage of stringio.

* test/strscan/test_stringscanner.rb: ditto for strscan.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16847 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-06-05 14:33:01 +00:00
matz
ab24f2b077 * re.c (rb_reg_prepare_re): made non static with small refactoring.
* ext/strscan/strscan.c (strscan_do_scan): should adjust encoding
  before regex searching.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@16387 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-05-12 06:09:53 +00:00
akr
b8a9eb304d add a test.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-28 16:13:43 +00:00
akr
1c0416e6ee * ext/strscan/strscan.c (str_new): new function to allocate a string
with encoding propagation.
  (extract_range): use str_new.
  (extract_beg_len): ditto.
  (strscan_peek): ditto.
  (strscan_rest): ditto.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14772 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-28 14:55:43 +00:00
matz
326659c0bf * test/socket/test_socket.rb: update not to use 1.8 assignment to
external local variable in the block parameters.  [ruby-dev:32251]

* test/strscan/test_stringscanner.rb: avoid $KCODE, and use
  String#force_encoding().  [ruby-dev:32251]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13922 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-14 07:03:39 +00:00