Hiroshi SHIBATA
0f06626915
Bump up strscan version to 3.1.5.dev
2025-05-02 10:11:09 +09:00
Sutou Kouhei
af6d6b64ea
[ruby/strscan] named_captures: fix incompatibility with MatchData#named_captures
...
(https://github.com/ruby/strscan/pull/146)
Fix https://github.com/ruby/strscan/pull/145
`MatchData#named_captures` uses the last matched value for each name.
Reported by Linus Sellberg. Thanks!!!
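For reference, the `MatchData#named_captures` behavior that the scanner is being aligned with can be checked with core Ruby alone. This is a minimal sketch: the regexps and input strings are illustrative, and it deliberately does not call `StringScanner#named_captures`, whose result depends on the installed strscan version.

```ruby
# Two groups share the name "a" and both take part in the match:
# MatchData#named_captures keeps the value of the last one.
both = /(?<a>.)(?<a>.)/.match("01")
both_captures = both.named_captures

# Only one of the same-named groups takes part in the match:
# the value of the group that actually matched is kept.
one = /(?<a>x)|(?<a>y)/.match("x")
one_captures = one.named_captures

p both_captures
p one_captures
```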
a6086ea322
2025-05-02 09:52:38 +09:00
Hiroshi SHIBATA
4634a0042e
Mark development version for unreleased gems
2025-04-22 11:27:24 +09:00
Sutou Kouhei
067fc410fc
[ruby/strscan] Bump version
...
8ff80150c4
2025-04-22 11:27:24 +09:00
Sutou Kouhei
ad8cb532d5
[ruby/strscan] Bump version
...
7b1eb1e4ed
2025-04-14 16:18:48 +09:00
Jean byroot Boussier
0db87b8943
[ruby/strscan] Allow parsing strings larger than 2GiB
...
(https://github.com/ruby/strscan/pull/147)
For an unknown reason, even though `pos` is stored as a `long`,
`#pos` and `#pos=` treat it as an `int`, which prevents seeking into
strings larger than 2GiB.
b76368416e
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
2025-04-14 16:18:47 +09:00
NAITOH Jun
018943ba05
[ruby/strscan] Fix a bug that inconsistency of IndexError vs nil for unknown capture group
...
(https://github.com/ruby/strscan/pull/143)
Fix https://github.com/ruby/strscan/pull/139
Reported by Benoit Daloze. Thanks!!!
bc8a0d2623
2025-02-25 15:36:46 +09:00
NAITOH Jun
36ab247e4d
[ruby/strscan] Fix a bug that scanning methods that don't use Regexp don't clear named capture groups
...
(https://github.com/ruby/strscan/pull/142)
Fix https://github.com/ruby/strscan/pull/135
b957443e20
2025-02-25 15:36:46 +09:00
Jean Boussier
bf6c106d54
[ruby/strscan] scan_integer(base: 16) ignore x suffix if not followed by hexadecimal
...
(https://github.com/ruby/strscan/pull/141)
Fix: https://github.com/ruby/strscan/issues/140
`0x<EOF>`, `0xZZZ` should be parsed as `0` instead of not matching at
all.
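The intended result can be approximated with an ordinary regexp. The sketch below is not the gem's implementation: it allocates a substring and does not call `scan_integer` at all. `HEX_INTEGER` and `scan_hex_sketch` are hypothetical names, and the regexp only encodes the rule stated above, namely that a `0x` prefix is consumed only when a hexadecimal digit follows it.

```ruby
require "strscan"

# Hypothetical regexp: optional sign, optional "0x" prefix, hex digits.
# If no hex digit follows "0x", the regexp backtracks and matches the
# leading "0" on its own, leaving "x" unscanned.
HEX_INTEGER = /[+-]?(?:0x)?\h+/

# Hypothetical helper emulating the described base 16 rule.
# Returns an Integer, or nil when nothing matches at the scan pointer.
def scan_hex_sketch(scanner)
  text = scanner.scan(HEX_INTEGER)
  text && text.hex
end

scanner = StringScanner.new("0xZZZ")
value = scan_hex_sketch(scanner)
p value
p scanner.rest
```

In this sketch the scan pointer stops right before the `x`, so a caller can still inspect the remaining input.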
c4e4795ed2
2025-02-21 11:31:36 +09:00
NAITOH Jun
eee9bd1aa4
[ruby/strscan] Fix a bug that scan_until behaves differently with Regexp and String patterns
...
(https://github.com/ruby/strscan/pull/138)
Fix https://github.com/ruby/strscan/pull/131
e1cec2e726
2025-02-17 11:04:32 +09:00
Hiroshi SHIBATA
b4ed6db096
Removed trailing spaces
2025-02-14 16:16:55 +09:00
Jean Boussier
51004c3641
[ruby/strscan] Fix a bug that scan_integer doesn't update matched data
...
(https://github.com/ruby/strscan/pull/133)
Fix https://github.com/ruby/strscan/pull/130
Reported by Andrii Konchyn. Thanks!!!
4e5f17f87a
2025-02-14 16:13:26 +09:00
Alexander Momchilov
41e24c2f3e
[ruby/strscan] [DOC] Add syntax highlighting to MarkDown code blocks
...
(https://github.com/ruby/strscan/pull/126)
Split off from https://github.com/ruby/ruby/pull/12322
9bee37e0f5
2024-12-16 10:10:34 +09:00
Sutou Kouhei
219c2eee5a
[ruby/strscan] Bump version
...
fd140b8582
2024-12-16 10:10:34 +09:00
Hiroshi SHIBATA
78ca87f8a8
Lock released version of strscan-3.1.1
2024-12-12 16:14:25 +09:00
Hiroshi SHIBATA
9b6036667e
Removed trailing spaces
2024-12-02 10:50:34 +09:00
Jean Boussier
636d57bd1c
[ruby/strscan] Micro optimize encoding checks
...
(https://github.com/ruby/strscan/pull/117)
Profiling shows a lot of time spent in various encoding check functions.
I'm working on optimizing them on the Ruby side, but if we assume most
strings are one of the simple 3 encodings, we can skip a lot of
overhead.
```ruby
require 'strscan'
require 'benchmark/ips'

source = 10_000.times.map { rand(9999999).to_s }.join(",").force_encoding(Encoding::UTF_8).freeze

def scan_to_i(source)
  scanner = StringScanner.new(source)
  while number = scanner.scan(/\d+/)
    number.to_i
    scanner.skip(",")
  end
end

def scan_integer(source)
  scanner = StringScanner.new(source)
  while scanner.scan_integer
    scanner.skip(",")
  end
end

Benchmark.ips do |x|
  x.report("scan.to_i") { scan_to_i(source) }
  x.report("scan_integer") { scan_integer(source) }
  x.compare!
end
```
Before:
```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
scan.to_i 93.000 i/100ms
scan_integer 232.000 i/100ms
Calculating -------------------------------------
scan.to_i 933.191 (± 0.2%) i/s (1.07 ms/i) - 4.743k in 5.082597s
scan_integer 2.326k (± 0.8%) i/s (429.99 μs/i) - 11.832k in 5.087974s
Comparison:
scan_integer: 2325.6 i/s
scan.to_i: 933.2 i/s - 2.49x slower
```
After:
```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
scan.to_i 96.000 i/100ms
scan_integer 274.000 i/100ms
Calculating -------------------------------------
scan.to_i 969.489 (± 0.2%) i/s (1.03 ms/i) - 4.896k in 5.050114s
scan_integer 2.756k (± 0.1%) i/s (362.88 μs/i) - 13.974k in 5.070837s
Comparison:
scan_integer: 2755.8 i/s
scan.to_i: 969.5 i/s - 2.84x slower
```
c02b1ce684
2024-12-02 10:50:34 +09:00
Jean Boussier
79cc3d26ed
StringScanner#scan_integer support base 16 integers (#116)
...
Followup: https://github.com/ruby/strscan/pull/115
`scan_integer` is now implemented in Ruby so that it can handle
keyword arguments efficiently without allocating a Hash. Given that the
goal of `scan_integer` is to parse integers more efficiently without
having to allocate an intermediary object, using `rb_scan_args` would
defeat the purpose.
Additionally, the C implementation now uses `rb_isdigit` and
`rb_isxdigit`, because on Windows `isdigit` is locale dependent.
2024-12-02 10:50:34 +09:00
Jean Boussier
d5de1a5789
[ruby/strscan] Implement #scan_integer to efficiently parse Integer
...
(https://github.com/ruby/strscan/pull/115)
Fix: https://github.com/ruby/strscan/issues/113
This allows parsing an Integer directly from a String without first
allocating a substring.
Notes:
The implementation is limited by design: it is meant as a first step,
and only the most straightforward case, base 10 integers, is supported.
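A minimal usage sketch of the idea follows. `sum_integers` is a hypothetical helper, and the fallback branch is an assumption added so that the example also runs on strscan versions that predate `scan_integer`; that fallback allocates a substring per number, which is exactly what `scan_integer` avoids.

```ruby
require "strscan"

# Hypothetical helper: sum comma-separated base 10 integers.
# Uses StringScanner#scan_integer when the installed strscan provides it,
# otherwise falls back to scan + to_i (one substring allocated per number).
def sum_integers(source)
  scanner = StringScanner.new(source)
  fast = scanner.respond_to?(:scan_integer)
  total = 0
  loop do
    number = fast ? scanner.scan_integer : scanner.scan(/[+-]?\d+/)&.to_i
    break if number.nil? # no integer at the scan pointer
    total += number
    scanner.skip(/,/)    # step over the separator, if any
  end
  total
end

p sum_integers("12,-7,30")
```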
6a3c74b4c8
2024-11-27 09:24:07 +09:00
NAITOH Jun
e73f35ddaf
[ruby/strscan] [CRuby] Optimize strscan_do_scan(): Remove unnecessary use of `rb_enc_get()`
...
(https://github.com/ruby/strscan/pull/108)
- before: #106
## Why?
In `rb_strseq_index()`, the result of `rb_enc_check()` is used.
- 6c7209cd37/string.c (L4335-L4368)
> enc = rb_enc_check(str, sub);
> return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len, offset, enc);
- 6c7209cd37/string.c (L4309-L4318)
```C
strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len,
            const char *sub_ptr, long sub_len, long offset, rb_encoding *enc)
{
    const char *search_start = str_ptr;
    long pos, search_len = str_len - offset;

    for (;;) {
        const char *t;
        pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc);
```
## Benchmark
It shows String as a pattern is 1.24x faster than Regexp as a pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.225M i/s - 9.328M times in 1.011068s (108.40ns/i)
regexp_var 9.327M i/s - 9.413M times in 1.009214s (107.21ns/i)
string 9.200M i/s - 9.355M times in 1.016840s (108.70ns/i)
string_var 11.249M i/s - 11.255M times in 1.000578s (88.90ns/i)
Calculating -------------------------------------
regexp 9.565M i/s - 27.676M times in 2.893476s (104.55ns/i)
regexp_var 10.111M i/s - 27.982M times in 2.767496s (98.90ns/i)
string 10.060M i/s - 27.600M times in 2.743465s (99.40ns/i)
string_var 12.519M i/s - 33.746M times in 2.695615s (79.88ns/i)
Comparison:
string_var: 12518707.2 i/s
regexp_var: 10111089.6 i/s - 1.24x slower
string: 10060144.4 i/s - 1.24x slower
regexp: 9565124.4 i/s - 1.31x slower
```
ff2d7afa19
2024-10-26 18:44:15 +09:00
Nobuyoshi Nakada
d6046bccb7
[ruby/strscan] Use C90 as far as supporting 2.6 or earlier
...
(https://github.com/ruby/strscan/pull/101)
d31274f41b
2024-10-26 18:44:15 +09:00
NAITOH Jun
d81b0588bb
[ruby/strscan] Accept String as a pattern at non head
...
(https://github.com/ruby/strscan/pull/106)
It supports non-head match cases such as StringScanner#scan_until.
If we use a String as a pattern, we can improve match performance.
Here is the result of the included benchmark.
## CRuby
It shows String as a pattern is 1.18x faster than Regexp as a pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.403M i/s - 9.548M times in 1.015459s (106.35ns/i)
regexp_var 9.162M i/s - 9.248M times in 1.009479s (109.15ns/i)
string 8.966M i/s - 9.274M times in 1.034343s (111.54ns/i)
string_var 11.051M i/s - 11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
regexp 10.319M i/s - 28.209M times in 2.733707s (96.91ns/i)
regexp_var 10.032M i/s - 27.485M times in 2.739807s (99.68ns/i)
string 9.681M i/s - 26.897M times in 2.778397s (103.30ns/i)
string_var 12.162M i/s - 33.154M times in 2.726046s (82.22ns/i)
Comparison:
string_var: 12161920.6 i/s
regexp: 10318949.7 i/s - 1.18x slower
regexp_var: 10031617.6 i/s - 1.21x slower
string: 9680843.7 i/s - 1.26x slower
```
## JRuby
It shows String as a pattern is 2.11x faster than Regexp as a pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 7.591M i/s - 7.544M times in 0.993780s (131.74ns/i)
regexp_var 6.143M i/s - 6.125M times in 0.997038s (162.77ns/i)
string 14.135M i/s - 14.079M times in 0.996067s (70.75ns/i)
string_var 14.079M i/s - 14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
regexp 9.409M i/s - 22.773M times in 2.420268s (106.28ns/i)
regexp_var 10.116M i/s - 18.430M times in 1.821820s (98.85ns/i)
string 21.389M i/s - 42.404M times in 1.982519s (46.75ns/i)
string_var 20.897M i/s - 42.237M times in 2.021187s (47.85ns/i)
Comparison:
string: 21389191.1 i/s
string_var: 20897327.5 i/s - 1.02x slower
regexp_var: 10116464.7 i/s - 2.11x slower
regexp: 9409222.3 i/s - 2.27x slower
```
See:
be7815ec02/core/src/main/java/org/jruby/util/StringSupport.java (L1706-L1736)
---------
f9d96c446a
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-09-17 15:12:25 +09:00
Hiroshi SHIBATA
32f134bb85
Added pre-release suffix for development version of default gems
...
https://github.com/ruby/stringio/issues/81
2024-08-31 14:22:17 +09:00
Hiroshi SHIBATA
3eda59e975
Sync strscan HEAD again.
...
https://github.com/ruby/strscan/pull/99 split document with multi-byte
chars.
2024-06-04 12:40:08 +09:00
Hiroshi SHIBATA
78bfde5d9f
Revert "[ruby/strscan] Doc for StringScanner"
...
This reverts commit 974ed1408c.
2024-05-30 21:13:10 +09:00
Hiroshi SHIBATA
d70b0da482
Revert "Fix reference path for strscan documentation"
...
This reverts commit 1fa93fb948.
2024-05-30 21:13:01 +09:00
Hiroshi SHIBATA
1fa93fb948
Fix reference path for strscan documentation
2024-05-30 14:29:25 +09:00
Burdette Lamar
974ed1408c
[ruby/strscan] Doc for StringScanner
...
(https://github.com/ruby/strscan/pull/96)
#peek_byte and #scan_byte not updated (not available in my repo --
sorry).
---------
0123da7352
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
2024-05-30 12:34:18 +09:00
卜部昌平
c844968b72
ruby tool/update-deps --fix
2024-04-27 21:55:28 +09:00
Aaron Patterson
164e464b04
[ruby/strscan] Add a method for peeking and reading bytes as integers
...
(https://github.com/ruby/strscan/pull/89)
This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the
current byte, return it as an integer, and advance the cursor.
`peek_byte` will return the current byte as an integer without advancing
the cursor.
Currently `StringScanner#get_byte` returns a string, but I want to get
the current byte without allocating a string. I think this will help
with writing high performance lexers.
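A small sketch of how the two methods fit together in a byte-oriented loop. `byte_values` is a hypothetical helper, and the `get_byte` fallback is an assumption added so the example also runs on strscan versions without `scan_byte`; that fallback allocates a one-byte String per step, which is what the new methods avoid.

```ruby
require "strscan"

# Hypothetical helper: collect the bytes of a string as Integers.
def byte_values(source)
  scanner = StringScanner.new(source)
  bytes = []
  until scanner.eos?
    if scanner.respond_to?(:scan_byte)
      peeked = scanner.peek_byte # current byte, scan pointer unchanged
      byte = scanner.scan_byte   # same byte, scan pointer advanced by one
      raise "peek/scan mismatch" unless peeked == byte
    else
      # Fallback for older strscan: get_byte returns a one-byte String.
      byte = scanner.get_byte.getbyte(0)
    end
    bytes << byte
  end
  bytes
end

p byte_values("ab")
```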
---------
873aba2e5d
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-02-26 15:54:54 +09:00
Sutou Kouhei
ce2618c628
[ruby/strscan] Bump version
...
ba338b882c
2024-02-08 14:43:56 +09:00
Sutou Kouhei
5afae77ce9
[ruby/strscan] Bump version
...
842845af1f
2024-02-08 14:43:56 +09:00
Sutou Kouhei
ac636f5709
[ruby/strscan] Bump version
...
d6f97ec102
2024-01-19 10:49:12 +09:00
NAITOH Jun
338eb0065b
[ruby/strscan] StringScanner#captures: Return nil not "" for unmatched capture
...
(https://github.com/ruby/strscan/pull/72)
fix https://github.com/ruby/strscan/issues/70
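If no substring matches a group (here the third one), `s[3]` returns nil
but the corresponding element of `s.captures` is "", so the two disagree.
The corresponding element of `s.captures` should be nil as well.
The first block below shows the previous behavior and the second the
fixed behavior.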
```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/ #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", ""]
s.captures.compact #=> ["foo", "bar", ""]
```
```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/ #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", nil]
s.captures.compact #=> ["foo", "bar"]
```
https://docs.ruby-lang.org/ja/latest/method/MatchData/i/captures.html
```
/(foo)(bar)(BAZ)?/ =~ "foobarbaz" #=> 0
$~.to_a #=> ["foobar", "foo", "bar", nil]
$~.captures #=> ["foo", "bar", nil]
$~.captures.compact #=> ["foo", "bar"]
```
* StringScanner#captures is not yet documented.
https://docs.ruby-lang.org/ja/latest/class/StringScanner.html
1fbfdd3c6f
2024-01-14 22:27:24 +09:00
Hiroshi SHIBATA
f54369830f
Revert "Rollback to released version numbers of stringio and strscan"
...
This reverts commit 6a79e53823.
2023-12-25 21:12:49 +09:00
Hiroshi SHIBATA
6a79e53823
Rollback to released version numbers of stringio and strscan
2023-12-16 12:00:59 +08:00
Sutou Kouhei
ce8301084f
[ruby/strscan] Bump version
...
1b3393be05
2023-11-08 09:26:58 +09:00
Peter Zhu
91e13a5207
[ruby/strscan] Fix indentation in strscan.c
...
[ci skip]
2023-07-28 10:12:52 -04:00
Peter Zhu
7193b404a1
Add function rb_reg_onig_match
...
rb_reg_onig_match performs preparation, error handling, and cleanup for
matching a regex against a string. This reduces repetitive code and
removes the need for StringScanner to access internal data of regex.
2023-07-27 13:33:40 -04:00
Peter Zhu
e27eab2f85
[ruby/strscan] Sync missed commit
...
Syncs commit ruby/strscan@76b377a5d8.
2023-07-27 09:42:42 -04:00
Matt Valentine-House
5e4b80177e
Update the depend files
2023-02-28 09:09:00 -08:00
Matt Valentine-House
f38c6552f9
Remove intern/gc.h from Make deps
2023-02-27 10:11:56 -08:00
Sutou Kouhei
18e840ac60
[ruby/strscan] Bump version
...
681cde0f27
2023-02-21 19:31:36 +09:00
OKURA Masafumi
a44f5ab089
[ruby/strscan] Mention return value of rest? in the doc
...
(https://github.com/ruby/strscan/pull/49)
The doc of `rest?` was unclear about return value. This commit adds the
return value to the doc.
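For illustration, the return value in question can be observed directly. This is a minimal sketch with an arbitrary input string:

```ruby
require "strscan"

scanner = StringScanner.new("ab")
before = scanner.rest? # input remains after the scan pointer
scanner.terminate      # move the scan pointer to the end of the string
after = scanner.rest?  # nothing remains

p [before, after]
```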
2023-02-21 19:31:35 +09:00
Nobuyoshi Nakada
899ea35035
Extract include/ruby/internal/attr/packed_struct.h
...
Split `PACKED_STRUCT` and `PACKED_STRUCT_UNALIGNED` macros into the
macros below:
* `RBIMPL_ATTR_PACKED_STRUCT_BEGIN`
* `RBIMPL_ATTR_PACKED_STRUCT_END`
* `RBIMPL_ATTR_PACKED_STRUCT_UNALIGNED_BEGIN`
* `RBIMPL_ATTR_PACKED_STRUCT_UNALIGNED_END`
2023-02-08 12:34:13 +09:00
Sutou Kouhei
79ad045214
[ruby/strscan] Bump version
...
3ada12613d
2022-12-26 15:09:21 +09:00
Hiroshi SHIBATA
4e31fea77d
Merge strscan-3.0.5
2022-12-09 16:36:22 +09:00
Peter Zhu
2d5ecd60a5
[Feature #18249] Update dependencies
2022-02-22 09:55:21 -05:00
Nobuyoshi Nakada
ac152b3cac
Update dependencies
2021-11-21 16:21:18 +09:00
Sutou Kouhei
c0c43276a1
[ruby/strscan] Bump version
...
If we use the same version as the default strscan gem in Ruby, "gem
install" doesn't extract the .gem. "gem install" then fails because it
can't find ext/strscan/ to build.
3ceafa6cdc
2021-10-24 05:57:48 +09:00