Commit graph

477 commits

Author SHA1 Message Date
Erim Icel
c914389ae8 Update string_casecmp.yml 2025-08-11 22:22:38 +09:00
Erim Icel
5e324ac11c Optimize str_casecmp length check using pointer end 2025-08-11 22:22:38 +09:00
Jean Boussier
f3206cc79b Struct: keep direct reference to IMEMO/fields when space allows
It's not rare for structs to have additional ivars, hence they are one
of the most common, if not the most common, type in the `gen_fields_tbl`.

This can cause Ractor contention, but even in single-Ractor mode it
means having to do a hash lookup to access the ivars, and it increases
GC work.

Instead, unless the struct is perfectly right sized, we can store
a reference to the associated IMEMO/fields object right after the
last struct member.
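
For illustration, here is the kind of struct that hits this path (a hypothetical example; `Point` and `@color` are mine, not part of the patch):

```ruby
Point = Struct.new(:x, :y)

point = Point.new(1, 2)
point.instance_variable_set(:@color, :red) # extra ivar beyond the declared members

# Previously this ivar lived in gen_fields_tbl, requiring a hash lookup;
# when the slot has spare room, a reference to the fields object is now
# kept right after the last member.
p point.instance_variable_get(:@color) # => :red
```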

```
compare-ruby: ruby 3.5.0dev (2025-08-06T12:50:36Z struct-ivar-fields-2 9a30d141a1) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-08-06T12:57:59Z struct-ivar-fields-2 2ff3ec237f) +PRISM [arm64-darwin24]
warming up.....

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|member_reader         |    590.317k|  579.246k|
|                      |       1.02x|         -|
|member_writer         |    543.963k|  527.104k|
|                      |       1.03x|         -|
|member_reader_method  |    213.540k|  213.004k|
|                      |       1.00x|         -|
|member_writer_method  |    192.657k|  191.491k|
|                      |       1.01x|         -|
|ivar_reader           |    403.993k|  569.915k|
|                      |           -|     1.41x|
```

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2025-08-06 17:07:49 +02:00
Jean Boussier
95235fd528 benchmark_driver: Stop using Ractor#take 2025-07-04 08:23:20 +02:00
Tim Smith
3cfd71e7e4 Fix minor typos in comments, specs, and docs
Just a bit of minor cleanup

Signed-off-by: Tim Smith <tsmith84@gmail.com>
2025-06-18 07:51:16 +09:00
Hartley McGuire
8120971932 Move more NilClass methods to ruby
```
$ make benchmark ITEM=nilclass COMPARE_RUBY="/opt/rubies/ruby-master/bin/ruby"
/opt/rubies/3.4.2/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
                    --executables="compare-ruby::/opt/rubies/ruby-master/bin/ruby -I.ext/common --disable-gem" \
                    --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
                    --output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'nilclass' -o -name '*nilclass*.yml' -o -name '*nilclass*.rb' | sort)
compare-ruby: ruby 3.5.0dev (2025-06-02T13:52:25Z master cbd49ecbbe) +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-06-02T22:47:21Z hm-ruby-nilclass 3e7f1f0466) +PRISM [arm64-darwin24]

|             |compare-ruby|built-ruby|
|:------------|-----------:|---------:|
|rationalize  |     24.056M|   53.908M|
|             |           -|     2.24x|
|to_c         |     23.652M|   82.781M|
|             |           -|     3.50x|
|to_i         |     89.526M|   84.388M|
|             |       1.06x|         -|
|to_f         |     84.746M|   96.899M|
|             |           -|     1.14x|
|to_r         |     25.107M|   83.472M|
|             |           -|     3.32x|
|splat        |     42.772M|   42.717M|
|             |       1.00x|         -|
```

This makes them much faster
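
As a rough sketch (not the actual patch) of what such definitions can look like once written in Ruby rather than C:

```ruby
class NilClass
  def to_r = (0r)
  def to_c = (0i)
  def rationalize(eps = nil) = (0r)
end

p nil.to_r        # => (0/1)
p nil.to_c        # => (0+0i)
p nil.rationalize # => (0/1)
```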
2025-06-12 09:30:09 +02:00
John Hawthorn
e01e89f55c Avoid calling RCLASS_SUPER in rb_class_superclass 2025-05-23 10:22:24 -07:00
Samuel Williams
73c9d6ccaa
Allow IO#close to interrupt IO operations on fibers using fiber_interrupt hook. (#12839) 2025-05-23 14:55:05 +09:00
John Hawthorn
9b8c846bdf Add an additional test to module_eqq 2025-05-12 19:05:19 -07:00
John Hawthorn
b0502e8f90 Remove respond_to check from Class#bind_call 2025-05-12 14:10:29 -07:00
Jeremy Evans
e4f85bfc31 Implement Set as a core class
Set has been an autoloaded standard library since Ruby 3.2.
The standard library Set is less efficient than it could be, as it
uses Hash for storage, which stores unnecessary values for each key.

Implementation details:

* Core Set uses a modified version of `st_table`, named `set_table`.
  Other than the `s/st_/set_/` renaming, the main difference is that the stored records
  do not have values, making them 1/3 smaller. `st_table_entry` stores
  `hash`, `key`, and `record` (value), while `set_table_entry` only
  stores `hash` and `key`.  This results in large sets using ~33% less
  memory compared to stdlib Set.  For small sets, core Set uses 12% more
  memory (160 byte object slot and 64 malloc bytes, while stdlib set
  uses 40 for Set and 160 for Hash).  More memory is used because
  the set_table is embedded and 72 bytes in the object slot are
  currently wasted. Hopefully we can make this more efficient and have
  it stored in an 80 byte object slot in the future.

* All methods are implemented as cfuncs, except the pretty_print
  methods, which were moved to `lib/pp.rb` (which is where the
  pretty_print methods for other core classes are defined).  As is
  typical for core classes, internal calls call C functions and
  not Ruby methods.  For example, to check if something is a Set,
  `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the
  related object.

* Almost all methods use the same algorithm that the pure-Ruby
  implementation used.  The exception is when calling `Set#divide` with a
  2-arity block.  The pure-Ruby method used tsort to implement this.
  I developed an algorithm that only allocates a single intermediate
  hash and does not need tsort (see the usage sketch after this list).

* The `flatten_merge` protected method is no longer necessary, so it
  is not implemented (it could be).

* Similar to Hash/Array, subclasses of Set are no longer reflected in
  `inspect` output.

* RDoc from stdlib Set was moved to core Set, with minor updates.
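
As a quick usage sketch of the `Set#divide` 2-arity case mentioned above (the values and grouping rule are illustrative; subset ordering may vary):

```ruby
require "set" # a no-op once Set is a core class

numbers = Set[1, 3, 4, 6, 9, 10, 11]
# With a 2-arity block, elements for which the block returns true are
# grouped into the same subset; this is the case reimplemented without tsort.
groups = numbers.divide { |i, j| (i - j).abs == 1 }
p groups # => #<Set: {#<Set: {1}>, #<Set: {3, 4}>, #<Set: {6}>, #<Set: {9, 10, 11}>}>
```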

This includes a comprehensive benchmark suite for all public Set
methods.  As you would expect, the native version is faster in the
vast majority of cases, and multiple times faster in many cases.
There are a few cases where it is significantly slower:

* Set.new with no arguments (~1.6x)
* Set#compare_by_identity for small sets (~1.3x)
* Set#clone for small sets (~1.5x)
* Set#dup for small sets (~1.7x)

These are slower as Set does not currently use the AR table
optimization that Hash does, so a new set_table is initialized for
each call.  I'm not sure it's worth the complexity to have an AR
table-like optimization for small sets (for hashes it makes sense,
as small hashes are used everywhere in Ruby).

The rbs and repl_type_completor bundled gems will need updates to
support core Set.  The pull request marks them as allowed failures.

This passes all set tests with no changes.  The following specs
needed modification:

* Modifying frozen set error message (changed for the better)
* `Set#divide` when passed a 2-arity block no longer yields the same
  object as both the first and second argument (this seems like an issue
  with the previous implementation).
* Set-like objects that override `is_a?` such that `is_a?(Set)` return
  `true` are no longer treated as Set instances.
* `Set.allocate.hash` is no longer the same as `nil.hash`
* `Set#join` no longer calls `Set#to_a` (it calls the underlying C
   function).
* `Set#flatten_merge` protected method is not implemented.

Previously, `set.rb` added a `SortedSet` autoload, which loads
`set/sorted_set.rb`.  This replaces the `Set` autoload in `prelude.rb`
with a `SortedSet` autoload, but I recommend removing it and
`set/sorted_set.rb`.

This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`,
reflecting the switch to a core class.  This does not move the spec
files, as I'm not sure how they should be handled.

Internally, this uses the st_* types and functions as much as
possible, and only adds set_* types and functions as needed.
The underlying set_table implementation is stored in st.c, but
there is no public C-API for it, nor is there one planned, in
order to keep the ability to change the internals going forward.

For internal uses of st_table with Qtrue values, those can
probably be replaced with set_table.  To do that, include
internal/set_table.h.  To handle symbol visibility (rb_ prefix),
internal/set_table.h uses the same macro approach that
include/ruby/st.h uses.

The Set class (rb_cSet) and all methods are defined in set.c.
There isn't currently a C-API for the Set class, though C-API
functions can be added as needed going forward.

Implements [Feature #21216]

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Oliver Nutter <mrnoname1000@riseup.net>
2025-04-26 10:31:11 +09:00
John Hawthorn
3a29e835e6 Add benchmarks for fstring de-duplication 2025-04-18 13:03:54 +09:00
git
3bfcb013c0 * remove trailing spaces. [ci skip] 2025-04-18 00:10:03 +00:00
Takashi Kokubun
bcacf7c849 Add configuration for git commit
and test auto-style again by adding spaces to app_fib
2025-04-18 09:08:57 +09:00
Takashi Kokubun
2da80242a9 Actually test auto-style
My editor deleted trailing spaces which I meant to leave
2025-04-18 09:04:25 +09:00
Takashi Kokubun
d5f3549e71 Test auto-style 2025-04-18 09:02:30 +09:00
Jean Boussier
0606046c1a Lazily create objspace->id_to_obj_tbl
This inverse table is only useful if `ObjectSpace._id2ref` is used,
which is extremely rare. The only notable exception is the `drb` gem
and even then it has an option not to rely on `_id2ref`.

So if we assume this table will never be looked up, we can just
not maintain it, and if it turns out `_id2ref` is called, we
can lock the VM and re-build it.
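
For context, the rarely used inverse lookup looks like this (illustrative):

```ruby
obj = Object.new
id  = obj.object_id              # forward direction: always maintained
same = ObjectSpace._id2ref(id)   # inverse direction: now rebuilds the table lazily
p same.equal?(obj)               # => true
```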

```
compare-ruby: ruby 3.5.0dev (2025-04-10T09:44:40Z master 684cfa42d7) +YJIT +PRISM [arm64-darwin24]
built-ruby: ruby 3.5.0dev (2025-04-10T10:13:43Z lazy-id-to-obj d3aa9626cc) +YJIT +PRISM [arm64-darwin24]
warming up..

|           |compare-ruby|built-ruby|
|:----------|-----------:|---------:|
|baseline   |     26.364M|   25.974M|
|           |       1.01x|         -|
|object_id  |     10.293M|   14.202M|
|           |           -|     1.38x|
```
2025-04-15 07:57:39 +09:00
Jeremy Evans
67d1dd2ebd Avoid array allocation for *nil, by not calling nil.to_a
The following method call:

```ruby
a(*nil)
```

previously allocated an array, because it calls `nil.to_a`, but I have
determined this array allocation is unnecessary.  The instructions in
this case are:

```
0000 putself                                                          (   1)[Li]
0001 putnil
0002 splatarray                             false
0004 opt_send_without_block                 <calldata!mid:a, argc:1, ARGS_SPLAT|FCALL>
0006 leave
```

The method call uses `ARGS_SPLAT` without `ARGS_SPLAT_MUT`, so the
returned array doesn't need to be mutable.  I believe all cases where
`splatarray false` are used allow the returned object to be frozen,
since the `false` means to not duplicate the array.  The optimization
in this case is to have `splatarray false` push a shared empty frozen
array, instead of calling `nil.to_a` to return a newly allocated array.

There is a slightly backwards incompatibility with this optimization,
in that `nil.to_a` is not called.  However, I believe the new behavior
of `*nil` not calling `nil.to_a` is more consistent with how `**nil`
does not call `nil.to_hash`.  Also, so much Ruby code would break if
`nil.to_a` returned something different from an empty array, that it's
difficult to imagine anyone actually doing that in real code, though
we have a few tests/specs for that.

I think it would be bad for consistency if `*nil` called `nil.to_a`
in some cases and not others, so this changes other cases to not
call `nil.to_a`:

For `[*nil]`, this uses `splatarray true`, which now allocates a
new array for a `nil` argument without calling `nil.to_a`.

For `[1, *nil]`, this uses `concattoarray`, which now returns
the first array if the second array is `nil`.
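
The three forms discussed above, for illustration (the `a` method is hypothetical, not the allocation tests themselves):

```ruby
def a(*args) = args

p a(*nil)   # => []  -- ARGS_SPLAT: a shared frozen empty array is pushed
p [*nil]    # => []  -- splatarray true: new array, but nil.to_a is not called
p [1, *nil] # => [1] -- concattoarray: the first array is returned as-is
```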

This updates the allocation tests to check that the array allocations
are avoided where possible.

Implements [Feature #21047]
2025-03-27 11:17:40 -07:00
Jean Boussier
f32d5071b7 Elide string allocation when using String#gsub in MAP mode
If the provided Hash doesn't have a default proc, we know for
sure that we'll never call into user provided code, hence the
string we allocate to access the Hash can't possibly escape.

So we don't actually have to allocate it; we can use a fake_str,
AKA a stack-allocated string.
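
An illustrative example of this "MAP mode" (the hash and pattern are mine, not the benchmark's):

```ruby
ESCAPES = { "<" => "&lt;", ">" => "&gt;", "&" => "&amp;" }.freeze

# The replacement Hash has no default proc, so gsub never calls back into
# user code while looking up replacements, and the lookup key cannot escape.
p "a < b && b > c".gsub(/[<>&]/, ESCAPES) # => "a &lt; b &amp;&amp; b &gt; c"
```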

```
compare-ruby: ruby 3.5.0dev (2025-02-10T13:47:44Z master 3fb455adab) +PRISM [arm64-darwin23]
built-ruby: ruby 3.5.0dev (2025-02-10T17:09:52Z opt-gsub-alloc ea5c28958f) +PRISM [arm64-darwin23]
warming up....

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|escape           |      3.374k|    3.722k|
|                 |           -|     1.10x|
|escape_bin       |      5.469k|    6.587k|
|                 |           -|     1.20x|
|escape_utf8      |      3.465k|    3.734k|
|                 |           -|     1.08x|
|escape_utf8_bin  |      5.752k|    7.283k|
|                 |           -|     1.27x|
```
2025-02-12 10:23:50 +01:00
Alexander Momchilov
0ea5c13bc6
[DOC] Improve formatting in Markdown files (#12322)
* Fix WASM bullet/code indentation

* Use `console` code highlighting where appropriate

… which handles the prefix `$` correctly.

* Migrate feature proposal template to MarkDown

* Set language on code blocks
2024-12-12 17:49:45 -08:00
NAITOH Jun
a041a6c1b5 [ruby/strscan] Add scan and search benchmark
(https://github.com/ruby/strscan/pull/111)

# Why?
To improve the parsing process, I would like to add benchmarks for all
parsing processes.

## scan
- scan_full(regexp, false, true) == StringScanner#check
- scan_full(regexp, false, false) == StringScanner#match?
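
A quick illustration of these equivalences (the sample string is mine):

```ruby
require "strscan"

s = StringScanner.new("Fri Dec 12 1975 14:39")
p s.scan_full(/Fri /, false, true)  # => "Fri " (same as s.check(/Fri /))
p s.scan_full(/Fri /, false, false) # => 4      (same as s.match?(/Fri /))
p s.pos                             # => 0      (the scan pointer is not advanced)
```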

### CRuby

```
$ benchmark-driver benchmark/scan.yaml
Warming up --------------------------------------
          check(reg)    10.558M i/s -     10.848M times in 1.027445s (94.71ns/i)
          check(str)    13.368M i/s -     13.782M times in 1.030978s (74.80ns/i)
         match?(reg)    16.080M i/s -     16.247M times in 1.010340s (62.19ns/i)
         match?(str)    23.336M i/s -     23.501M times in 1.007088s (42.85ns/i)
Calculating -------------------------------------
          check(reg)    11.601M i/s -     31.675M times in 2.730287s (86.20ns/i)
          check(str)    15.217M i/s -     40.104M times in 2.635475s (65.72ns/i)
         match?(reg)    18.781M i/s -     48.241M times in 2.568662s (53.25ns/i)
         match?(str)    29.441M i/s -     70.007M times in 2.377840s (33.97ns/i)

Comparison:
         match?(str):  29441324.5 i/s
         match?(reg):  18780543.7 i/s - 1.57x  slower
          check(str):  15217130.1 i/s - 1.93x  slower
          check(reg):  11601371.2 i/s - 2.54x  slower
```
### JRuby

```
$ benchmark-driver benchmark/scan.yaml
Warming up --------------------------------------
          check(reg)     8.129M i/s -      8.090M times in 0.995222s (123.02ns/i)
          check(str)    16.691M i/s -     16.616M times in 0.995519s (59.91ns/i)
         match?(reg)     8.979M i/s -      9.001M times in 1.002440s (111.37ns/i)
         match?(str)    26.138M i/s -     26.011M times in 0.995150s (38.26ns/i)
Calculating -------------------------------------
          check(reg)    11.808M i/s -     24.387M times in 2.065238s (84.69ns/i)
          check(str)    31.762M i/s -     50.072M times in 1.576495s (31.48ns/i)
         match?(reg)    13.944M i/s -     26.936M times in 1.931719s (71.71ns/i)
         match?(str)    50.872M i/s -     78.414M times in 1.541392s (19.66ns/i)

Comparison:
         match?(str):  50872250.2 i/s
          check(str):  31761544.3 i/s - 1.60x  slower
         match?(reg):  13944219.6 i/s - 3.65x  slower
          check(reg):  11808244.1 i/s - 4.31x  slower
```

## search
- search_full(regexp, false, true) == StringScanner#check_until
- search_full(regexp, false, false) == StringScanner#exist?
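
And the corresponding search variants (same illustrative string):

```ruby
require "strscan"

s = StringScanner.new("Fri Dec 12 1975 14:39")
p s.search_full(/12/, false, true)  # => "Fri Dec 12" (same as s.check_until(/12/))
p s.search_full(/12/, false, false) # => 10           (same as s.exist?(/12/))
```
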
```
$ benchmark-driver benchmark/search.yaml
Warming up --------------------------------------
    check_until(reg)     9.338M i/s -      9.456M times in 1.012573s (107.09ns/i)
    check_until(str)    11.385M i/s -     11.979M times in 1.052173s (87.83ns/i)
         exist?(reg)    13.416M i/s -     13.517M times in 1.007532s (74.54ns/i)
         exist?(str)    17.976M i/s -     18.677M times in 1.038981s (55.63ns/i)
Calculating -------------------------------------
    check_until(reg)    10.297M i/s -     28.015M times in 2.720634s (97.11ns/i)
    check_until(str)    12.684M i/s -     34.156M times in 2.692853s (78.84ns/i)
         exist?(reg)    15.184M i/s -     40.249M times in 2.650786s (65.86ns/i)
         exist?(str)    21.426M i/s -     53.928M times in 2.517008s (46.67ns/i)

Comparison:
         exist?(str):  21425527.1 i/s
         exist?(reg):  15183679.9 i/s - 1.41x  slower
    check_until(str):  12684053.7 i/s - 1.69x  slower
    check_until(reg):  10297134.8 i/s - 2.08x  slower
```

### JRuby
```
$ benchmark-driver benchmark/search.yaml
Warming up --------------------------------------
    check_until(reg)     7.646M i/s -      7.649M times in 1.000381s (130.78ns/i)
    check_until(str)    13.075M i/s -     13.010M times in 0.995048s (76.48ns/i)
         exist?(reg)     8.728M i/s -      8.684M times in 0.994921s (114.57ns/i)
         exist?(str)    20.609M i/s -     20.514M times in 0.995399s (48.52ns/i)
Calculating -------------------------------------
    check_until(reg)     9.371M i/s -     22.939M times in 2.447900s (106.71ns/i)
    check_until(str)    22.760M i/s -     39.225M times in 1.723414s (43.94ns/i)
         exist?(reg)    11.758M i/s -     26.185M times in 2.226997s (85.05ns/i)
         exist?(str)    34.564M i/s -     61.827M times in 1.788749s (28.93ns/i)

Comparison:
         exist?(str):  34564306.2 i/s
    check_until(str):  22759878.4 i/s - 1.52x  slower
         exist?(reg):  11757927.4 i/s - 2.94x  slower
    check_until(reg):   9371009.3 i/s - 3.69x  slower
```

81a80a176b
2024-11-27 09:24:06 +09:00
Jean Boussier
e440268d51 Get rid of JSON benchmarks 2024-11-05 12:19:55 +01:00
Hiroshi SHIBATA
ff56064469
Fixup b1fc1af444. Removed benchmark files from ruby/json 2024-11-05 11:02:13 +09:00
Jean Boussier
cc2e67a138 Elide Generator::State allocation until a to_json method has to be called
Fix: https://github.com/ruby/json/issues/655

For very small documents, the biggest performance gap with alternatives is
that the API imposes that we allocate the `State` object. In a real-world app
this doesn't make much of a difference, but when running in a micro-benchmark
this doubles the allocations, causing twice the amount of GC runs, making us
look bad.

However, unless we have to call a `to_json` method, the `State` object isn't
visible, so with some refactoring, we can elide that allocation entirely.

Instead we allocate the State internal struct on the stack, and if we need
to call a `to_json` method, we allocate the `State` and spill the struct on
the heap.

As a result, `JSON.generate` is now as fast as re-using a `State` instance,
as long as only primitives are generated.
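
The two call styles being compared, roughly (my reading of the "json (reuse)" vs plain "json" rows, not the benchmark's exact code):

```ruby
require "json"

payload = { "id" => 1, "name" => "ruby", "ok" => true }

state = JSON::State.new          # reused generator state ("json (reuse)")
p state.generate(payload)
p JSON.generate(payload)         # plain call: the State struct now starts on the stack
```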

Before:
```
== Encoding small mixed (34 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   598.654k i/100ms
                json   400.542k i/100ms
                  oj   533.353k i/100ms
Calculating -------------------------------------
        json (reuse)      6.371M (± 8.6%) i/s  (156.96 ns/i) -     31.729M in   5.059195s
                json      4.120M (± 6.6%) i/s  (242.72 ns/i) -     20.828M in   5.090549s
                  oj      5.622M (± 6.4%) i/s  (177.86 ns/i) -     28.268M in   5.061473s

Comparison:
        json (reuse):  6371126.6 i/s
                  oj:  5622452.0 i/s - same-ish: difference falls within error
                json:  4119991.1 i/s - 1.55x  slower

== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   248.125k i/100ms
                json   215.255k i/100ms
                  oj   217.531k i/100ms
Calculating -------------------------------------
        json (reuse)      2.628M (± 6.1%) i/s  (380.55 ns/i) -     13.151M in   5.030281s
                json      2.185M (± 6.7%) i/s  (457.74 ns/i) -     10.978M in   5.057655s
                  oj      2.217M (± 6.7%) i/s  (451.10 ns/i) -     11.094M in   5.044844s

Comparison:
        json (reuse):  2627799.4 i/s
                  oj:  2216824.8 i/s - 1.19x  slower
                json:  2184669.5 i/s - 1.20x  slower

== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   641.334k i/100ms
                json   322.745k i/100ms
                  oj   642.450k i/100ms
Calculating -------------------------------------
        json (reuse)      7.133M (± 6.5%) i/s  (140.19 ns/i) -     35.915M in   5.068201s
                json      4.615M (± 7.0%) i/s  (216.70 ns/i) -     22.915M in   5.003718s
                  oj      6.912M (± 6.4%) i/s  (144.68 ns/i) -     34.692M in   5.047690s

Comparison:
        json (reuse):  7133123.3 i/s
                  oj:  6911977.1 i/s - same-ish: difference falls within error
                json:  4614696.6 i/s - 1.55x  slower
```

After:

```
== Encoding small mixed (34 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   572.751k i/100ms
                json   457.741k i/100ms
                  oj   512.247k i/100ms
Calculating -------------------------------------
        json (reuse)      6.324M (± 6.9%) i/s  (158.12 ns/i) -     31.501M in   5.023093s
                json      6.263M (± 6.9%) i/s  (159.66 ns/i) -     31.126M in   5.017086s
                  oj      5.569M (± 6.6%) i/s  (179.56 ns/i) -     27.661M in   5.003739s

Comparison:
        json (reuse):  6324183.5 i/s
                json:  6263204.9 i/s - same-ish: difference falls within error
                  oj:  5569049.2 i/s - same-ish: difference falls within error

== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   258.505k i/100ms
                json   242.335k i/100ms
                  oj   220.678k i/100ms
Calculating -------------------------------------
        json (reuse)      2.589M (± 9.6%) i/s  (386.17 ns/i) -     12.925M in   5.071853s
                json      2.594M (± 6.6%) i/s  (385.46 ns/i) -     13.086M in   5.083035s
                  oj      2.250M (± 2.3%) i/s  (444.43 ns/i) -     11.255M in   5.004707s

Comparison:
        json (reuse):  2589499.6 i/s
                json:  2594321.0 i/s - same-ish: difference falls within error
                  oj:  2250064.0 i/s - 1.15x  slower

== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   656.373k i/100ms
                json   644.135k i/100ms
                  oj   650.283k i/100ms
Calculating -------------------------------------
        json (reuse)      7.202M (± 7.1%) i/s  (138.84 ns/i) -     36.101M in   5.051438s
                json      7.278M (± 1.7%) i/s  (137.40 ns/i) -     36.716M in   5.046300s
                  oj      7.036M (± 1.7%) i/s  (142.12 ns/i) -     35.766M in   5.084729s

Comparison:
        json (reuse):  7202447.9 i/s
                json:  7277883.0 i/s - same-ish: difference falls within error
                  oj:  7036115.2 i/s - same-ish: difference falls within error

```
2024-11-01 13:04:24 +09:00
Jean Boussier
b042d9d9c1 [ruby/json] Use JSON.generate instead of JSON.dump for benchmarking
97b61edce1
2024-11-01 13:04:24 +09:00
Jean Boussier
2e43621806
[ruby/json] Optimize fbuffer_append_long
Ref: https://github.com/ruby/json/issues/655

Rather than to write the number backward, and then reverse
the buffer, we can start from the back of the buffer and write
the number in the proper direction.
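
A Ruby sketch of the new approach (illustrative only, assuming a non-negative `n`): fill a scratch buffer from its end so the digits land in order, instead of writing them backwards and reversing afterwards.

```ruby
def append_long(buf, n)
  scratch = "0" * 20      # wide enough for any 64-bit integer
  i = scratch.bytesize
  begin
    i -= 1
    scratch[i] = (48 + n % 10).chr
    n /= 10
  end while n > 0
  buf << scratch[i..]
end

out = +"["
append_long(out, 6550)
p out << "]" # => "[6550]"
```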

Before:

```
== Encoding integers (8009 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     8.606k i/100ms
                  oj     9.598k i/100ms
Calculating -------------------------------------
                json     86.059k (± 0.8%) i/s   (11.62 μs/i) -    430.300k in   5.000416s
                  oj     97.409k (± 0.6%) i/s   (10.27 μs/i) -    489.498k in   5.025360s

Comparison:
                json:    86058.8 i/s
                  oj:    97408.8 i/s - 1.13x  faster
```

After:

```
== Encoding integers (8009 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)     9.500k i/100ms
                json     9.359k i/100ms
                  oj     9.722k i/100ms
Calculating -------------------------------------
        json (reuse)     96.270k (± 0.4%) i/s   (10.39 μs/i) -    484.500k in   5.032777s
                json     94.800k (± 2.2%) i/s   (10.55 μs/i) -    477.309k in   5.037495s
                  oj     97.131k (± 0.7%) i/s   (10.30 μs/i) -    486.100k in   5.004822s

Comparison:
        json (reuse):    96270.1 i/s
                  oj:    97130.5 i/s - same-ish: difference falls within error
                json:    94799.9 i/s - same-ish: difference falls within error
```

0655b58d14
2024-10-29 13:25:01 +09:00
Jean Boussier
00aa1f9a1d [ruby/json] Encoding benchmark updates
Remove `rapidjson` as it's 2x slower on most benchmarks, and on
par on a couple of them, so it's not telling us much here.

Configure `Oj` in compat mode so it generates the same JSON
on the `many to_json` benchmark.

```
== Encoding small nested array (121 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   220.202k i/100ms
                json   162.190k i/100ms
                  oj   222.094k i/100ms
Calculating -------------------------------------
        json (reuse)      2.322M (± 1.3%) i/s  (430.72 ns/i) -     11.671M in   5.027655s
                json      1.707M (± 1.2%) i/s  (585.76 ns/i) -      8.596M in   5.035996s
                  oj      2.248M (± 1.4%) i/s  (444.94 ns/i) -     11.327M in   5.040712s

Comparison:
        json (reuse):  2321686.9 i/s
                  oj:  2247509.6 i/s - 1.03x  slower
                json:  1707179.3 i/s - 1.36x  slower

== Encoding small hash (65 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   446.184k i/100ms
                json   265.594k i/100ms
                  oj   653.226k i/100ms
Calculating -------------------------------------
        json (reuse)      4.980M (± 1.4%) i/s  (200.82 ns/i) -     24.986M in   5.018729s
                json      2.763M (± 1.8%) i/s  (361.94 ns/i) -     13.811M in   5.000434s
                  oj      7.232M (± 1.4%) i/s  (138.28 ns/i) -     36.581M in   5.059377s

Comparison:
        json (reuse):  4979642.4 i/s
                  oj:  7231624.4 i/s - 1.45x  faster
                json:  2762890.1 i/s - 1.80x  slower

== Encoding mixed utf8 (5003001 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json    34.000 i/100ms
                  oj    36.000 i/100ms
Calculating -------------------------------------
                json    357.772 (± 4.8%) i/s    (2.80 ms/i) -      1.802k in   5.047308s
                  oj    327.521 (± 1.5%) i/s    (3.05 ms/i) -      1.656k in   5.057241s

Comparison:
                json:      357.8 i/s
                  oj:      327.5 i/s - 1.09x  slower

== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json    26.000 i/100ms
                  oj    36.000 i/100ms
Calculating -------------------------------------
                json    294.357 (±10.5%) i/s    (3.40 ms/i) -      1.456k in   5.028862s
                  oj    352.826 (± 8.2%) i/s    (2.83 ms/i) -      1.764k in   5.045651s

Comparison:
                json:      294.4 i/s
                  oj:      352.8 i/s - same-ish: difference falls within error

== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   206.000 i/100ms
                  oj   229.000 i/100ms
Calculating -------------------------------------
                json      2.064k (± 9.3%) i/s  (484.55 μs/i) -     10.300k in   5.056409s
                  oj      2.121k (± 8.4%) i/s  (471.47 μs/i) -     10.534k in   5.012315s

Comparison:
                json:     2063.8 i/s
                  oj:     2121.0 i/s - same-ish: difference falls within error

== Encoding citm_catalog.json (500298 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   119.000 i/100ms
                  oj   126.000 i/100ms
Calculating -------------------------------------
                json      1.317k (± 2.3%) i/s  (759.18 μs/i) -      6.664k in   5.061781s
                  oj      1.261k (± 2.9%) i/s  (793.11 μs/i) -      6.300k in   5.000714s

Comparison:
                json:     1317.2 i/s
                  oj:     1260.9 i/s - same-ish: difference falls within error

== Encoding canada.json (2090234 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json     1.000 i/100ms
                  oj     1.000 i/100ms
Calculating -------------------------------------
                json     19.590 (± 0.0%) i/s   (51.05 ms/i) -     98.000 in   5.004485s
                  oj     19.003 (± 0.0%) i/s   (52.62 ms/i) -     95.000 in   5.002276s

Comparison:
                json:       19.6 i/s
                  oj:       19.0 i/s - 1.03x  slower

== Encoding many #to_json calls (2701 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json     2.556k i/100ms
                  oj     2.332k i/100ms
Calculating -------------------------------------
                json     25.367k (± 1.7%) i/s   (39.42 μs/i) -    127.800k in   5.039438s
                  oj     23.743k (± 1.5%) i/s   (42.12 μs/i) -    118.932k in   5.010303s

Comparison:
                json:    25367.3 i/s
                  oj:    23743.3 i/s - 1.07x  slower

```

5a64fd5b6f
2024-10-26 18:44:15 +09:00
Jean Boussier
e52b47680e [ruby/json] Reduce encoding benchmark size
Profiling revealed that we were spending lots of time growing the buffer.
Buffer operations are definitely something we want to optimize, but for
this specific benchmark what we're interested in is UTF-8 scanning performance.

Each iteration of the two scanning benchmarks was producing 20MB of JSON;
now they only produce 5MB.

Now:

```
== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.0dev (2024-10-18T19:01:45Z master 7be9a333ca) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json    35.000 i/100ms
                  oj    36.000 i/100ms
           rapidjson    10.000 i/100ms
Calculating -------------------------------------
                json    359.161 (± 1.4%) i/s    (2.78 ms/i) -      1.820k in   5.068542s
                  oj    359.699 (± 0.6%) i/s    (2.78 ms/i) -      1.800k in   5.004291s
           rapidjson     99.687 (± 2.0%) i/s   (10.03 ms/i) -    500.000 in   5.017321s

Comparison:
                json:      359.2 i/s
                  oj:      359.7 i/s - same-ish: difference falls within error
           rapidjson:       99.7 i/s - 3.60x  slower
```

1a338532d2
2024-10-26 18:44:15 +09:00
Jean Boussier
97713ac952 [ruby/json] convert_UTF8_to_JSON: repurpose the escape tables into size tables
Since we're looking up the table anyway, we might as well store the
UTF-8 char length in it. For single byte characters that don't need
escaping we store `0`.

This helps on strings with lots of multi-byte characters:
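
A hypothetical Ruby rendering of the size-table idea (the values are my assumption, not the actual C table): index by leading byte, store 0 for single-byte characters, otherwise the UTF-8 character length so the scanner can skip the whole character in one step.

```ruby
CHAR_LENGTH = Array.new(256) do |byte|
  if byte < 0x80 then 0      # plain ASCII, handled byte by byte
  elsif byte >= 0xF0 then 4  # 4-byte sequence leader
  elsif byte >= 0xE0 then 3  # 3-byte sequence leader
  elsif byte >= 0xC0 then 2  # 2-byte sequence leader
  else 1                     # continuation byte (never a character start)
  end
end

p CHAR_LENGTH["é".bytes.first]  # => 2
p CHAR_LENGTH["あ".bytes.first] # => 3
```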

Before:

```
== Encoding mostly utf8 (20004001 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     6.000 i/100ms
                  oj    10.000 i/100ms
           rapidjson     2.000 i/100ms
Calculating -------------------------------------
                json     67.978 (± 1.5%) i/s   (14.71 ms/i) -    342.000 in   5.033062s
                  oj    100.876 (± 2.0%) i/s    (9.91 ms/i) -    510.000 in   5.058080s
           rapidjson     26.389 (± 7.6%) i/s   (37.89 ms/i) -    132.000 in   5.027681s

Comparison:
                json:       68.0 i/s
                  oj:      100.9 i/s - 1.48x  faster
           rapidjson:       26.4 i/s - 2.58x  slower
```

After:

```
== Encoding mostly utf8 (20004001 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     7.000 i/100ms
                  oj    10.000 i/100ms
           rapidjson     2.000 i/100ms
Calculating -------------------------------------
                json     75.187 (± 2.7%) i/s   (13.30 ms/i) -    378.000 in   5.030111s
                  oj     95.196 (± 2.1%) i/s   (10.50 ms/i) -    480.000 in   5.043565s
           rapidjson     25.969 (± 3.9%) i/s   (38.51 ms/i) -    130.000 in   5.011471s

Comparison:
                json:       75.2 i/s
                  oj:       95.2 i/s - 1.27x  faster
           rapidjson:       26.0 i/s - 2.90x  slower
```

51e2631d1f
2024-10-26 18:44:15 +09:00
Jean Boussier
9f300d0541 [ruby/json] Optimize convert_UTF8_to_JSON for mostly ASCII strings
If we assume that even UTF-8 strings are mostly ASCII, we can implement a
fast path for the ASCII parts.
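
A minimal Ruby sketch of the fast-path idea (my simplification, not the actual C patch): copy runs of plain ASCII in one go, and only fall back to per-character handling for everything else.

```ruby
require "strscan"

SAFE_ASCII = /[ !#-\[\]-~]+/ # printable ASCII except '"' and '\'

def escape_body(str)
  out = +""
  s = StringScanner.new(str)
  until s.eos?
    if (run = s.scan(SAFE_ASCII))
      out << run   # fast path: the whole ASCII run is copied at once
    else
      ch = s.getch # slow path: one character at a time
      out << (ch == '"' ? '\"' : (ch == "\\" ? "\\\\" : ch))
    end
  end
  out
end

p escape_body(%(héllo "world")) # => "héllo \\\"world\\\""
```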

Before:

```
== Encoding mixed utf8 (20012001 bytes)
ruby 3.4.0dev (2024-10-18T15:12:54Z master d1b5c10957) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json     5.000 i/100ms
                  oj     9.000 i/100ms
           rapidjson     2.000 i/100ms
Calculating -------------------------------------
                json     49.403 (± 2.0%) i/s   (20.24 ms/i) -    250.000 in   5.062647s
                  oj    100.120 (± 2.0%) i/s    (9.99 ms/i) -    504.000 in   5.035349s
           rapidjson     26.404 (± 0.0%) i/s   (37.87 ms/i) -    132.000 in   5.001025s

Comparison:
                json:       49.4 i/s
                  oj:      100.1 i/s - 2.03x  faster
           rapidjson:       26.4 i/s - 1.87x  slower
```

After:

```
== Encoding mixed utf8 (20012001 bytes)
ruby 3.4.0dev (2024-10-18T15:12:54Z master d1b5c10957) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json    10.000 i/100ms
                  oj     9.000 i/100ms
           rapidjson     2.000 i/100ms
Calculating -------------------------------------
                json     95.686 (± 2.1%) i/s   (10.45 ms/i) -    480.000 in   5.018575s
                  oj     96.875 (± 2.1%) i/s   (10.32 ms/i) -    486.000 in   5.019097s
           rapidjson     26.260 (± 3.8%) i/s   (38.08 ms/i) -    132.000 in   5.033151s

Comparison:
                json:       95.7 i/s
                  oj:       96.9 i/s - same-ish: difference falls within error
           rapidjson:       26.3 i/s - 3.64x  slower
```

f8166c2d7f
2024-10-26 18:44:15 +09:00
Jean Boussier
aed0114913 [ruby/json] Annotate the encoding benchmark
Note where we currently stand, what the current bottlenecks are,
and what can or can't be done.

```
== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json   129.145k i/100ms
        json (reuse)   239.395k i/100ms
                  oj   211.514k i/100ms
           rapidjson   130.660k i/100ms
Calculating -------------------------------------
                json      1.284M (± 0.3%) i/s  (779.11 ns/i) -      6.457M in   5.030954s
        json (reuse)      2.405M (± 0.1%) i/s  (415.77 ns/i) -     12.209M in   5.076202s
                  oj      2.118M (± 0.0%) i/s  (472.11 ns/i) -     10.787M in   5.092795s
           rapidjson      1.325M (± 1.3%) i/s  (754.82 ns/i) -      6.664M in   5.030763s

Comparison:
                json:  1283514.8 i/s
        json (reuse):  2405175.0 i/s - 1.87x  faster
                  oj:  2118132.9 i/s - 1.65x  faster
           rapidjson:  1324820.8 i/s - 1.03x  faster

== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json   177.502k i/100ms
        json (reuse)   485.963k i/100ms
                  oj   656.566k i/100ms
           rapidjson   227.985k i/100ms
Calculating -------------------------------------
                json      1.774M (± 3.1%) i/s  (563.67 ns/i) -      8.875M in   5.007964s
        json (reuse)      4.804M (± 3.0%) i/s  (208.16 ns/i) -     24.298M in   5.062426s
                  oj      6.564M (± 1.9%) i/s  (152.36 ns/i) -     32.828M in   5.003539s
           rapidjson      2.229M (± 2.0%) i/s  (448.59 ns/i) -     11.171M in   5.013299s

Comparison:
                json:  1774084.6 i/s
                  oj:  6563547.8 i/s - 3.70x  faster
        json (reuse):  4804083.0 i/s - 2.71x  faster
           rapidjson:  2229209.5 i/s - 1.26x  faster

== Encoding twitter.json (466906 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json   212.000 i/100ms
                  oj   222.000 i/100ms
           rapidjson   109.000 i/100ms
Calculating -------------------------------------
                json      2.135k (± 0.7%) i/s  (468.32 μs/i) -     10.812k in   5.063665s
                  oj      2.219k (± 1.9%) i/s  (450.69 μs/i) -     11.100k in   5.004642s
           rapidjson      1.093k (± 3.8%) i/s  (914.66 μs/i) -      5.559k in   5.090812s

Comparison:
                json:     2135.3 i/s
                  oj:     2218.8 i/s - 1.04x  faster
           rapidjson:     1093.3 i/s - 1.95x  slower

== Encoding citm_catalog.json (500298 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json   132.000 i/100ms
                  oj   126.000 i/100ms
           rapidjson    96.000 i/100ms
Calculating -------------------------------------
                json      1.304k (± 2.2%) i/s  (766.96 μs/i) -      6.600k in   5.064483s
                  oj      1.272k (± 0.8%) i/s  (786.14 μs/i) -      6.426k in   5.052044s
           rapidjson    997.370 (± 4.8%) i/s    (1.00 ms/i) -      4.992k in   5.016266s

Comparison:
                json:     1303.9 i/s
                  oj:     1272.0 i/s - same-ish: difference falls within error
           rapidjson:      997.4 i/s - 1.31x  slower

== Encoding canada.json (2090234 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json     2.000 i/100ms
                  oj     3.000 i/100ms
           rapidjson     1.000 i/100ms
Calculating -------------------------------------
                json     20.001 (± 0.0%) i/s   (50.00 ms/i) -    102.000 in   5.100950s
                  oj     30.823 (± 0.0%) i/s   (32.44 ms/i) -    156.000 in   5.061333s
           rapidjson     19.446 (± 0.0%) i/s   (51.42 ms/i) -     98.000 in   5.041884s

Comparison:
                json:       20.0 i/s
                  oj:       30.8 i/s - 1.54x  faster
           rapidjson:       19.4 i/s - 1.03x  slower

== Encoding many #to_json calls (2661 bytes)
oj does not match expected output. Skipping
rapidjson unsupported (Invalid object key type: Object)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json     2.200k i/100ms
Calculating -------------------------------------
                json     22.253k (± 0.2%) i/s   (44.94 μs/i) -    112.200k in   5.041962s
```

77e97b3d4e
2024-10-26 18:44:15 +09:00
ydah
199691553e [ruby/json] Godounov ==> Godunov
dbf7e9f473
2024-10-16 07:11:03 +00:00
Jean Boussier
615a087216 [ruby/json] Restore the simple standalone benchmark for iterating
7b68800991
2024-10-09 12:35:54 +00:00
Jean Boussier
6e2619c968 Revamp the benchmark suite
There is a large number of outstanding performance PRs that I want to
merge, but we need a decent benchmark to judge if they are effective.

I borrowed rapidjson's benchmark suite, which is a good start.

I only kept the comparison with Oj and RapidJSON, because YAJL is
slower on most benchmarks, so there is little point in comparing to it.

Encoding:

```
== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    88.225k i/100ms
                  oj   209.862k i/100ms
           rapidjson   128.978k i/100ms
Calculating -------------------------------------
                json    914.611k (± 0.4%) i/s    (1.09 μs/i) -      4.588M in   5.016099s
                  oj      2.163M (± 0.2%) i/s  (462.39 ns/i) -     10.913M in   5.045964s
           rapidjson      1.392M (± 1.3%) i/s  (718.55 ns/i) -      6.965M in   5.005438s

Comparison:
                json:   914610.6 i/s
                  oj:  2162693.5 i/s - 2.36x  faster
           rapidjson:  1391682.6 i/s - 1.52x  faster

== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   142.093k i/100ms
                  oj   651.412k i/100ms
           rapidjson   237.706k i/100ms
Calculating -------------------------------------
                json      1.478M (± 0.7%) i/s  (676.78 ns/i) -      7.389M in   5.000866s
                  oj      7.150M (± 0.7%) i/s  (139.85 ns/i) -     35.828M in   5.010756s
           rapidjson      2.250M (± 1.6%) i/s  (444.46 ns/i) -     11.410M in   5.072451s

Comparison:
                json:  1477595.1 i/s
                  oj:  7150472.0 i/s - 4.84x  faster
           rapidjson:  2249926.7 i/s - 1.52x  faster

== Encoding twitter.json (466906 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   101.000 i/100ms
                  oj   223.000 i/100ms
           rapidjson   105.000 i/100ms
Calculating -------------------------------------
                json      1.017k (± 0.7%) i/s  (982.83 μs/i) -      5.151k in   5.062786s
                  oj      2.244k (± 0.7%) i/s  (445.72 μs/i) -     11.373k in   5.069428s
           rapidjson      1.069k (± 4.6%) i/s  (935.20 μs/i) -      5.355k in   5.016652s

Comparison:
                json:     1017.5 i/s
                  oj:     2243.6 i/s - 2.21x  faster
           rapidjson:     1069.3 i/s - same-ish: difference falls within error

== Encoding citm_catalog.json (500299 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    77.000 i/100ms
                  oj   129.000 i/100ms
           rapidjson    96.000 i/100ms
Calculating -------------------------------------
                json    767.217 (± 2.5%) i/s    (1.30 ms/i) -      3.850k in   5.021957s
                  oj      1.291k (± 1.5%) i/s  (774.45 μs/i) -      6.579k in   5.096439s
           rapidjson    959.527 (± 1.1%) i/s    (1.04 ms/i) -      4.800k in   5.003052s

Comparison:
                json:      767.2 i/s
                  oj:     1291.2 i/s - 1.68x  faster
           rapidjson:      959.5 i/s - 1.25x  faster

== Encoding canada.json (2090234 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     1.000 i/100ms
                  oj     3.000 i/100ms
           rapidjson     1.000 i/100ms
Calculating -------------------------------------
                json     19.748 (± 0.0%) i/s   (50.64 ms/i) -     99.000 in   5.013336s
                  oj     31.016 (± 0.0%) i/s   (32.24 ms/i) -    156.000 in   5.029732s
           rapidjson     19.419 (± 0.0%) i/s   (51.50 ms/i) -     98.000 in   5.050382s

Comparison:
                json:       19.7 i/s
                  oj:       31.0 i/s - 1.57x  faster
           rapidjson:       19.4 i/s - 1.02x  slower

== Encoding many #to_json calls (2661 bytes)
oj does not match expected output. Skipping
rapidjson unsupported (Invalid object key type: Object)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     2.129k i/100ms
Calculating -------------------------------------
                json     21.599k (± 0.6%) i/s   (46.30 μs/i) -    108.579k in   5.027198s
```

Parsing:

```
== Parsing small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    47.497k i/100ms
                  oj    54.115k i/100ms
           oj strict    53.854k i/100ms
          Oj::Parser   150.904k i/100ms
           rapidjson    80.775k i/100ms
Calculating -------------------------------------
                json    481.096k (± 1.1%) i/s    (2.08 μs/i) -      2.422M in   5.035657s
                  oj    554.878k (± 0.6%) i/s    (1.80 μs/i) -      2.814M in   5.071521s
           oj strict    547.888k (± 0.7%) i/s    (1.83 μs/i) -      2.747M in   5.013212s
          Oj::Parser      1.545M (± 0.4%) i/s  (647.16 ns/i) -      7.847M in   5.078302s
           rapidjson    822.422k (± 0.6%) i/s    (1.22 μs/i) -      4.120M in   5.009178s

Comparison:
                json:   481096.4 i/s
          Oj::Parser:  1545223.5 i/s - 3.21x  faster
           rapidjson:   822422.4 i/s - 1.71x  faster
                  oj:   554877.7 i/s - 1.15x  faster
           oj strict:   547887.7 i/s - 1.14x  faster

== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   154.479k i/100ms
                  oj   220.283k i/100ms
           oj strict   249.928k i/100ms
          Oj::Parser   445.062k i/100ms
           rapidjson   289.615k i/100ms
Calculating -------------------------------------
                json      1.581M (± 3.0%) i/s  (632.55 ns/i) -      8.033M in   5.086476s
                  oj      2.202M (± 3.5%) i/s  (454.08 ns/i) -     11.014M in   5.008146s
           oj strict      2.498M (± 3.5%) i/s  (400.25 ns/i) -     12.496M in   5.008245s
          Oj::Parser      4.640M (± 0.4%) i/s  (215.50 ns/i) -     23.588M in   5.083443s
           rapidjson      3.111M (± 0.3%) i/s  (321.44 ns/i) -     15.639M in   5.027097s

Comparison:
                json:  1580898.5 i/s
          Oj::Parser:  4640298.1 i/s - 2.94x  faster
           rapidjson:  3111005.2 i/s - 1.97x  faster
           oj strict:  2498421.4 i/s - 1.58x  faster
                  oj:  2202276.6 i/s - 1.39x  faster

== Parsing test from oj (256 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    37.580k i/100ms
                  oj    41.899k i/100ms
           oj strict    50.731k i/100ms
          Oj::Parser    74.589k i/100ms
           rapidjson    50.954k i/100ms
Calculating -------------------------------------
                json    382.150k (± 1.0%) i/s    (2.62 μs/i) -      1.917M in   5.015737s
                  oj    420.282k (± 0.2%) i/s    (2.38 μs/i) -      2.137M in   5.084338s
           oj strict    511.758k (± 0.5%) i/s    (1.95 μs/i) -      2.587M in   5.055821s
          Oj::Parser    759.087k (± 0.3%) i/s    (1.32 μs/i) -      3.804M in   5.011388s
           rapidjson    518.273k (± 1.8%) i/s    (1.93 μs/i) -      2.599M in   5.015867s

Comparison:
                json:   382149.6 i/s
          Oj::Parser:   759087.1 i/s - 1.99x  faster
           rapidjson:   518272.8 i/s - 1.36x  faster
           oj strict:   511758.4 i/s - 1.34x  faster
                  oj:   420282.5 i/s - 1.10x  faster

== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    52.000 i/100ms
                  oj    63.000 i/100ms
           oj strict    74.000 i/100ms
          Oj::Parser    79.000 i/100ms
           rapidjson    56.000 i/100ms
Calculating -------------------------------------
                json    522.896 (± 0.4%) i/s    (1.91 ms/i) -      2.652k in   5.071809s
                  oj    624.849 (± 0.6%) i/s    (1.60 ms/i) -      3.150k in   5.041398s
           oj strict    737.779 (± 0.4%) i/s    (1.36 ms/i) -      3.700k in   5.015117s
          Oj::Parser    789.254 (± 0.3%) i/s    (1.27 ms/i) -      3.950k in   5.004764s
           rapidjson    565.663 (± 0.4%) i/s    (1.77 ms/i) -      2.856k in   5.049015s

Comparison:
                json:      522.9 i/s
          Oj::Parser:      789.3 i/s - 1.51x  faster
           oj strict:      737.8 i/s - 1.41x  faster
                  oj:      624.8 i/s - 1.19x  faster
           rapidjson:      565.7 i/s - 1.08x  faster

== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    27.000 i/100ms
                  oj    31.000 i/100ms
           oj strict    36.000 i/100ms
          Oj::Parser    42.000 i/100ms
           rapidjson    38.000 i/100ms
Calculating -------------------------------------
                json    305.248 (± 0.3%) i/s    (3.28 ms/i) -      1.539k in   5.041813s
                  oj    320.265 (± 3.4%) i/s    (3.12 ms/i) -      1.612k in   5.039715s
           oj strict    373.701 (± 1.6%) i/s    (2.68 ms/i) -      1.872k in   5.010633s
          Oj::Parser    457.792 (± 0.4%) i/s    (2.18 ms/i) -      2.310k in   5.046049s
           rapidjson    350.933 (± 8.8%) i/s    (2.85 ms/i) -      1.748k in   5.052491s

Comparison:
                json:      305.2 i/s
          Oj::Parser:      457.8 i/s - 1.50x  faster
           oj strict:      373.7 i/s - 1.22x  faster
           rapidjson:      350.9 i/s - 1.15x  faster
                  oj:      320.3 i/s - 1.05x  faster

== Parsing canada.json (2251051 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     2.000 i/100ms
                  oj     2.000 i/100ms
           oj strict     2.000 i/100ms
          Oj::Parser     2.000 i/100ms
           rapidjson    28.000 i/100ms
Calculating -------------------------------------
                json     29.216 (± 6.8%) i/s   (34.23 ms/i) -    146.000 in   5.053753s
                  oj     24.899 (± 0.0%) i/s   (40.16 ms/i) -    126.000 in   5.061915s
           oj strict     24.828 (± 4.0%) i/s   (40.28 ms/i) -    124.000 in   5.003067s
          Oj::Parser     30.867 (± 3.2%) i/s   (32.40 ms/i) -    156.000 in   5.057104s
           rapidjson    285.761 (± 1.0%) i/s    (3.50 ms/i) -      1.456k in   5.095715s

Comparison:
                json:       29.2 i/s
           rapidjson:      285.8 i/s - 9.78x  faster
          Oj::Parser:       30.9 i/s - same-ish: difference falls within error
                  oj:       24.9 i/s - 1.17x  slower
           oj strict:       24.8 i/s - 1.18x  slower
```
2024-10-08 14:18:37 +00:00
Matt Valentine-House
8e7df4b7c6 Rename size_pool -> heap
Now that we've inlined the eden_heap into the size_pool, we should
rename the size_pool to heap, so that Ruby contains multiple heaps,
each holding different sized objects.

The term heap, as a collection of memory pages, is more in line with
memory-management nomenclature, whereas size_pool was a name chosen out of
necessity during the development of the Variable Width Allocation
features of Ruby.

The concept of size pools was introduced in order to facilitate
different sized objects (other than the default 40 bytes). They wrapped
the eden heap and the tomb heap, and some related state, and provided a
reasonably simple way of duplicating all related concerns, to provide
multiple pools that all shared the same structure but held different
objects.

Since then, various changes have happened in Ruby's memory layout:

* The concept of tomb heaps has been replaced by a global free pages list,
  with each page having its slot size reconfigured at the point when it
  is resurrected
* The eden heap has been inlined into the size pool itself, so that now
  the size pool directly controls the free_pages list, the sweeping
  page, the compaction cursor and the other state that was previously
  being managed by the eden heap.

Now that there is no need for a heap wrapper, we should refer to the
collection of pages containing Ruby objects as a heap again, rather
than a size pool.
2024-10-03 21:20:09 +01:00
Nobuyoshi Nakada
7be1fafe58
Refactor Time#xmlschema
And refine uncommon date cases.

# Iteration per second (i/s)

|                            |compare-ruby|built-ruby|
|:---------------------------|-----------:|---------:|
|time.xmlschema              |      5.020M|   14.192M|
|                            |           -|     2.83x|
|utc_time.xmlschema          |      6.454M|   15.331M|
|                            |           -|     2.38x|
|time.xmlschema(6)           |      4.216M|   10.043M|
|                            |           -|     2.38x|
|utc_time.xmlschema(6)       |      5.486M|   10.592M|
|                            |           -|     1.93x|
|time.xmlschema(9)           |      4.294M|   10.340M|
|                            |           -|     2.41x|
|utc_time.xmlschema(9)       |      4.784M|   10.909M|
|                            |           -|     2.28x|
|fraction_sec.xmlschema(10)  |    366.982k|    3.406M|
|                            |           -|     9.28x|
|future_time.xmlschema       |    994.595k|   15.853M|
|                            |           -|    15.94x|
2024-09-23 14:29:25 +09:00
aoki1980taichi
5894202365 typo otherBasis -> orthoBasis
The original function name in ao.c was orthoBasis.
I guess the function is generating an orthonormal basis (https://en.wikipedia.org/wiki/Orthonormal_basis).
2024-09-13 14:35:25 +09:00
Jean Boussier
57e3fc32ea Move Time#xmlschema in core and optimize it
[Feature #20707]

Converting Time into RFC3339 / ISO8601 representation is a significant
hotspot for applications that serialize data in JSON, XML or other formats.

By moving it into core we can optimize it much further than what `strftime` will
allow.
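
The methods being benchmarked, for reference (the timestamp here is illustrative):

```ruby
require "time" # previously provided #xmlschema before it moved into core

t = Time.utc(2024, 9, 5, 19, 23, 12, 123456)
p t.xmlschema     # => "2024-09-05T19:23:12Z"
p t.xmlschema(6)  # => "2024-09-05T19:23:12.123456Z"
p t.xmlschema(9)  # => "2024-09-05T19:23:12.123456000Z"
```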

```
compare-ruby: ruby 3.4.0dev (2024-08-29T13:11:40Z master 6b08a50a62) +YJIT [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-08-30T13:17:32Z native-xmlschema 34041ff71f) +YJIT [arm64-darwin23]
warming up......

|                        |compare-ruby|built-ruby|
|:-----------------------|-----------:|---------:|
|time.xmlschema          |      1.087M|    5.190M|
|                        |           -|     4.78x|
|utc_time.xmlschema      |      1.464M|    6.848M|
|                        |           -|     4.68x|
|time.xmlschema(6)       |    859.960k|    4.646M|
|                        |           -|     5.40x|
|utc_time.xmlschema(6)   |      1.080M|    5.917M|
|                        |           -|     5.48x|
|time.xmlschema(9)       |    893.909k|    4.668M|
|                        |           -|     5.22x|
|utc_time.xmlschema(9)   |      1.056M|    5.707M|
|                        |           -|     5.40x|
```
2024-09-05 19:23:12 +02:00
Jean Boussier
a3f589640f Time#strftime: grow the buffer faster
Use a classic doubling of capacity rather than only adding
twice as much capacity as is already known to be needed.

```
compare-ruby: ruby 3.4.0dev (2024-09-04T09:21:53Z opt-strftime-2 ae98d19cf9) +YJIT [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-09-04T11:46:02Z opt-strftime-growth 586263d6fb) +YJIT [arm64-darwin23]
warming up...

|                            |compare-ruby|built-ruby|
|:---------------------------|-----------:|---------:|
|time.strftime("%FT%T")      |      1.754M|    1.889M|
|                            |           -|     1.08x|
|time.strftime("%FT%T.%3N")  |      1.508M|    1.749M|
|                            |           -|     1.16x|
|time.strftime("%FT%T.%6N")  |      1.488M|    1.756M|
|                            |           -|     1.18x|
```
2024-09-04 14:52:55 +02:00
Jean Boussier
9594db0cf2 Implement Hash.new(capacity:)
[Feature #19236]

When building a large hash, pre-allocating it with enough
capacity can save many re-hashes and significantly improve
performance.
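
For example (a minimal illustration of the new keyword argument):

```ruby
# Pre-size the internal table when the number of entries is roughly known,
# so the hash does not have to re-hash repeatedly while it grows.
h = Hash.new(capacity: 100_000)
100_000.times { |i| h[i] = i }
```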

```
/opt/rubies/3.3.0/bin/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::../miniruby-master -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby --disable-gem" \
	            --output=markdown --output-compare -v $(find ./benchmark -maxdepth 1 -name 'hash_new' -o -name '*hash_new*.yml' -o -name '*hash_new*.rb' | sort)
compare-ruby: ruby 3.4.0dev (2024-03-25T11:48:11Z master f53209f023) +YJIT dev [arm64-darwin23]
last_commit=[ruby/irb] Cache RDoc::RI::Driver.new (https://github.com/ruby/irb/pull/911)
built-ruby: ruby 3.4.0dev (2024-03-25T15:29:40Z hash-new-rb 77652b08a2) +YJIT dev [arm64-darwin23]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|new                 |      7.614M|    5.976M|
|                    |       1.27x|         -|
|new_with_capa_1k    |     13.931k|   15.698k|
|                    |           -|     1.13x|
|new_with_capa_100k  |     124.746|   148.283|
|                    |           -|     1.19x|
```
2024-07-08 12:24:33 +02:00
Jean Boussier
9e9f1d9301 Precompute embedded string literals hash code
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.

We chose to store the Hash code right after the string terminator so as to
make it easy/fast to compute, and not require one more union in RString.
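
The pattern that benefits looks roughly like this (a hedged illustration; the caching itself happens inside the VM for interned string literals):

```ruby
# frozen_string_literal: true

HEADERS = { "content-type" => "application/json" }

def content_type(headers)
  # "content-type" is an interned literal; with this change its hash code is
  # cached in the string's slot, so repeated lookups avoid recomputing it.
  headers["content-type"]
end

content_type(HEADERS) # => "application/json"
```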

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
2024-05-28 07:32:41 +02:00
Aaron Patterson
f86fb1eda2 add allocation benchmark 2024-04-15 11:29:48 -07:00
Takashi Kokubun
70de3b170b
Optimize Hash methods with Kernel#hash (#10160) 2024-03-01 11:16:31 -08:00
Jeremy Evans
f446d68ba6 Add benchmarks for super and zsuper calls of different types
These show gains from the recent optimization commits:

```
                        arg_splat
            miniruby:   7346039.9 i/s
     miniruby-before:   4692240.8 i/s - 1.57x  slower

                  arg_splat_block
            miniruby:   6539749.6 i/s
     miniruby-before:   4358063.6 i/s - 1.50x  slower

                   splat_kw_splat
            miniruby:   5433641.5 i/s
     miniruby-before:   3851048.6 i/s - 1.41x  slower

             splat_kw_splat_block
            miniruby:   4916137.1 i/s
     miniruby-before:   3477090.1 i/s - 1.41x  slower

                   splat_kw_block
            miniruby:   2912829.5 i/s
     miniruby-before:   2465611.7 i/s - 1.18x  slower

                   arg_splat_post
            miniruby:   2195208.2 i/s
     miniruby-before:   1860204.3 i/s - 1.18x  slower
```

zsuper only speeds up in the post argument case, because
it was already set to use splatarray false in cases where
there were no post arguments.
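
For reference, the call shapes behind names like `arg_splat_block` look roughly like this (illustrative, not the exact benchmark file):

```ruby
class Parent
  def m(*args, &blk)
    args
  end
end

class Child < Parent
  def m(*args, &blk)
    super(*args, &blk) # explicit super with an argument splat and block pass
  end
end

Child.new.m(1, 2, 3) # => [1, 2, 3]
```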
2024-03-01 07:10:25 -08:00
Alan Wu
e4272fd292 Avoid allocation when passing no keywords to anonymous kwrest methods
Thanks to the new semantics from [ruby-core:115808], `**nil` is now
equivalent to `**{}`. Since the only thing one could do with an anonymous
keyword rest parameter is to delegate it with `**`, nil is just as good
as an empty hash. Using nil avoids allocating an empty hash.

This is particularly important for `...` methods since they now use
`**kwrest` under the hood after 4f77d8d328. Most calls don't pass
keywords.
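
A sketch of the delegation pattern in question (anonymous keyword rest, Ruby 3.2+ syntax):

```ruby
def target(**kwargs)
  kwargs
end

# The only possible use of an anonymous `**` parameter is forwarding it with `**`.
def wrapper(**)
  target(**)
end

wrapper       # => {}      no keywords passed; with this change nil, not a fresh empty hash, is forwarded internally
wrapper(a: 1) # => {a: 1}
```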

    Comparison:
                             fw_no_kw
                    post:   9816800.9 i/s
                     pre:   8570297.0 i/s - 1.15x  slower
2024-02-13 11:05:26 -05:00
Jeremy Evans
c20e819e8b Fix crash when passing large keyword splat to method accepting keywords and keyword splat
The following code previously caused a crash:

```ruby
h = {}
1000000.times{|i| h[i.to_s.to_sym] = i}
def f(kw: 1, **kws) end
f(**h)
```

Inside a thread or fiber, the size of the keyword splat could be much smaller
and still cause a crash.

I found this issue while optimizing method calling by reducing implicit
allocations.  Given the following code:

```ruby
def f(kw: , **kws) end
kw = {kw: 1}
f(**kw)
```

The `f(**kw)` call previously allocated two hashes callee side instead of a
single hash.  This is because `setup_parameters_complex` would extract the
keywords from the keyword splat hash to the C stack, to attempt to mirror
the case when literal keywords are passed without a keyword splat.  Then,
`make_rest_kw_hash` would build a new hash based on the extracted keywords
that weren't used for literal keywords.

Switch the implementation so that if a keyword splat is passed, literal keywords
are deleted from the keyword splat hash (or a copy of the hash if the hash is
not mutable).
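
Observable behaviour (apart from the crash) is unchanged; a rough Ruby-level picture of what the callee now does with the keyword splat (the real implementation is in C):

```ruby
def f(kw: 1, **kws)
  [kw, kws]
end

h = { kw: 2, other: 3 }
f(**h) # => [2, {other: 3}]   the literal keyword kw: is consumed, the remainder lands in kws
```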

In addition to avoiding the crash, this new approach is much more
efficient in all cases.  With the included benchmark:

```
                                1
            miniruby:   5247879.9 i/s
     miniruby-before:   2474050.2 i/s - 2.12x  slower

                        1_mutable
            miniruby:   1797036.5 i/s
     miniruby-before:   1239543.3 i/s - 1.45x  slower

                               10
            miniruby:   1094750.1 i/s
     miniruby-before:    365529.6 i/s - 2.99x  slower

                       10_mutable
            miniruby:    407781.7 i/s
     miniruby-before:    225364.0 i/s - 1.81x  slower

                              100
            miniruby:    100992.3 i/s
     miniruby-before:     32703.6 i/s - 3.09x  slower

                      100_mutable
            miniruby:     40092.3 i/s
     miniruby-before:     21266.9 i/s - 1.89x  slower

                             1000
            miniruby:     21694.2 i/s
     miniruby-before:      4949.8 i/s - 4.38x  slower

                     1000_mutable
            miniruby:      5819.5 i/s
     miniruby-before:      2995.0 i/s - 1.94x  slower
```
2024-02-11 22:48:38 -08:00
Takashi Kokubun
76f0eec20f Fix a benchmark to avoid leaving a garbage file 2024-02-08 17:08:23 -08:00
Jeremy Evans
2217e08340
Optimize compilation of large literal arrays
To avoid stack overflow, Ruby splits compilation of large arrays
into smaller arrays, and concatenates the small arrays together.
It previously used newarray/concatarray for this, which is
inefficient.  This switches the compilation to use pushtoarray,
which is much faster. This makes almost all literal arrays only
allocate a single array.

For cases where there is a large amount of static values in the
array, Ruby will statically compile subarrays, and previously
added them using concatarray.  This switches to concattoarray,
avoiding an array allocation for the append.

Keyword splats are also supported in arrays, and ignored if the
keyword splat is empty.  Previously, this used newarraykwsplat and
concatarray.  This still uses newarraykwsplat, but switches to
concattoarray to save an allocation.  So large arrays with keyword
splats can allocate 2 arrays instead of 1.

Previously, for the following array sizes (assuming local variable
access for each element), Ruby allocated the following number of
arrays:

  1000 elements: 7 arrays
 10000 elements: 79 arrays
100000 elements: 781 arrays

With these changes, only a single array is allocated (or 2 for a
large array with a keyword splat).
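
One way to inspect which instructions a given build emits for a large literal (output naturally differs between versions):

```ruby
# Compile a large literal array and list the array-building instructions used.
code = "[" + (1..5_000).map(&:to_s).join(", ") + "]"
disasm = RubyVM::InstructionSequence.compile(code).disasm
puts disasm.lines.grep(/newarray|concatarray|pushtoarray|concattoarray|duparray/)
```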

Results using the included benchmark:

```
                       array_1000
            miniruby:     34770.0 i/s
   ./miniruby-before:     10511.7 i/s - 3.31x  slower

                      array_10000
            miniruby:      4938.8 i/s
   ./miniruby-before:       483.8 i/s - 10.21x  slower

                     array_100000
            miniruby:       727.2 i/s
   ./miniruby-before:         4.1 i/s - 176.98x  slower
```

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
2024-01-27 10:16:52 -08:00
Jeremy Evans
42d891be2c Add benchmark for implicit array/hash allocation reduction changes
Benchmark results:

```
named_multi_arg_splat
after:    5344097.6 i/s
before:   3088134.0 i/s - 1.73x  slower

named_post_splat
after:    5401882.3 i/s
before:   2629321.8 i/s - 2.05x  slower

anon_arg_splat
after:   12242780.9 i/s
before:   6845413.2 i/s - 1.79x  slower

anon_arg_kw_splat
after:   11277398.7 i/s
before:   4329509.4 i/s - 2.60x  slower

anon_multi_arg_splat
after:    5132699.5 i/s
before:   3018103.7 i/s - 1.70x  slower

anon_post_splat
after:    5602915.1 i/s
before:   2645185.5 i/s - 2.12x  slower

anon_kw_splat
after:   15403727.3 i/s
before:   6249504.6 i/s - 2.46x  slower

anon_fw_to_named_splat
after:    2985715.3 i/s
before:   2049159.9 i/s - 1.46x  slower

anon_fw_to_named_no_splat
after:    2941030.4 i/s
before:   2100380.0 i/s - 1.40x  slower

fw_to_named_splat
after:    2801008.7 i/s
before:   2012416.4 i/s - 1.39x  slower

fw_to_named_no_splat
after:    2742670.4 i/s
before:   1957707.2 i/s - 1.40x  slower

fw_to_anon_to_named_splat
after:    2309246.6 i/s
before:   1375924.6 i/s - 1.68x  slower

fw_to_anon_to_named_no_splat
after:    2193227.6 i/s
before:   1351184.1 i/s - 1.62x  slower
```
2024-01-24 18:25:55 -08:00
Takashi Kokubun
c84237f953
Rewrite Array#each in Ruby using Primitive (#9533) 2024-01-23 20:09:57 +00:00