This commit provides an alternative implementation for a
long → decimal conversion.
The main difference is that it uses an algorithm pulled from
https://github.com/jeaiii/itoa.
The source there is C++, it was converted by hand to C for
inclusion with this gem.
jeaiii's algorithm is covered by the MIT License, see source code.
On addition this version now also generates the string directly into
the fbuffer, foregoing the need to run a separate memory copy.
As a result, I see a speedup of 32% on Apple Silicon M1 for an
integer set of benchmarks.
This commit provides an alternative implementation for a float → decimal conversion.
It integrates a C implementation of Fabian Loitsch's Grisu-algorithm [[pdf]](http://florian.loitsch.com/publications/dtoa-pldi2010.pdf), extracted from https://github.com/night-shift/fpconv. The relevant files are added in this PR, they are, as is all of https://github.com/night-shift/fpconv, available under a MIT License.
As a result, I see a speedup of 900% on Apple Silicon M1 for a float set of benchmarks.
floats don't have a single correct string representation: a float like `1000.0` can be represented as "1000", "1e3", "1000.0" (and more).
The Grisu algorithm converts floating point numbers to an optimal decimal string representation without loss of precision. As a result, a float that is exactly an integer (like `Float(10)`) will be converted by that algorithm into `"10"`. While technically correct – the JSON format treats floats and integers identically –, this differs from the current behaviour of the `"json"` gem. To address this, the integration checks for that case, and explicitely adds a ".0" suffix in those cases.
This is sufficient to meet all existing tests; there is, however, a chance that the current implementation and this implementation occasionally encode floats differently.
```
== Encoding floats (4179311 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
json (local) 4.000 i/100ms
Calculating -------------------------------------
json (local) 46.046 (± 2.2%) i/s (21.72 ms/i) - 232.000 in 5.039611s
Normalize to 2090234 byte
== Encoding floats (4179242 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
json (2.10.2) 1.000 i/100ms
Calculating -------------------------------------
json (2.10.2) 4.614 (± 0.0%) i/s (216.74 ms/i) - 24.000 in 5.201871s
```
These benchmarks are run via a script ([link](https://gist.github.com/radiospiel/04019402726a28b31616df3d0c17bd1c)) which is based on the gem's `benchmark/encoder.rb` file. There are probably better ways to run benchmarks :) My version allows to combine multiple test cases into a single one.
The `dumps` benchmark, which covers the JSON files in `benchmark/data/*.json` – with the exception of `canada.json` – , reported a minor speedup within statistical uncertainty.
7d77415108
Fix: https://github.com/ruby/json/issues/748
`MultiJson` pass `State#to_h` as options, and the `as_json`
property defaults to `false` but `false` wasn't accepted by
the constructor.
GCC 13 prints the following warning.
20241127T001003Z.log.html.gz
```
compiling generator.c
generator.c: In function ‘raise_generator_error’:
generator.c:91:5: warning: function ‘raise_generator_error’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
91 | VALUE str = rb_vsprintf(fmt, args);
| ^~~~~
```
This change prevents the warning by specifying the format attribute.
b8c1490846
Fix: https://github.com/ruby/json/issues/710
Makes it easier to debug why a given tree of objects can't
be dumped as JSON.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Ref: https://github.com/ruby/json/issues/524
Rather than to buffer everything in memory.
Unfortunately Ruby doesn't provide an API to write into
and IO without first allocating a string, which is a bit
wasteful.
f017af6c0a
Ignoring `CHAR_BITS` > 8 platform, as far as `ch` indexes
`escape_table` that is hard-coded as 256 elements.
```
../../../../src/ext/json/generator/generator.c(121): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(122): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(243): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(244): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(291): warning C4333: '>>': right shift by too large amount, data loss
../../../../src/ext/json/generator/generator.c(292): warning C4333: '>>': right shift by too large amount, data loss
```
fb82373612
Ref: https://github.com/ruby/json/pull/674
Ref: https://github.com/ruby/json/pull/668
The behavior on such case it quite unclear, the goal here is to
figure out whatever was the behavior on Cext version of `json 2.7.0`
and get all implementations to converge.
We can then decide to make them all behave differently if we so wish.
614921dcef
Fix: https://github.com/ruby/json/issues/667
This is yet another behavior on which the various implementations
differed, but the C implementation used to call `to_json` on String
subclasses used as keys.
This was optimized out in e125072130229e54a651f7b11d7d5a782ae7fb65
but there is an Active Support test case for it, so it's best to
make all 3 implementation respect this behavior.
Given we expect these to almost always be null, we might as
well keep them in RString.
And even when provided, assuming we're passed frozen strings
we'll save on copying them.
This also reduce the size of the struct from 112B to 72B.
6382c231b0
Ref: https://github.com/ruby/json/issues/655
Followup: https://github.com/ruby/json/issues/657
Assuming the generator might be used for fairly small documents
we can start with a reasonable buffer size of the stack, and if
we outgrow it, we can spill on the heap.
In a way this is optimizing for micro-benchmarks, but there are
valid use case for fiarly small JSON document in actual real world
scenarios, so trashing the GC less in such case make sense.
Before:
```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
Oj 518.700k i/100ms
JSON reuse 483.370k i/100ms
Calculating -------------------------------------
Oj 5.722M (± 1.8%) i/s (174.76 ns/i) - 29.047M in 5.077823s
JSON reuse 5.278M (± 1.5%) i/s (189.46 ns/i) - 26.585M in 5.038172s
Comparison:
Oj: 5722283.8 i/s
JSON reuse: 5278061.7 i/s - 1.08x slower
```
After:
```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
Oj 517.837k i/100ms
JSON reuse 548.871k i/100ms
Calculating -------------------------------------
Oj 5.693M (± 1.6%) i/s (175.65 ns/i) - 28.481M in 5.004056s
JSON reuse 5.855M (± 1.2%) i/s (170.80 ns/i) - 29.639M in 5.063004s
Comparison:
Oj: 5692985.6 i/s
JSON reuse: 5854857.9 i/s - 1.03x faster
```
fe607f4806
Fix: https://github.com/ruby/json/issues/646
Since both `json` and `json_pure` expose the same files, if the
versions don't match, the native extension may be loaded with Ruby
code that don't match and is incompatible.
By doing the `require json/ext/generator/state` from C we ensure
we're at least loading that.
But this is a dirty workaround for the 2.7.x branch, we should
find a better way to fully isolate the two gems.
dfdd4acf36
Ref: https://github.com/ruby/json/issues/647
Ref: https://github.com/rubygems/rubygems/pull/6490
Older rubygems are executing `extconf.rb` with a broken `$LOAD_PATH`
causing the `json` gem native extension to be loaded with the stdlib
version of the `.rb` files.
This fails with
```
json/common.rb:82:in `initialize': wrong number of arguments (given 1, expected 0) (ArgumentError)
```
Since this is just for `extconf.rb` we can probably just accept that
extra argument and ignore it.
The bug was fixed in rubygems 3.4.9 / 2023-03-20
1f5e849fe0
Profiling revealed that we were spending lots of time growing the buffer.
Buffer operations is definitely something we want to optimize, but for
this specific benchmark what we're interested in is UTF-8 scanning performance.
Each iteration of the two scaning benchmark were producing 20MB of JSON,
now they only produce 5MB.
Now:
```
== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.0dev (2024-10-18T19:01:45Z master 7be9a333ca) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
json 35.000 i/100ms
oj 36.000 i/100ms
rapidjson 10.000 i/100ms
Calculating -------------------------------------
json 359.161 (± 1.4%) i/s (2.78 ms/i) - 1.820k in 5.068542s
oj 359.699 (± 0.6%) i/s (2.78 ms/i) - 1.800k in 5.004291s
rapidjson 99.687 (± 2.0%) i/s (10.03 ms/i) - 500.000 in 5.017321s
Comparison:
json: 359.2 i/s
oj: 359.7 i/s - same-ish: difference falls within error
rapidjson: 99.7 i/s - 3.60x slower
```
1a338532d2