While profiling `strscan`, I noticed `rb_must_asciicompat` was quite
slow, as more than 5% of the benchmark was spent in it: https://share.firefox.dev/49bOcTn
By checking for the common 3 ASCII compatible encoding index first,
we can skip a lot of expensive operations in the happy path.
So that it doesn't get included in the generated binaries for builds
that don't support loading shared GC modules
Co-Authored-By: Peter Zhu <peter@peterzhu.ca>
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments
String#[] is in the top few C calls of several YJIT benchmarks:
liquid-compile rubocop mail sudoku
This speeds up these benchmarks by 1-2%.
* YJIT: Try harder to get type info for `String#[]`
In the large generated code of the mail gem the context doesn't have
the type info. In that case if we peek at the stack and add a guard
we can still apply the specialization
and it speeds up the mail benchmark by 5%.
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
---------
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>
* Use FL_USER0 for ELTS_SHARED
This makes space in RString for two bits for chilled strings.
* Mark strings returned by `Symbol#to_s` as chilled
[Feature #20350]
`STR_CHILLED` now spans on two user flags. If one bit is set it
marks a chilled string literal, if it's the other it marks a
`Symbol#to_s` chilled string.
Since it's not possible, and doesn't make much sense to include
debug info when `--debug-frozen-string-literal` is set, we can't
include allocation source, but we can safely include the symbol
name in the warning message, making it much easier to find the source
of the issue.
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
---------
Co-authored-by: Étienne Barrié <etienne.barrie@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Since `str_do_hash` will most likely scan the string to
compute the coderange, we might as well copy it over in the
interned string in case it's useful later.
While profiling msgpack-ruby I noticed a very substantial amout of time
spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`.
On that benchmark, `rb_utf8_str_new` is 33% of the total runtime,
in big part because it cause GC to trigger often, but even then
`5.3%` of the total runtime is spent in `rb_enc_associate_index`
called by `rb_utf8_str_new`.
After closer inspection, it appears that it's performing a lot of
safety check we can assert we don't need, and other extra useless
operations, because strings are first created and filled as ASCII-8BIT
and then later reassociated to the desired encoding.
By directly allocating the string with the right encoding, it allow
to skip a lot of duplicated and useless operations.
After this change, the time spent in `rb_utf8_str_new` is down
to `28.4%` of total runtime, and most of that is GC.
When a fake string is interned, use the capa field to store the string
hash. This lets us compute it once for hash lookup and embedding the
hash in the interned string.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
[Feature #20205]
The warning now suggests running with --debug-frozen-string-literal:
```
test.rb:3: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
```
When using --debug-frozen-string-literal, the location where the string
was created is shown:
```
test.rb:3: warning: literal string will be frozen in the future
test.rb:1: info: the string was created here
```
When resurrecting strings and debug mode is not enabled, the overhead is a simple FL_TEST_RAW.
When mutating chilled strings and deprecation warnings are not enabled,
the overhead is a simple warning category enabled check.
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
The UNREACHABLE macro calls __builtin_unreachable, which according to
the [GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005funreachable):
> If control flow reaches the point of the __builtin_unreachable, the
> program is undefined.
But it can reach this point with the following script:
"123".append_as_bytes("123")
This can crash on some platforms with a `Trace/BPT trap: 5`.
This is a static function only called in two places (rb_to_id and
rb_to_symbol), and in both places, both symbols and strings are
allowed. This makes the error message consistent with rb_check_id
and rb_check_symbol.
Fixes [Bug #20607]
[Feature #20594]
A handy method to construct a string out of multiple chunks.
Contrary to `String#concat`, it doesn't do any encoding negociation,
and simply append the content as bytes regardless of whether this
result in a broken string or not.
It's the caller responsibility to check for `String#valid_encoding?`
in cases where it's needed.
When passed integers, only the lower byte is considered, like in
`String#setbyte`.
Profiling of `JSON.dump` shows a significant amount of time spent
in `rb_enc_str_asciionly_p`, in large part because it fetches the
encoding.
It can be made twice as fast in this scenario by first checking the
coderange and only falling back to fetching the encoding if the
coderange is unknown.
Additionally we can skip fetching the encoding for the common
popular encodings.
On OSX, String#rindex is slow due to the lack of `memrchr`.
The fallback implementation finds a match by instead doing
a `memcmp` on every single character in the search string
looking for a substring match.
For OSX hosts, this changeset introduces a simple `memrchr`
implementation, `rb_memrchr`, that can be used instead. An
example benchmark below demonstrates an 8000 char long
search string with a 10 char substring near the end.
```
ruby-master | substring near the end | osx
UTF-8
user system total real
index 0.000111 0.000000 0.000111 ( 0.000110)
rindex 0.000446 0.000005 0.000451 ( 0.000454)
```
```
ruby-patched | substring near the end | osx
UTF-8
user system total real
index 0.000112 0.000000 0.000112 ( 0.000111)
rindex 0.000057 0.000001 0.000058 ( 0.000057)
```
`rb_enc_from_index` is a costly operation so it is worth avoiding
to call it for the common encodings.
Also in the case of UTF-8, it's more efficient to scan the
coderange if it is unknown that to fallback to the slower
algorithms.
If we assume that most strings we modify are not frozen and
are independent, then we can optimize this case by replacing
multiple flag checks by a single mask check.
* Document why we need to explicitly spill registers.
* Simplify passing a byte value to `str_buf_cat`.
* YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values.
* YJIT: Move runtime type check into YJIT.
Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.
[Bug #20585]
This was changed in 36a06efdd9 because
`String.new(1024)` would end up allocating `1025` bytes, but the problem
with this change is that the caller may be trying to right size a String.
So instead, we should just better document the behavior of `capacity:`.
Previously, on common platforms, this code made a pointer to a union of
8 byte alignment out of a char pointer that is not guaranteed to satisfy
the alignment requirement. That is undefined behavior according
to [C99 6.3.2.3p7](https://port70.net/~nsz/c/c99/n1256.html#6.3.2.3p7).
Use memcpy() to do the unaligned read instead.
[Feature #20205]
Now that chilled strings no longer appear as frozen, there is no
need to offer an API to check for chilled strings.
We however need to change `rb_check_frozen_internal` to no
longer be a macro, as it needs to check for chilled strings.
With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.
It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.
We chose to store the Hash code right after the string terminator as to
make it easy/fast to compute, and not require one more union in RString.
```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code
| |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol | 39.275M| 39.753M|
| | -| 1.01x|
|dyn_symbol | 37.348M| 37.704M|
| | -| 1.01x|
|small_lit | 29.514M| 33.948M|
| | -| 1.15x|
|frozen_lit | 27.180M| 33.056M|
| | -| 1.22x|
|iseq_lit | 27.391M| 32.242M|
| | -| 1.18x|
```
Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
They were initially made frozen to avoid false positives for cases such
as:
str = str.dup if str.frozen?
But this may cause bugs and is generally confusing for users.
[Feature #20205]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
[Feature #18576]
Since outright renaming `ASCII-8BIT` is deemed to backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.
Previously it would bypass the `FL_ABLE` check, but
since shapes introduction, it started having a different
behavior than `OBJ_FREEZE`, as it would onyl set the `FL_FREEZE`
flag, but not update the shape.
I have no indication of this causing a bug yet, but it seems
like a trap waiting to happen.