archive/ruby - Eplg Git

mirror of https://github.com/ruby/ruby.git synced 2025-08-15 13:39:04 +02:00

Author	SHA1	Message	Date
Nobuyoshi Nakada	e433e6515e	[DOC] Exclude 'Class' and 'Module' from RDoc's autolinking	2025-01-02 12:36:06 +09:00
Alan Wu	880a90cf2e	[DOC] [Feature #20205] Document the new power of String#+@	2024-12-13 14:25:32 -05:00
Jean Boussier	26d020cb6e	Optimize `rb_must_asciicompat`. While profiling `strscan`, I noticed `rb_must_asciicompat` was quite slow: more than 5% of the benchmark was spent in it (https://share.firefox.dev/49bOcTn). By checking for the three common ASCII-compatible encoding indexes first, we can skip many expensive operations on the happy path.	2024-11-27 14:50:07 +01:00
Nobuyoshi Nakada	6b4f8945d6	Many Oniguruma functions need strings with valid encoding	2024-11-26 11:46:34 +09:00
Nobuyoshi Nakada	02b70256b5	Check negative integer underflow	2024-11-26 11:46:34 +09:00
Matt Valentine-House	551be8219e	Place all non-default GC API behind USE_SHARED_GC, so that it isn't included in the generated binaries for builds that don't support loading shared GC modules. Co-Authored-By: Peter Zhu <peter@peterzhu.ca>	2024-11-25 13:05:23 +00:00
Peter Zhu	41a9460227	[DOC] Fix typo in comment for STR_PRECOMPUTED_HASH	2024-11-20 11:16:10 -05:00
Kouhei Yanagita	eb2b0c2a0d	[DOC] Fix the default `limit` of String#split. We can't pass `nil` as the second parameter of `String#split`, so descriptions like "if limit is nil, ..." are not appropriate.	2024-11-19 12:15:48 +09:00
Randy Stauner	beafae9750	YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments (#12069). * YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments. `String#[]` is among the top few C calls in several YJIT benchmarks (liquid-compile, rubocop, mail, sudoku); this speeds up those benchmarks by 1-2%. * YJIT: Try harder to get type info for `String#[]`. In the large generated code of the mail gem, the context doesn't have the type info. In that case, if we peek at the stack and add a guard, we can still apply the specialization, which speeds up the mail benchmark by 5%. Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com>	2024-11-13 12:25:09 -05:00
Jean byroot Boussier	6deeec5d45	Mark strings returned by Symbol#to_s as chilled (#12065). * Use FL_USER0 for ELTS_SHARED. This makes space in RString for two bits for chilled strings. * Mark strings returned by `Symbol#to_s` as chilled [Feature #20350]. `STR_CHILLED` now spans two user flags: if one bit is set, it marks a chilled string literal; if the other is set, it marks a `Symbol#to_s` chilled string. Since it's not possible, and doesn't make much sense, to include debug info when `--debug-frozen-string-literal` is set, we can't include the allocation source, but we can safely include the symbol name in the warning message, making it much easier to find the source of the issue. Co-authored-by: Étienne Barrié <etienne.barrie@gmail.com> Co-authored-by: Jean Boussier <jean.boussier@gmail.com>	2024-11-13 09:20:00 -05:00
Jean Boussier	37a16c7812	string.c: preserve coderange when interning a string. Since `str_do_hash` will most likely scan the string to compute the coderange, we might as well copy it over to the interned string in case it's useful later.	2024-11-13 14:14:24 +01:00
Jean Boussier	fae86a701e	string.c: Directly create strings with the correct encoding. While profiling msgpack-ruby, I noticed a very substantial amount of time spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`. On that benchmark, `rb_utf8_str_new` is 33% of the total runtime, in large part because it causes GC to trigger often, but even then `5.3%` of the total runtime is spent in `rb_enc_associate_index` called by `rb_utf8_str_new`. After closer inspection, it appears that it performs many safety checks we can assert we don't need, plus other useless operations, because strings are first created and filled as ASCII-8BIT and only later reassociated with the desired encoding. Directly allocating the string with the right encoding allows skipping many duplicated and useless operations. After this change, the time spent in `rb_utf8_str_new` is down to `28.4%` of total runtime, and most of that is GC.	2024-11-13 13:32:32 +01:00
Jean Boussier	bfb4783c01	Move `Symbol#name` into `symbol.rb`. This allows declaring it as leaf, just like `Symbol#to_s`. Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>	2024-11-13 10:29:07 +01:00
Étienne Barrié	84a8b911c1	Store precomputed hash when there's capacity Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-11-06 12:57:17 +01:00
Étienne Barrié	1e037108a1	Precompute hash only once when interning string literals. When a fake string is interned, use the capa field to store the string hash. This lets us compute it once for both the hash lookup and embedding the hash in the interned string. Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-11-04 14:37:14 +01:00
Yusuke Endoh	a83c91dd7a	Fix an off-by-one error in our own memrchr implementation and make it support `search_len == 0`, just in case. Ref [Bug #20796]	2024-10-21 20:40:42 +09:00
Étienne Barrié	257f78fb67	Show where mutated chilled strings were allocated [Feature #20205]. The warning now suggests running with `--debug-frozen-string-literal`: `test.rb:3: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)`. When using `--debug-frozen-string-literal`, the location where the string was created is shown: `test.rb:3: warning: literal string will be frozen in the future` followed by `test.rb:1: info: the string was created here`. When resurrecting strings and debug mode is not enabled, the overhead is a simple FL_TEST_RAW. When mutating chilled strings and deprecation warnings are not enabled, the overhead is a simple check of whether the warning category is enabled. Co-authored-by: Jean Boussier <byroot@ruby-lang.org> Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2024-10-21 12:33:02 +02:00
Holger Just	7081838d2a	[DOC] String#sub! and String#gsub! return nil if no replacement occurred	2024-10-07 17:20:03 +09:00
Peter Zhu	e956ce32c8	Use rb_bug instead of UNREACHABLE for assertions. UNREACHABLE uses __builtin_unreachable, which is not intended to be used as an assertion.	2024-09-24 14:54:55 -04:00
Peter Zhu	c51d8ff458	Fix undefined behavior in String#append_as_bytes. The UNREACHABLE macro calls __builtin_unreachable, about which the [GCC docs](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005funreachable) say: "If control flow reaches the point of the __builtin_unreachable, the program is undefined." But that point can be reached with the following script: `"123".append_as_bytes("123")`. This can crash on some platforms with a `Trace/BPT trap: 5`.	2024-09-24 14:54:55 -04:00
Jeremy Evans	8dc0d2904a	Update exception message in string_for_symbol. This is a static function only called in two places (rb_to_id and rb_to_symbol), and in both places, both symbols and strings are allowed. This makes the error message consistent with rb_check_id and rb_check_symbol. Fixes [Bug #20607]	2024-09-18 21:29:07 -07:00
Jean Boussier	16f241f0aa	Implement String#append_as_bytes(String | Integer, ...) [Feature #20594]. A handy method to construct a string out of multiple chunks. Contrary to `String#concat`, it doesn't do any encoding negotiation and simply appends the content as bytes, regardless of whether this results in a broken string or not. It's the caller's responsibility to check `String#valid_encoding?` in cases where it's needed. When passed integers, only the lower byte is considered, like in `String#setbyte`.	2024-09-09 15:04:51 +02:00
Jean Boussier	036ca726bb	Fix documentation for String#index and String#byterindex	2024-09-04 11:26:17 +02:00
Nobuyoshi Nakada	ade240e578	Adjust indents [ci skip]	2024-09-04 10:28:52 +09:00
Jean Boussier	b7fa2dd0d0	rb_enc_str_asciionly_p: avoid always fetching the encoding. Profiling of `JSON.dump` shows a significant amount of time spent in `rb_enc_str_asciionly_p`, in large part because it fetches the encoding. It can be made twice as fast in this scenario by first checking the coderange and only falling back to fetching the encoding if the coderange is unknown. Additionally, we can skip fetching the encoding for the most common encodings.	2024-09-03 12:21:36 +02:00
Zack Deveau	e7cb70be4e	Improve String#rindex performance on OSX. On OSX, String#rindex is slow due to the lack of `memrchr`. The fallback implementation finds a match by instead doing a `memcmp` on every single character in the search string, looking for a substring match. For OSX hosts, this changeset introduces a simple `memrchr` implementation, `rb_memrchr`, that can be used instead. The example benchmark below uses an 8000-character search string with a 10-character substring near the end (columns: user / system / total (real), in seconds). ruby-master, osx, UTF-8: index 0.000111 / 0.000000 / 0.000111 (0.000110); rindex 0.000446 / 0.000005 / 0.000451 (0.000454). ruby-patched, osx, UTF-8: index 0.000112 / 0.000000 / 0.000112 (0.000111); rindex 0.000057 / 0.000001 / 0.000058 (0.000057).	2024-09-03 14:25:25 +09:00
Jean Boussier	4e85b6b4c4	rb_str_bytesplice: skip encoding check if encodings are the same. If both strings have the same encoding, all this work is useless.	2024-08-09 22:06:44 +02:00
Jean Boussier	3bac5f6af5	string.c: add fastpath in str_ensure_byte_pos. If the string contains only single-byte characters, we can skip all the costly checks.	2024-08-09 22:06:44 +02:00
Jean Boussier	a332367dad	string.c: Add fastpath to single_byte_optimizable. `rb_enc_from_index` is a costly operation, so it is worth avoiding calling it for the common encodings. Also, in the case of UTF-8, it's more efficient to scan the coderange if it is unknown than to fall back to the slower algorithms.	2024-08-09 22:06:44 +02:00
Jean Boussier	2bd5dc47ac	string.c: str_capacity: don't check for immediates. `STR_EMBED_P` uses `FL_TEST_RAW`, meaning we already assume `str` isn't an immediate, so we can use `FL_TEST_RAW` here too.	2024-08-09 15:20:58 +02:00
Jean Boussier	af44af238b	str_independent: add a fastpath with a single flag check. If we assume that most strings we modify are not frozen and are independent, then we can optimize this case by replacing multiple flag checks with a single mask check.	2024-08-09 15:20:58 +02:00
Kevin Menard	04a6165ac0	YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values (#11032). * Document why we need to explicitly spill registers. * Simplify passing a byte value to `str_buf_cat`. * YJIT: Enhance the `String#<<` method substitution to handle integer codepoint values. * YJIT: Move runtime type check into YJIT. Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.	2024-08-02 15:45:22 -04:00
Jean Boussier	83f57ca3d2	String.new(capacity:): don't subtract termlen [Bug #20585]. This was changed in `36a06efdd9` because `String.new(1024)` would end up allocating `1025` bytes, but the problem with this change is that the caller may be trying to right-size a String. So instead, we should just better document the behavior of `capacity:`.	2024-06-19 15:11:07 +02:00
Kevin Menard	a119b5f879	Add a fast path implementation for appending single byte values to US-ASCII strings.	2024-06-17 09:44:48 -07:00
Kevin Menard	27e13fbc58	Add a fast path implementation for appending single byte values to binary strings. Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2024-06-17 09:44:48 -07:00
Alan Wu	6416ee33eb	Simplify unaligned write for pre-computed string hash	2024-06-13 18:52:09 -04:00
Alan Wu	a8730adb60	rb_str_hash(): Avoid UB from making a misaligned pointer. Previously, on common platforms, this code made a pointer to a union with 8-byte alignment out of a char pointer that is not guaranteed to satisfy the alignment requirement. That is undefined behavior according to [C99 6.3.2.3p7](https://port70.net/~nsz/c/c99/n1256.html#6.3.2.3p7). Use memcpy() to do the unaligned read instead.	2024-06-13 18:52:09 -04:00
tompng	a9b8981aac	Simplify rb_str_resize clear range condition	2024-06-13 18:27:02 +02:00
tompng	9c7374b0e6	Clear coderange when rb_str_resize changes size. In some encodings, like UTF-16 and UTF-32, expanding the string with null bytes can change the coderange to either broken or valid.	2024-06-13 18:27:02 +02:00
Nobuyoshi Nakada	dd8903fed7	[Bug #20566] Mention out-of-range argument cases in `String#<<`. Also [Bug #18973].	2024-06-09 10:11:06 +09:00
Jean Boussier	730e3b2ce0	Stop exposing `rb_str_chilled_p` [Feature #20205]. Now that chilled strings no longer appear as frozen, there is no need to offer an API to check for chilled strings. However, `rb_check_frozen_internal` can no longer be a macro, as it needs to check for chilled strings.	2024-06-02 13:53:35 +02:00
Nobuyoshi Nakada	7d144781a9	[Bug #20512] Set coderange in `Range#each` of strings	2024-05-28 16:59:51 +09:00
Nobuyoshi Nakada	0a92c9f2b0	Set empty strings to ASCII-only	2024-05-28 16:24:21 +09:00
Jean Boussier	9e9f1d9301	Precompute embedded string literals hash code. With embedded strings we often have some space left in the slot, which we can use to store the string's hash code. It's probably only worth it for string literals, as they are the ones likely to be used as hash keys. We chose to store the hash code right after the string terminator, to make it easy and fast to compute and to avoid needing one more union in RString. Benchmark (compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main `f77618c1fa`) [arm64-darwin23]; built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23], last_commit=Precompute embedded string literals hash code), compare-ruby vs built-ruby: symbol 39.275M vs 39.753M (1.01x); dyn_symbol 37.348M vs 37.704M (1.01x); small_lit 29.514M vs 33.948M (1.15x); frozen_lit 27.180M vs 33.056M (1.22x); iseq_lit 27.391M vs 32.242M (1.18x). Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>	2024-05-28 07:32:41 +02:00
Étienne Barrié	1376881e9a	Stop marking chilled strings as frozen. They were initially made frozen to avoid false positives for cases such as `str = str.dup if str.frozen?`, but this may cause bugs and is generally confusing for users. [Feature #20205] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-05-28 07:32:33 +02:00
Jean Boussier	3a7846b1aa	Add a hint of `ASCII-8BIT` being `BINARY` [Feature #18576]. Since outright renaming `ASCII-8BIT` is deemed too backward incompatible, the next best thing is to change only its `#inspect`, particularly in exception messages.	2024-04-18 10:17:26 +02:00
Jean Boussier	f06670c5a2	Eliminate usage of OBJ_FREEZE_RAW. Previously it would bypass the `FL_ABLE` check, but since the introduction of shapes, it started behaving differently from `OBJ_FREEZE`, as it would only set the `FL_FREEZE` flag but not update the shape. I have no indication of this causing a bug yet, but it seems like a trap waiting to happen.	2024-04-16 17:20:35 +02:00
Étienne Barrié	49b31c7680	Document STR_CHILLED flag on RString [Feature #20205]	2024-04-08 13:25:09 +02:00
Nobuyoshi Nakada	4dd9e5cf74	Add builtin type assertion	2024-04-08 11:13:29 +09:00
Peter Zhu	e50590a541	Assert that Symbol#inspect returns a T_STRING	2024-04-05 16:15:28 -04:00
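
Several commits above (26d020cb6e, b7fa2dd0d0, a332367dad) describe the same trick: test for the handful of common encoding indexes before paying for a full encoding lookup. The following C sketch illustrates that fast-path pattern in isolation. The index constants, the `slow_is_ascii_compatible` stand-in and the lookup counter are hypothetical and are not CRuby's actual API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding indexes for this sketch: the three common
 * ASCII-compatible encodings get the lowest indexes. */
enum { ENCIDX_ASCII_8BIT = 0, ENCIDX_UTF_8 = 1, ENCIDX_US_ASCII = 2 };

/* Counts slow-path calls so the fast path's effect is observable. */
static int slow_lookups = 0;

/* Stand-in for an expensive encoding-table lookup (hypothetical).
 * Index 3 plays the role of a non-ASCII-compatible encoding. */
static bool slow_is_ascii_compatible(int encidx)
{
    slow_lookups++;
    return encidx != 3;
}

static bool is_ascii_compatible(int encidx)
{
    /* Happy path: a few integer comparisons, no table lookup. */
    if (encidx == ENCIDX_ASCII_8BIT || encidx == ENCIDX_UTF_8 ||
        encidx == ENCIDX_US_ASCII) {
        return true;
    }
    /* Uncommon encodings fall back to the full lookup. */
    return slow_is_ascii_compatible(encidx);
}
```

The design choice is only about ordering: the cheap comparisons run first, so the expensive lookup happens only for uncommon encodings.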
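
Commit e7cb70be4e adds a fallback `rb_memrchr` for platforms without `memrchr`, and a83c91dd7a later fixes an off-by-one in it and adds support for `search_len == 0`. Below is a minimal C sketch of a reverse byte scan that handles both of those edge cases, plus a hypothetical `sketch_rindex` showing how such a scan can locate substring candidates from the end. Neither function is the actual code from string.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch, not string.c's rb_memrchr: return a pointer to the last byte
 * equal to c in the first n bytes of s, or NULL if there is none. */
static const void *sketch_memrchr(const void *s, int c, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char target = (unsigned char)c;
    /* Check n before decrementing: this never touches s[n] (one past the
     * end) and returns NULL immediately when n == 0. */
    while (n > 0) {
        n--;
        if (p[n] == target) {
            return p + n;
        }
    }
    return NULL;
}

/* Hypothetical rindex: byte offset of the last occurrence of needle in
 * hay, or -1. Candidates are found by scanning backwards for the first
 * byte of needle, then confirmed with memcmp. */
static long sketch_rindex(const char *hay, size_t hay_len,
                          const char *needle, size_t needle_len)
{
    if (needle_len == 0) return (long)hay_len;
    if (needle_len > hay_len) return -1;
    /* Valid start positions are 0 .. hay_len - needle_len. */
    size_t limit = hay_len - needle_len + 1;
    while (limit > 0) {
        const char *hit = (const char *)sketch_memrchr(hay, needle[0], limit);
        if (hit == NULL) return -1;
        size_t pos = (size_t)(hit - hay);
        if (memcmp(hit, needle, needle_len) == 0) return (long)pos;
        limit = pos;  /* keep searching strictly before this candidate */
    }
    return -1;
}
```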
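
Commit 16f241f0aa says `String#append_as_bytes` performs no encoding negotiation and keeps only the lower byte of integer arguments, like `String#setbyte`. The C sketch below models that behavior on a hypothetical fixed-size byte buffer; `byte_buf`, `buf_append_bytes` and `buf_append_int` are illustrative names, not Ruby's C API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity byte buffer standing in for a String. */
typedef struct {
    unsigned char bytes[64];
    size_t len;
} byte_buf;

/* Append raw bytes with no encoding negotiation or validation.
 * Returns 0 on success, -1 if the buffer would overflow. */
static int buf_append_bytes(byte_buf *b, const void *src, size_t n)
{
    if (n > sizeof b->bytes - b->len) return -1;
    memcpy(b->bytes + b->len, src, n);
    b->len += n;
    return 0;
}

/* Append an integer keeping only its lowest byte: conversion to
 * unsigned char is reduction modulo 256, so -1 becomes 0xFF. */
static int buf_append_int(byte_buf *b, long v)
{
    unsigned char byte = (unsigned char)v;
    return buf_append_bytes(b, &byte, 1);
}
```

Appending `-1` yields the byte 0xFF, which is not valid UTF-8 on its own; as the commit notes, checking the result with `String#valid_encoding?` is left to the caller.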
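
Commit af44af238b replaces several flag tests in `str_independent` with a single mask check for the common case. A sketch of that pattern in C follows; the flag names, bit positions and slow-path rules are hypothetical and do not reproduce the actual RString flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; real RString flags are laid out differently. */
#define FLAG_FROZEN (1u << 0)
#define FLAG_SHARED (1u << 1)
#define FLAG_NOFREE (1u << 2)
#define FLAG_EMBED  (1u << 3)

/* Counts slow-path calls so the fast path's effect is observable. */
static int slow_checks = 0;

/* Slow path: inspect each condition separately (placeholder rules). */
static bool independent_slow(unsigned flags)
{
    slow_checks++;
    if (flags & FLAG_FROZEN) return false;
    if (flags & FLAG_SHARED) return false;
    if ((flags & FLAG_NOFREE) && !(flags & FLAG_EMBED)) return false;
    return true;
}

static bool is_independent(unsigned flags)
{
    /* Fast path: if none of the bits that could make the string
     * non-independent is set, one AND and one compare decide it. */
    const unsigned mask = FLAG_FROZEN | FLAG_SHARED | FLAG_NOFREE;
    if ((flags & mask) == 0) return true;
    return independent_slow(flags);
}
```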
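
Commits 9e9f1d9301, a8730adb60 and 6416ee33eb store a precomputed hash right after the string terminator and access it with `memcpy()`, because that address may be misaligned. The C sketch below shows the idea on a hypothetical `embedded_str` struct with a 64-bit hash; the layout and names are assumptions, not RString's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t str_hash_t;  /* hypothetical hash type */

/* Hypothetical embedded slot: string bytes, a NUL terminator, then spare
 * capacity. The hash goes right after the terminator, so its address is
 * generally not aligned to alignof(str_hash_t). */
typedef struct {
    size_t len;
    char buf[32];
} embedded_str;

/* Returns 1 if the hash fit into the spare capacity, 0 otherwise. */
static int store_hash(embedded_str *s, str_hash_t h)
{
    size_t off = s->len + 1;  /* first byte after the NUL */
    if (off + sizeof h > sizeof s->buf) return 0;
    memcpy(s->buf + off, &h, sizeof h);  /* unaligned write */
    return 1;
}

static str_hash_t load_hash(const embedded_str *s)
{
    str_hash_t h;
    /* Converting s->buf + len + 1 to (const str_hash_t *) would be
     * undefined behavior when misaligned (C99 6.3.2.3p7); memcpy does the
     * unaligned read safely. */
    memcpy(&h, s->buf + s->len + 1, sizeof h);
    return h;
}
```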

1915 commits