archive/ruby - Eplg Git: Free And Private Git Hosting

mirror of https://github.com/ruby/ruby.git synced 2025-08-15 13:39:04 +02:00

Author	SHA1	Message	Date
Nobuyoshi Nakada	e26e8423b5	Suppress gcc 15 unterminated-string-initialization warnings	2025-07-24 14:39:20 +09:00
nagachika	937319126a	merge revision(s) `02b70256b5`, `6b4f8945d6`: [Backport #20909 ] Check negative integer underflow Many of Oniguruma functions need valid encoding strings	2024-11-30 18:34:32 +09:00
nagachika	a6b7aad954	merge revision(s) `7e4b1f8e19`: [Backport #20322 ] [Bug #20322] Fix rb_enc_interned_str_cstr null encoding The documentation for `rb_enc_interned_str_cstr` notes that `enc` can be a null pointer, but this currently causes a segmentation fault when trying to autoload the encoding. This commit fixes the issue by checking for NULL before calling `rb_enc_autoload`.	2024-07-15 13:40:01 +09:00
nagachika	0cb1e753ca	Revert "merge revision(s) `5e0c171451`: [Backport #20169 ]" This reverts commit `6b73406833`.	2024-07-15 11:55:41 +09:00
nagachika	b5e554d03a	Revert "merge revision(s) `e04146129e`, `d5080f6e8b`: [Backport #20292 ]" This reverts commit `a54c717c7a`.	2024-07-15 11:08:50 +09:00
nagachika	8051a6d385	Revert "follow-up for `a54c717c7a`." This reverts commit `715633ba6e`.	2024-07-15 11:07:31 +09:00
nagachika	715633ba6e	follow-up for `a54c717c7a`.	2024-07-15 10:41:21 +09:00
nagachika	a54c717c7a	merge revision(s) `e04146129e`, `d5080f6e8b`: [Backport #20292 ] [Bug #20292] Truncate embedded string to new capacity Fix -Wsign-compare on String#initialize ../string.c:1886:57: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘long int’ [-Wsign-compare] 1886 \| if (STR_EMBED_P(str)) RUBY_ASSERT(osize <= str_embed_capa(str)); \| ^~	2024-07-15 09:26:25 +09:00
nagachika	6b73406833	merge revision(s) `5e0c171451`: [Backport #20169 ] Make io_fwrite safe for compaction [Bug #20169] Embedded strings are not safe for system calls without the GVL because compaction can cause pages to be locked causing the operation to fail with EFAULT. This commit changes io_fwrite to use rb_str_tmp_frozen_no_embed_acquire, which guarantees that the return string is not embedded.	2024-07-15 08:50:38 +09:00
Jean Boussier	449899b383	Fix `String#index` to clear MatchData when a regexp is passed [Bug #20421] The bug was fixed in Ruby 3.3 via `9dcdffb8bf`	2024-05-14 09:29:21 +02:00
nagachika	4f3ed07d5b	merge revision(s) `ade56737e2`: [Backport #20190 ] Fix coderange of invalid_encoding_string.<<(ord) Appending valid encoding character can change coderange from invalid to valid. Example: "\x95".force_encoding('sjis')<<0x5C will be a valid string "\x{955C}" --- string.c \| 6 +++++- test/ruby/test_string.rb \| 3 +++ 2 files changed, 8 insertions(+), 1 deletion(-)	2024-03-31 17:18:55 +09:00
nagachika	b4f8623441	merge revision(s) `b3d6128049`: [Backport #20150 ] Fix memory leak in grapheme clusters [Bug #20150] String#grapheme_clusters and String#each_grapheme_cluster leak memory because if the string is not UTF-8, then the created regex will not be freed. For example: str = "hello world".encode(Encoding::UTF_32LE) 10.times do 1_000.times do str.grapheme_clusters end puts `ps -o rss= -p #{$$}` end Before: 26000 42256 59008 75792 92528 109232 125936 142672 159392 176160 After: 9264 9504 9808 10000 10128 10224 10352 10544 10704 10896 --- string.c \| 98 +++++++++++++++++++++++++++++++----------------- test/ruby/test_string.rb \| 11 ++++++ 2 files changed, 75 insertions(+), 34 deletions(-)	2024-01-18 11:50:31 +09:00
nagachika	ddbab4f837	merge revision(s) `6b66b5fded`: [Backport #19902 ] [Bug #19902] Update the coderange regarding the changed region --- ext/-test-/string/set_len.c \| 10 ++++++++++ string.c \| 27 +++++++++++++++++++++++++++ test/-ext-/string/test_set_len.rb \| 29 +++++++++++++++++++++++++++++ 3 files changed, 66 insertions(+)	2023-09-30 13:51:18 +09:00
nagachika	d30781db4d	merge revision(s) `2214bcb70d`: [Backport #19792 ] Fix premature string collection during append Previously, the following crashed due to use-after-free with AArch64 Alpine Linux 3.18.3 (aarch64-linux-musl): ```ruby str = 'a' * (32*1024*1024) p({z: str}) ``` 32 MiB is the default for `GC_MALLOC_LIMIT_MAX`, and the crash could be dodged by setting `RUBY_GC_MALLOC_LIMIT_MAX` to large values. Under a debugger, one can see the `str2` of rb_str_buf_append() getting prematurely collected while str_buf_cat4() allocates capacity. Add GC guards so the buffer of `str2` lives across the GC run initiated in str_buf_cat4(). [Bug #19792] --- string.c \| 2 ++ 1 file changed, 2 insertions(+)	2023-09-30 13:07:35 +09:00
nagachika	65d294ad01	merge revision(s) `bc3ac1872e`: [Backport #19748 ] [Bug #19748] Fix out-of-bound access in `String#byteindex` --- string.c \| 17 +++++++---------- test/ruby/test_string.rb \| 3 +++ 2 files changed, 10 insertions(+), 10 deletions(-)	2023-07-22 13:39:44 +09:00
NARUSE, Yui	b309c246ee	merge revision(s) `d78ae78fd7`: [Backport #19468 ] rb_str_modify_expand: clear the string coderange [Bug #19468] `b0b9f7201a` erroneously stopped clearing the coderange. Since `rb_str_modify` clears it, `rb_str_modify_expand` should too. --- string.c \| 1 + 1 file changed, 1 insertion(+)	2023-03-17 10:56:18 +09:00
NARUSE, Yui	40e0b1e123	merge revision(s) `9726736006`: [Backport #19327 ] Set STR_SHARED_ROOT flag on root of string --- string.c \| 1 + 1 file changed, 1 insertion(+)	2023-01-31 23:46:50 +09:00
NARUSE, Yui	373e62248c	merge revision(s) `f7b72462aa`: [Backport #19356 ] String#bytesplice should return self In Feature #19314, we concluded that the return value of String#bytesplice should be changed from the source string to the receiver, because the source string is useless and confusing when extra arguments are added. This change should be included in Ruby 3.2.1. --- string.c \| 4 ++-- test/ruby/test_string.rb \| 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)	2023-01-20 12:24:24 +09:00
NARUSE, Yui	6a8fcb5021	merge revision(s) `3be2acfafd`: [Backport #19327 ] Fix re-embedding of strings during compaction The reference updating code for strings is not re-embedding strings because the code is incorrectly wrapped inside of a `if (STR_SHARED_P(obj))` clause. Shared strings can't be re-embedded so this ends up being a no-op. This means that strings can be moved to a large size pool during compaction, but won't be re-embedded, which would waste the space. --- gc.c \| 16 +++++++++------- string.c \| 12 ++++++++---- test/ruby/test_gc_compact.rb \| 8 ++++---- 3 files changed, 21 insertions(+), 15 deletions(-)	2023-01-19 21:52:47 +09:00
NARUSE, Yui	686b38f83e	merge revision(s) `d8ef0a98c6`: [Backport #19319 ] [Bug #19319] Fix crash in rb_str_casemap The following code crashes on my machine: ``` GC.stress = true str = "testing testing testing" puts str.capitalize ``` We need to ensure that the object `buffer_anchor` remains on the stack so it does not get GC'd. --- string.c \| 2 ++ 1 file changed, 2 insertions(+)	2023-01-19 11:59:43 +09:00
Nobuyoshi Nakada	98fbebf110	[DOC] Fix typo	2022-12-22 00:01:18 +09:00
S-H-GAMELINKS	1a64d45c67	Introduce encoding check macro	2022-12-02 01:31:27 +09:00
Jeremy Evans	571d21fd4a	Make String#rstrip{,!} raise Encoding::CompatibilityError for broken coderange It's questionable whether we want to allow rstrip to work for strings where the broken coderange occurs before the trailing whitespace and not after, but this approach is probably simpler, and I don't think users should expect string operations like rstrip to work on broken strings. In some cases, this changes rstrip to raise Encoding::CompatibilityError instead of ArgumentError. However, as the problem is related to an encoding issue in the receiver, and not due to an issue with an argument, I think Encoding::CompatibilityError is the more appropriate error. Fixes [Bug #18931]	2022-11-24 18:24:42 -08:00
S-H-GAMELINKS	1f4f6c9832	Using UNDEF_P macro	2022-11-16 18:58:33 +09:00
Takashi Kokubun	e7443dbbca	Rewrite Symbol#to_sym and #intern in Ruby (#6683)	2022-11-15 21:34:30 -08:00
Peter Zhu	710c1ada84	Use string's capacity to determine if reembeddable During auto-compaction, using length to determine whether or not a string can be re-embedded may be a problem for newly created strings. This is because usually it requires a malloc before setting the length. If the malloc triggers compaction, then the string may be re-embedded and can cause crashes.	2022-11-14 16:59:43 -05:00
Peter Zhu	0468136a1b	Make str_alloc_heap return a STR_NOEMBED string This commit refactors str_alloc_heap to return a string with the STR_NOEMBED flag set.	2022-11-03 09:09:11 -04:00
Vaevictusnet	7726f6bfff	Correcting example for swapcase! method Example, line 3, swapcase! was incorrect. implied that the swapcase! did /not/ change the starting string.	2022-10-04 10:07:01 +09:00
Peter Zhu	28a572f8bf	Fix bug when slicing a string with broken encoding Commit `aa2a428` introduced a bug where non-embedded string slices copied the encoding of the original string. If the original string had a broken encoding but the slice has valid encoding, then the slice would be incorrectly marked as broken encoding.	2022-09-28 09:05:23 -04:00
Peter Zhu	6f8d17e43c	Make string slices views rather than copies Just like commit `1c16645` for arrays, this commit changes string slices to be a view rather than a copy even if it can be allocated through VWA.	2022-09-28 09:05:23 -04:00
Peter Zhu	aa2a428cfb	Refactor str_substr and str_subseq This commit extracts common code between str_substr and rb_str_subseq into a function called str_subseq. This commit also applies optimizations in commit `2e88bca` to rb_str_subseq.	2022-09-26 14:54:32 -04:00
Jean Boussier	2e88bca24f	string.c: don't create a frozen copy for str_new_shared str_new_shared already has all the necessary logic to do this and is also smart enough to skip this step if the source string is already a shared string itself. This saves a useless String allocation on each call.	2022-09-26 13:41:17 +02:00
Kazuki Yamaguchi	5b0396473b	Fix coderange calculation in String#b Leave the new coderange unknown if the original encoding is not ASCII-compatible. Non-ASCII-compatible encoding strings with valid or broken coderange can end up as ascii-only. Fixes `9a8f6e392f` ("Cheaply derive code range for String#b return value", 2022-07-25).	2022-09-26 16:44:46 +09:00
Yusuke Endoh	a78c733cc3	Revert "Revert "error.c: Let Exception#inspect inspect its message"" This reverts commit `b9f030954a`. [Bug #18170]	2022-09-23 16:40:59 +09:00
Benoit Daloze	6525b6f760	Remove get_actual_encoding() and the dynamic endian detection for dummy UTF-16/UTF-32 * And simplify callers of get_actual_encoding(). * See [Feature #18949]. * See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474	2022-09-12 14:02:34 +02:00
Kazuki Yamaguchi	aff6534e32	Avoid unnecessary copying when removing the leading part of a string Remove the superfluous str_modify_keep_cr() call from rb_str_update(). It ends up calling either rb_str_drop_bytes() or rb_str_splice_0(), which already does checks if necessary. The extra call makes the string "independent". This is not always wanted, in other words, it can keep the same shared root when merely removing the leading part of a shared string.	2022-09-09 16:03:20 +09:00
Jean Boussier	cd1724bdde	rb_str_concat_literals: use rb_str_buf_append That's about 1.30x faster.	2022-09-08 15:02:21 +02:00
Nobuyoshi Nakada	332d29df53	[DOC] non-positive `base` in `Kernel#Integer` and `String#to_i`	2022-09-08 11:52:16 +09:00
Nobuyoshi Nakada	576bdec03f	[Bug #18973] Promote US-ASCII to ASCII-8BIT when adding 8-bit char	2022-08-31 17:27:59 +09:00
Nobuyoshi Nakada	fe4dd18db4	[DOC] Fix a typo [ci skip]	2022-08-27 12:54:42 +09:00
Nobuyoshi Nakada	43e8d9a050	Check if encoding capable object before check if ASCII compatible	2022-08-20 10:06:40 +09:00
Jean Boussier	b0b9f7201a	rb_str_resize: Only clear coderange on truncation If we are expanding the string or only stripping extra capacity then coderange won't change, so clearing it is wasteful.	2022-08-18 10:09:08 +02:00
Jeremy Evans	49517b3bb4	Fix inspect for unicode codepoint 0x85 This is an inelegant hack, by manually checking for this specific code point in rb_str_inspect. Some testing indicates that this is the only code point affected. It's possible a better fix would be inside of lower-level encoding code, such that rb_enc_isprint would return false and not true for codepoint 0x85. Fixes [Bug #16842]	2022-08-11 08:47:29 -07:00
Nobuyoshi Nakada	2d1cf658ee	Adjust indent [ci skip]	2022-07-26 18:33:21 +09:00
Kevin Menard	9a8f6e392f	Cheaply derive code range for String#b return value The result of String#b is a string with an ASCII_8BIT/BINARY encoding. That encoding is ASCII-compatible and has no byte sequences that are invalid for the encoding. If we know the receiver's code range, we can derive the resulting string's code range without needing to perform a full code range scan.	2022-07-26 09:03:44 +02:00
Jean Boussier	31a5586d1e	rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID If the RHS has valid encoding, and both strings have the same encoding, we can use the fast path. However we need to update the LHS coderange. ``` compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master `cdbb9b8555`) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21] warming up... \| \|compare-ruby\|built-ruby\| \|:-------------------\|-----------:\|---------:\| \|binary_concat_7bit \| 554.816k\| 556.460k\| \| \| -\| 1.00x\| \|utf8_concat_7bit \| 556.367k\| 555.101k\| \| \| 1.00x\| -\| \|utf8_concat_UTF8 \| 412.555k\| 556.824k\| \| \| -\| 1.35x\| ```	2022-07-25 14:18:52 +02:00
Takashi Kokubun	5b21e94beb	Expand tabs [ci skip] [Misc #18891]	2022-07-21 09:42:04 -07:00
Jeremy Evans	423b41cba7	Make String#each_line work correctly with paragraph separator and chomp Previously, it was including one newline when chomp was used, which is inconsistent with IO#each_line behavior. This makes behavior consistent with IO#each_line, chomping all paragraph separators (multiple consecutive newlines), but not single newlines. Partially Fixes [Bug #18768]	2022-07-21 08:02:32 -07:00
Jean Boussier	f954c5dae4	string.c: use str_enc_fastpath in TERM_LEN Not having to fetch the rb_encoding saves a significant amount of time. Additionally, even when we have to fetch it, we can do it faster using `ENCODING_GET` rather than `rb_enc_get`. ``` compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master `cb9fd920a3`) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21] warming up... \| \|compare-ruby\|built-ruby\| \|:---------------------\|-----------:\|---------:\| \|binary_concat_utf8 \| 510.580k\| 565.600k\| \| \| -\| 1.11x\| \|binary_concat_binary \| 512.653k\| 571.483k\| \| \| -\| 1.11x\| \|utf8_concat_utf8 \| 511.396k\| 566.879k\| \| \| -\| 1.11x\| ```	2022-07-21 15:06:50 +02:00
Jean Boussier	cb9fd920a3	str_buf_cat: preserve coderange when going through fastpath rb_str_modify clears the coderange, which in this case isn't necessary. ``` compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master `71aec68566`) [arm64-darwin21] built-ruby: ruby 3.2.0dev (2022-07-19T07:17:01Z faster-buffer-conc.. 3cad62aab4) [arm64-darwin21] warming up... \| \|compare-ruby\|built-ruby\| \|:---------------------\|-----------:\|---------:\| \|binary_concat_utf8 \| 360.617k\| 605.091k\| \| \| -\| 1.68x\| \|binary_concat_binary \| 446.650k\| 605.053k\| \| \| -\| 1.35x\| \|utf8_concat_utf8 \| 454.166k\| 597.311k\| \| \| -\| 1.32x\| ``` ``` \| \|compare-ruby\|built-ruby\| \|:-----------\|-----------:\|---------:\| \|erb_render \| 1.790M\| 2.045M\| \| \| -\| 1.14x\| ```	2022-07-19 10:41:40 +02:00

1799 commits