1636 commits

Author SHA1 Message Date
NAKAMURA Usaku
9733f4d0f0 merge revision(s) edf01d4e82: [Backport #18772]
Treat NULL fake string as an empty string

	And the NULL string must be of size 0.
	---
	 string.c | 5 +++++
	 1 file changed, 5 insertions(+)
2022-07-25 20:39:12 +09:00
nagachika
4b1cee1431 merge revision(s) e2ec97c4b8: [Backport #18415]
[DOC] How to get the longest last match [Bug #18415]

	---
	 string.c | 32 +++++++++++++++++++++++++++++++-
	 1 file changed, 31 insertions(+), 1 deletion(-)
2022-03-12 16:00:02 +09:00
nagachika
62a33dfa16 merge revision(s) c6706f15af: [Backport #18241]
Fix documentation for String#{<<,concat,prepend}

	These methods mutate and return the receiver, they don't create
	and return a new string.

	Fixes [Bug #18241]
	---
	 string.c | 19 ++++++++++---------
	 1 file changed, 10 insertions(+), 9 deletions(-)
2021-12-24 14:36:45 +09:00
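The behavior this documentation fix describes can be checked with a small sketch: `<<`, `concat`, and `prepend` each modify the receiver and return that same object. The variable names are illustrative only.

```ruby
s = String.new("foo")   # an unfrozen string to mutate

r1 = (s << "bar")       # append
r2 = s.concat("baz")    # append again
r3 = s.prepend(">> ")   # insert at the front

puts s                  # prints ">> foobarbaz"
# Each call returned the receiver itself, not a new string.
puts [r1, r2, r3].all? { |r| r.equal?(s) }   # prints "true"
```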
nagachika
badffc7bee merge revision(s) 7f4e86804d: [Backport #18163]
Fix documentation of #<=> and #casecmp [ci skip]

	Descriptions for return values of -1 and 1 were reversed.
	---
	 string.c | 8 ++++----
	 1 file changed, 4 insertions(+), 4 deletions(-)
2021-12-24 14:35:34 +09:00
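For reference, a minimal sketch of the return values that the corrected documentation describes: -1 when the receiver sorts before the argument, 1 when it sorts after, and 0 when they are equal.

```ruby
p("a" <=> "b")        # prints -1 (receiver is smaller)
p("b" <=> "a")        # prints 1  (receiver is larger)
p("a" <=> "a")        # prints 0
p("a" <=> 1)          # prints nil (not comparable)

# casecmp compares the same way but ignores ASCII case.
p("a".casecmp("B"))   # prints -1
p("B".casecmp("a"))   # prints 1
p("a".casecmp("A"))   # prints 0
```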
nagachika
2c947e74a0 merge revision(s) 60d0421ca861944459f52292d65dbf0ece26e38a,b6534691a16d751d59fc572d5dddebcaeb21f007,409dbc951b9875d27bd73748c88e15386473cffb,842b0008c132dd587f09766a228041afb7fed24f: [Backport #18191]
Fix the encoding of loaded feature names [Bug #18191]

	The feature names loaded from the default load paths should also
	be in the file system encoding.
	---
	 ruby.c                    | 12 +++++++++++-
	 test/ruby/test_require.rb | 22 ++++++++++++++++++++++
	 2 files changed, 33 insertions(+), 1 deletion(-)

	Copy path strings as interned strings

	---
	 ruby.c | 12 ++++++++++--
	 1 file changed, 10 insertions(+), 2 deletions(-)

	Replace expanded load path only when modified

	---
	 ruby.c | 6 +++++-
	 1 file changed, 5 insertions(+), 1 deletion(-)

	Skip broken strings as the locale encoding

	---
	 internal/string.h |  1 +
	 ruby.c            | 11 +++++++----
	 string.c          |  6 ++++++
	 3 files changed, 14 insertions(+), 4 deletions(-)
2021-10-09 15:08:38 +09:00
nagachika
650af7d29d merge revision(s) 5d81554281: [Backport #18154]
[Bug #18154] Fix memory leak in String#initialize

	String#initialize can leak memory when called on a string that is marked
	with STR_NOFREE because it does not unset the STR_NOFREE flag.
	---
	 string.c                 |  2 +-
	 test/ruby/test_string.rb | 10 ++++++++++
	 2 files changed, 11 insertions(+), 1 deletion(-)
2021-09-11 14:00:44 +09:00
nagachika
b93a2d9d2c merge revision(s) 391abc543cea118a9cd7d6310acadbfa352668ef,e86c1f6fc53433ef5c82ed2b7a4cc9a12c153e4c,f6539202c52a051a4e6946a318a1d9cd29002990: [Backport #12052]
Scan the coderange in the given encoding

	---
	 ext/-test-/string/enc_str_buf_cat.c       | 14 ++++++++++++++
	 string.c                                  | 32 ++++++++++++++++++++++---------
	 test/-ext-/string/test_enc_str_buf_cat.rb |  9 +++++++++
	 3 files changed, 46 insertions(+), 9 deletions(-)

	Work around transcoding issue with non-ASCII compatible
	 encodings and xml escaping

	When using a non-ASCII compatible source and destination encoding
	and xml escaping (the :xml option to String#encode), the resulting
	string was broken, as it used the correct non-ASCII compatible
	encoding, but contained data that was ASCII-compatible instead of
	compatible with the string's encoding.

	Work around this issue by detecting the case where both the
	source and destination encoding are non-ASCII compatible, and
	transcoding the source string from the non-ASCII compatible
	encoding to UTF-8. The xml escaping code will correctly handle
	the UTF-8 source string and then return the correctly encoded
	and escaped value.

	Fixes [Bug #12052]

	Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
	---
	 test/ruby/test_transcode.rb | 19 +++++++++++++++++++
	 transcode.c                 |  6 ++++++
	 2 files changed, 25 insertions(+)

	- add regression tests for U+6E7F (湿) in ISO-2022-JP

	  In ISO-2022-JP, the bytes used to encode 湿 are the same as those for "<>".
	  This adds regression tests to make sure that these bytes, when representing
	  湿, are NOT escaped with encode("ISO-2022-JP", xml: :text) or similar.
	  These are additional regression tests for #12052.
	---
	 test/ruby/test_transcode.rb | 3 +++
	 1 file changed, 3 insertions(+)
2021-07-18 11:19:13 +09:00
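As background for the `:xml` option mentioned in this backport, here is a minimal sketch of the ordinary ASCII-compatible case. It does not reproduce the non-ASCII-compatible bug itself; it only shows what `xml: :text` escaping is expected to produce.

```ruby
# xml: :text replaces &, < and > with their XML entities.
escaped = "<a & b>".encode("UTF-8", xml: :text)
puts escaped   # prints "&lt;a &amp; b&gt;"
```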
nagachika
5af5ea7f86 merge revision(s) cfd162d535: [Backport #17467]
Make String#{strip,lstrip}{,!} strip leading NUL bytes

	The documentation already specifies that they strip whitespace
	and defines whitespace to include null.

	This wraps the new behavior in the appropriate guards in the specs,
	but does not specify behavior for previous versions, because this
	is a bug that could be backported.

	Fixes [Bug #17467]
	---
	 spec/ruby/core/string/lstrip_spec.rb | 18 ++++++++++++------
	 spec/ruby/core/string/strip_spec.rb  | 22 ++++++++++------------
	 string.c                             |  4 ++--
	 test/ruby/test_string.rb             | 16 ++++++++++++++++
	 4 files changed, 40 insertions(+), 20 deletions(-)
2021-05-23 16:09:17 +09:00
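A short sketch of the behavior after this change, assuming a Ruby build that includes the fix. Older builds leave the leading NUL byte in place.

```ruby
s = "\0 \tabc"   # leading NUL, space, and tab

p s.lstrip       # prints "abc" on builds with this fix
p s.strip        # prints "abc" on builds with this fix
```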
NARUSE, Yui
4e2738f477 merge revision(s) 7e8a9af9db: [Backport #17732]
rb_enc_interned_str: handle autoloaded encodings

	If called with an autoloaded encoding that was not yet
	initialized, `rb_enc_interned_str` would crash with
	a NULL pointer exception.

	See: https://github.com/ruby/ruby/pull/4119#issuecomment-800189841
	---
	 encoding.c                        | 28 ++++++++++++----------------
	 ext/-test-/string/depend          |  3 +++
	 ext/-test-/string/fstring.c       | 15 +++++++++++++++
	 internal/encoding.h               |  3 +++
	 string.c                          |  4 ++++
	 test/-ext-/string/test_fstring.rb | 16 ++++++++++++++++
	 6 files changed, 53 insertions(+), 16 deletions(-)
2021-04-02 16:06:31 +09:00
zverok
4728c0d900 Add Symbol#name and freezing explanation to #to_s 2020-12-21 19:22:38 -05:00
Nobuyoshi Nakada
c7a5cc2c30
Replaced magic numbers tr table 2020-12-21 23:45:38 +09:00
Jeremy Evans
05313c914b Use category: :deprecated in warnings that are related to deprecation
Also document that both :deprecated and :experimental are supported
:category option values.

The locations where warnings were marked as deprecation warnings
were previously reviewed by shyouhei.

Comment a couple locations where deprecation warnings should probably
be used but are not currently used because deprecation warning
enablement has not occurred at the time they are called
(RUBY_FREE_MIN, RUBY_HEAP_MIN_SLOTS, -K).

Add assert_deprecated_warn to test assertions.  Use this to simplify
some tests, and fix failing tests after marking some warnings with
deprecated category.
2020-12-18 09:54:11 -08:00
Koichi Sasada
344ec26a99 tuning trial: newobj with current ec
Passing current ec can improve performance of newobj. This patch
tries it for Array and String literals ([] and '').
2020-12-07 08:28:36 +09:00
Koichi Sasada
764de7566f should not use rb_str_modify(), too
Same as 8247b8edde, should not use rb_str_modify() here.

https://bugs.ruby-lang.org/issues/17343#change-88858
2020-12-01 18:16:23 +09:00
Jean Boussier
6bef49427a Fix rb_interned_str_* functions to not assume static strings
Fixes [Feature #13381]

When passed a `fake_str`, `register_fstring` would create new strings
with `str_new_static`. That's not what was expected, and answers
almost no use cases.
2020-11-30 17:33:28 +09:00
Nobuyoshi Nakada
02c32b2e92
Get rid of allocation when the capacity is small 2020-11-29 15:01:41 +09:00
Takashi Kokubun
3f8c60cf09
Remove obsoleted str_new_empty
since 58325daae3.

../string.c:1339:1: warning: ‘str_new_empty’ defined but not used [-Wunused-function]
 1339 | str_new_empty(VALUE str)
      | ^~~~~~~~~~~~~
2020-11-20 22:22:29 -08:00
Jeremy Evans
58325daae3 Make String methods return String instances when called on a subclass instance
This modifies the following String methods to return String instances
instead of subclass instances:

* String#*
* String#capitalize
* String#center
* String#chomp
* String#chop
* String#delete
* String#delete_prefix
* String#delete_suffix
* String#downcase
* String#dump
* String#each/#each_line
* String#gsub
* String#ljust
* String#lstrip
* String#partition
* String#reverse
* String#rjust
* String#rpartition
* String#rstrip
* String#scrub
* String#slice!
* String#slice/#[]
* String#split
* String#squeeze
* String#strip
* String#sub
* String#succ/#next
* String#swapcase
* String#tr
* String#tr_s
* String#upcase

This also fixes a bug in String#swapcase where it would return the
receiver instead of a copy of the receiver if the receiver was the
empty string.

Some string methods were left to return subclass instances:

* String#+@
* String#-@

Both of these methods will return the receiver (subclass instance)
in some cases, so it is best to keep the returned class consistent.

Fixes [#10845]
2020-11-20 16:30:23 -08:00
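The effect of this change can be sketched as follows, assuming Ruby 3.0 or later. `Label` is a hypothetical subclass used only for illustration.

```ruby
# Hypothetical String subclass, not part of Ruby.
class Label < String; end

s = Label.new("hello")

p s.upcase.class    # prints String (was Label before this change)
p s[0, 2].class     # prints String
p s.center(9).class # prints String

# String#+@ returns the receiver when it is not frozen,
# so the subclass is preserved here.
p((+s).class)       # prints Label
```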
Jean Boussier
ef19fb111a Expose the rb_interned_str_* family of functions
Fixes [Feature #13381]
2020-11-17 09:39:25 +09:00
Alan Wu
520b86caf1 Move variable closer to usage 2020-10-30 19:34:41 -04:00
Stefan Stüben
8c2e5bbf58 Don't redefine #rb_intern over and over again 2020-10-21 12:45:18 +09:00
Burdette Lamar
33776598f7
Enhanced RDoc for String#insert (#3643)
* Enhanced RDoc for String#insert
2020-10-08 15:35:13 -05:00
Burdette Lamar
4bc6190a34
Enhanced RDoc for String#[] (#3607)
* Enhanced RDoc for String#[]
2020-09-30 14:58:12 -05:00
Burdette Lamar
48b94b7919
Enhanced RDoc for String#upto (#3603)
* Enhanced RDoc for String#upto
2020-09-29 19:15:39 -05:00
Burdette Lamar
0555bd8435
Enhanced RDoc for String#succ! (#3596)
* Enhanced RDoc for String#succ!
2020-09-28 11:58:39 -05:00
Burdette Lamar
8b42474a26
Enhanced RDoc for String#succ (#3590)
* Enhanced RDoc for String#succ
2020-09-25 15:13:10 -05:00
Burdette Lamar
83ff0f74bf
Enhanced RDoc for String#match? (#3576)
* Enhanced RDoc for String#match?
2020-09-24 18:38:11 -05:00
Burdette Lamar
38385d28df
Enhanced RDoc for String (#3574)
Methods:

    =~
    match
2020-09-24 13:23:26 -05:00
Burdette Lamar
6fe2a9fcda
Enhanced RDoc for String (#3569)
Makes some methods doc compliant with https://github.com/ruby/ruby/blob/master/doc/method_documentation.rdoc. Also, other minor revisions to make more consistent.
Methods:

    ==
    ===
    eql?
    <=>
    casecmp
    casecmp?
    index
    rindex
2020-09-24 10:55:43 -05:00
Kazuhiro NISHIYAMA
9a8f5f0a9a
Fix call-seq [ci skip]
`encoding` can be not only an encoding name, but also an Encoding object.

```
s = String.new('foo', encoding: Encoding::US_ASCII)
s.encoding # => #<Encoding:US-ASCII>
```
2020-09-23 11:44:06 +09:00
Burdette Lamar
b904b72960
Enhanced RDoc for String (#3565)
Makes some methods doc compliant with https://github.com/ruby/ruby/blob/master/doc/method_documentation.rdoc. Also, other minor revisions to make more consistent.
Methods:

    try_convert
    +string
    -string
    concat
    <<
    prepend
    hash
2020-09-22 16:32:17 -05:00
Burdette Lamar
c6c5d4b3fa
Comply with guide for method doc: string.c (#3528)
Methods:

    ::new
    #length
    #bytesize
    #empty?
    #+
    #*
    #%
2020-09-21 11:27:54 -05:00
Koichi Sasada
dd5db6f5fe sync fstring_table for deletion
Ractors can access this table simultaneously so we need to sync
accesses.
2020-09-18 14:17:49 +09:00
Koichi Sasada
e81d7189a0 sync fstring pool
The fstring pool should be synchronized with other Ractors.
2020-09-15 00:04:59 +09:00
Soutaro Matsumoto
f0ddbd502c
Let String#slice! return nil (#3533)
Returns `nil` instead of an empty string when non-integer number is given (to make it 2.7 compatible).
2020-09-11 14:34:10 +09:00
Nobuyoshi Nakada
eb67c603ca
Added Symbol#name
https://bugs.ruby-lang.org/issues/16150#change-87446
2020-09-04 22:18:59 +09:00
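A minimal sketch of `Symbol#name`, assuming Ruby 3.0 or later. Unlike `Symbol#to_s`, the returned string is frozen.

```ruby
name = :status.name

p name           # prints "status"
p name.frozen?   # prints true
```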
Burdette Lamar
51525557fd
Partial compliance with doc/method_documentation.rdoc in string.c (#3436)
Removes references to *-convertible thingies.
2020-08-20 12:09:49 -05:00
Jean Boussier
aaf0e33c0a register_fstring: avoid duping the passed string when possible
If the passed string is frozen, bare and not shared, then there
is no need to duplicate it.

Ref: 4ab69ebbd7
Ref: https://bugs.ruby-lang.org/issues/11386
2020-08-19 08:08:56 -07:00
Nobuyoshi Nakada
d75433ae19
[DOC] fixed a missing markup 2020-08-15 14:17:02 +09:00
Kasumi Hanazuki
014a4fda54 rb_str_{index,rindex}_m: Handle /\K/ in pattern
When the pattern Regexp given to String#index and String#rindex
contains a /\K/ (lookbehind) operator, these methods return the
position where the beginning of the lookbehind pattern matches, while
they are expected to return the position where the \K matches.

```
# without patch
"abcdbce".index(/b\Kc/)  # => 1
"abcdbce".rindex(/b\Kc/)  # => 4
```

This patch fixes this problem by using BEG(0) instead of the return
value of rb_reg_search.

```
# with patch
"abcdbce".index(/b\Kc/)  # => 2
"abcdbce".rindex(/b\Kc/)  # => 5
```

Fixes [Bug #17118]
2020-08-13 20:54:12 +09:00
Kasumi Hanazuki
5d71eed1a7 rb_str_{partition,rpartition}_m: Handle /\K/ in pattern
When the pattern given to String#partition and String#rpartition
contains a /\K/ (lookbehind) operator, the methods return strings
sliced at incorrect positions.

```
# without patch
"abcdbce".partition(/b\Kc/)  # => ["a", "c", "cdbce"]
"abcdbce".rpartition(/b\Kc/)  # => ["abcd", "c", "ce"]
```

This patch fixes the problem by using BEG(0) instead of the return
value of rb_reg_search.

```
# with patch
"abcdbce".partition(/b\Kc/)  # => ["ab", "c", "dbce"]
"abcdbce".rpartition(/b\Kc/)  # => ["abcdb", "c", "e"]
```

As a side-effect this patch makes String#partition 2x faster when the
pattern is a costly Regexp by performing Regexp search only once,
which was unexpectedly done twice in the original implementation.

Fixes [Bug #17119]
2020-08-13 20:50:50 +09:00
Kasumi Hanazuki
e79cdcf61b string.c(rb_str_split_m): Handle /\K/ correctly
Use BEG(0) instead of the result of rb_reg_search to handle the cases
when the separator Regexp contains /\K/ (lookbehind) operator.

Fixes [Bug #17113]
2020-08-12 10:01:39 +09:00
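By analogy with the `index` and `partition` examples in the related fixes, the behavior after this patch can be sketched like this, assuming a Ruby build that includes it. Only the text after `\K` is treated as the separator, so the `b` stays in the preceding field.

```ruby
# /b\Kc/ matches "c" only when it follows "b"; the "b" is kept.
p "abcdbce".split(/b\Kc/)   # prints ["ab", "db", "e"]
```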
Nobuyoshi Nakada
0ca6b973e8
Removed non-ASCII code to suppress warnings by localized compilers 2020-08-10 19:46:13 +09:00
Nobuyoshi Nakada
fac62f094e
Adjust indent 2020-08-10 16:35:42 +09:00
Kazuhiro NISHIYAMA
946cd6c534
Use https instead of http 2020-07-28 19:51:54 +09:00
卜部昌平
de3e931df7 add UNREACHABLE_RETURN
Not every compiler understands that rb_raise does not return.  When a
function does not end with a return statement, such compilers can issue
warnings.  We had better tell them about reachability.
2020-06-29 11:05:41 +09:00
卜部昌平
5f926b2b00 rb_str_partition: do not goto into a branch
I'm not necessarily against every goto in general, but jumping into a
branch is definitely a bad idea.  Better refactor.
2020-06-29 11:05:41 +09:00
卜部昌平
e3d821a36c rb_str_crypt: do not goto into a branch
I'm not necessarily against every goto in general, but jumping into a
branch is definitely a bad idea.  Better refactor.
2020-06-29 11:05:41 +09:00
卜部昌平
a5ae9aebbc trnext: do not goto into a branch
I'm not necessarily against every goto in general, but jumping into a
branch is definitely a bad idea.  Better refactor.
2020-06-29 11:05:41 +09:00
卜部昌平
c7a4073154 chompped_length: do not goto into a branch
I'm not necessarily against every goto in general, but jumping into a
branch is definitely a bad idea.  Better refactor.
2020-06-29 11:05:41 +09:00