Commit graph

1917 commits

Author SHA1 Message Date
Kazuki Yamaguchi
5b0396473b Fix coderange calculation in String#b
Leave the new coderange unknown if the original encoding is not
ASCII-compatible. Non-ASCII-compatible encoding strings with valid or
broken coderange can end up as ascii-only.

Fixes 9a8f6e392f ("Cheaply derive code range for String#b return
value", 2022-07-25).
2022-09-26 16:44:46 +09:00
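
The case described in this commit can be observed from Ruby. The following is a minimal sketch, not taken from the commit's own tests: the sample character and variable names are chosen for illustration. It takes a UTF-16LE string whose bytes all happen to be below 0x80, forces a code range scan on it, and then checks what String#b reports.

```ruby
# "あ" (U+3042) encodes to the two bytes 0x42 0x30 in UTF-16LE.
utf16 = "\u3042".encode(Encoding::UTF_16LE)
utf16.valid_encoding?                   # forces a code range scan on the receiver

binary = utf16.b                        # same bytes, ASCII-8BIT (BINARY) encoding

p binary.encoding == Encoding::BINARY   # => true
p binary.bytes                          # => [66, 48]
p binary.ascii_only?                    # => true, both bytes are below 0x80
```

UTF-16LE is not ASCII-compatible, so the receiver's code range says nothing about whether the raw bytes are 7-bit; that is why the result's code range has to be left unknown and computed on demand.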
Yusuke Endoh
a78c733cc3 Revert "Revert "error.c: Let Exception#inspect inspect its message""
This reverts commit b9f030954a.

[Bug #18170]
2022-09-23 16:40:59 +09:00
Benoit Daloze
6525b6f760 Remove get_actual_encoding() and the dynamic endian detection for dummy UTF-16/UTF-32
* And simplify callers of get_actual_encoding().
* See [Feature #18949].
* See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474
2022-09-12 14:02:34 +02:00
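
For context, "dummy" here refers to Encoding#dummy?. A short sketch of which UTF-16/UTF-32 encodings are dummy and which are concrete (this only shows the public predicate, not the removed C helper):

```ruby
# The endian-less names are dummy encodings; the BE/LE variants are concrete.
p Encoding::UTF_16.dummy?     # => true
p Encoding::UTF_32.dummy?     # => true
p Encoding::UTF_16BE.dummy?   # => false
p Encoding::UTF_16LE.dummy?   # => false
```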
Kazuki Yamaguchi
aff6534e32 Avoid unnecessary copying when removing the leading part of a string
Remove the superfluous str_modify_keep_cr() call from rb_str_update().
It ends up calling either rb_str_drop_bytes() or rb_str_splice_0(),
which already does checks if necessary.

The extra call makes the string "independent". This is not always
wanted, in other words, it can keep the same shared root when merely
removing the leading part of a shared string.
2022-09-09 16:03:20 +09:00
Jean Boussier
cd1724bdde rb_str_concat_literals: use rb_str_buf_append
That's about 1.30x faster.
2022-09-08 15:02:21 +02:00
Nobuyoshi Nakada
332d29df53
[DOC] non-positive base in Kernel#Integer and String#to_i 2022-09-08 11:52:16 +09:00
Nobuyoshi Nakada
576bdec03f [Bug #18973] Promote US-ASCII to ASCII-8BIT when adding 8-bit char 2022-08-31 17:27:59 +09:00
Nobuyoshi Nakada
fe4dd18db4
[DOC] Fix a typo [ci skip] 2022-08-27 12:54:42 +09:00
Nobuyoshi Nakada
43e8d9a050 Check if encoding capable object before check if ASCII compatible 2022-08-20 10:06:40 +09:00
Jean Boussier
b0b9f7201a rb_str_resize: Only clear coderange on truncation
If we are expanding the string or only stripping extra capacity
then coderange won't change, so clearing it is wasteful.
2022-08-18 10:09:08 +02:00
Jeremy Evans
49517b3bb4 Fix inspect for unicode codepoint 0x85
This is an inelegant hack, by manually checking for this specific
code point in rb_str_inspect.  Some testing indicates that this is
the only code point affected.

It's possible a better fix would be inside of lower-level encoding
code, such that rb_enc_isprint would return false and not true for
codepoint 0x85.

Fixes [Bug #16842]
2022-08-11 08:47:29 -07:00
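
A quick way to check how a given Ruby build inspects this code point. The output depends on whether the build includes this fix, so no expected output is shown; the variable names are illustrative.

```ruby
nel = "\u0085"            # U+0085 NEXT LINE, the code point named in the commit
shown = nel.inspect

# With the fix, the character should appear escaped rather than raw.
puts shown
p shown.ascii_only?
```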
Nobuyoshi Nakada
2d1cf658ee
Adjust indent [ci skip] 2022-07-26 18:33:21 +09:00
Kevin Menard
9a8f6e392f Cheaply derive code range for String#b return value
The result of String#b is a string with an ASCII_8BIT/BINARY encoding. That encoding is ASCII-compatible and has no byte sequences that are invalid for the encoding. If we know the receiver's code range, we can derive the resulting string's code range without needing to perform a full code range scan.
2022-07-26 09:03:44 +02:00
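
The derivation described above can be checked from Ruby for ASCII-compatible receivers. A minimal sketch with illustrative sample strings: a 7-bit receiver stays 7-bit, and any other receiver, valid or broken, yields a valid binary string.

```ruby
ascii  = "hello"
utf8   = "h\u00E9llo"                               # contains a non-ASCII character
broken = "\xFF".dup.force_encoding(Encoding::UTF_8) # invalid UTF-8 byte sequence

p ascii.b.ascii_only?       # => true
p utf8.b.ascii_only?        # => false
p utf8.b.valid_encoding?    # => true
p broken.valid_encoding?    # => false
p broken.b.valid_encoding?  # => true, no byte sequence is invalid in BINARY
```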
Jean Boussier
31a5586d1e rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID
If the RHS has valid encoding, and both strings have the same
encoding, we can use the fast path.

However we need to update the LHS coderange.

```
compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|binary_concat_7bit  |    554.816k|  556.460k|
|                    |           -|     1.00x|
|utf8_concat_7bit    |    556.367k|  555.101k|
|                    |       1.00x|         -|
|utf8_concat_UTF8    |    412.555k|  556.824k|
|                    |           -|     1.35x|
```
2022-07-25 14:18:52 +02:00
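
The remark about updating the LHS coderange can be illustrated at the Ruby level. In this sketch (sample strings are illustrative), a 7-bit receiver stops being ASCII-only after a valid non-ASCII string of the same encoding is appended, so a stale code range on the receiver would give a wrong answer.

```ruby
lhs = +"abc"              # unfrozen UTF-8 string, 7-bit content
p lhs.ascii_only?         # => true

lhs << "d\u00E9f"         # RHS is valid UTF-8 with one non-ASCII character

p lhs.ascii_only?         # => false
p lhs.valid_encoding?     # => true
```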
Takashi Kokubun
5b21e94beb Expand tabs [ci skip]
[Misc #18891]
2022-07-21 09:42:04 -07:00
Jeremy Evans
423b41cba7 Make String#each_line work correctly with paragraph separator and chomp
Previously, it was including one newline when chomp was used,
which is inconsistent with IO#each_line behavior. This makes
behavior consistent with IO#each_line, chomping all paragraph
separators (multiple consecutive newlines), but not single
newlines.

Partially Fixes [Bug #18768]
2022-07-21 08:02:32 -07:00
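
A small script for comparing paragraph-mode behaviour with and without chomp. The exact contents of the chomped array depend on whether the running Ruby includes this change, so the results are printed rather than asserted here; the sample text is illustrative.

```ruby
text = "a\n\n\nb\n"

# An empty separator selects paragraph mode.
plain   = text.each_line("").to_a
chomped = text.each_line("", chomp: true).to_a

p plain
p chomped
```

Per the description above, a build with this change should drop the whole run of newlines after "a" but keep the single trailing newline after "b".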
Jean Boussier
f954c5dae4 string.c: use str_enc_fastpath in TERM_LEN
Not having to fetch the rb_encoding saves a significant
amount of time.

Additionally, even when we have to fetch it, we can do
it faster using `ENCODING_GET` rather than `rb_enc_get`.

```
compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    510.580k|  565.600k|
|                      |           -|     1.11x|
|binary_concat_binary  |    512.653k|  571.483k|
|                      |           -|     1.11x|
|utf8_concat_utf8      |    511.396k|  566.879k|
|                      |           -|     1.11x|
```
2022-07-21 15:06:50 +02:00
Jean Boussier
cb9fd920a3 str_buf_cat: preserve coderange when going through fastpath
rb_str_modify clears the coderange, which in this case isn't
necessary.

```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-19T07:17:01Z faster-buffer-conc.. 3cad62aab4) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    360.617k|  605.091k|
|                      |           -|     1.68x|
|binary_concat_binary  |    446.650k|  605.053k|
|                      |           -|     1.35x|
|utf8_concat_utf8      |    454.166k|  597.311k|
|                      |           -|     1.32x|
```

```
|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|erb_render  |      1.790M|    2.045M|
|            |           -|     1.14x|
```
2022-07-19 10:41:40 +02:00
Jean Boussier
0ae8dbbee0 rb_str_buf_append: fastpath to str_buf_cat
If the LHS is ASCII compatible and the RHS is 7BIT
we can directly concat without being concerned about
anything else.

Benchmark:
```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_append_utf8    |    385.315k|  573.663k|
|                      |           -|     1.49x|
|binary_append_binary  |    446.579k|  574.898k|
|                      |           -|     1.29x|
|utf8_append_utf8      |    430.936k|  573.394k|
|                      |           -|     1.33x|
```

Note that in the benchmark, the RHS always has a precomputed
coderange, so the benchmark never enters the slow path of having to
scan the RHS. However, it's extremely likely that we'll end
up scanning it anyway in rb_enc_cr_str_buf_cat.
2022-07-19 10:41:40 +02:00
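
The fast-path condition stated above (LHS ASCII-compatible, RHS 7-bit) can be written as a Ruby-level predicate. This is only a sketch of the condition, not the C implementation, and `seven_bit_fast_path?` is a hypothetical helper name.

```ruby
# Hypothetical helper: true when plain byte concatenation is safe
# according to the condition in the commit message.
def seven_bit_fast_path?(lhs, rhs)
  lhs.encoding.ascii_compatible? && rhs.ascii_only?
end

p seven_bit_fast_path?("bin".b, "abc")                        # => true
p seven_bit_fast_path?("utf8", "d\u00E9f")                    # => false, RHS is not 7-bit
p seven_bit_fast_path?("x".encode(Encoding::UTF_16LE), "abc") # => false, LHS is not ASCII-compatible
```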
Jean Boussier
d084585f01 Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BIT
Otherwise it's way too easy to confuse it with US_ASCII.
2022-07-19 08:48:56 +02:00
Burdette Lamar
081bd061a8
[DOC] Correct call-seq directive in string.c (#6131)
Correct call-seq directive in string.c
2022-07-13 10:44:22 -05:00
S-H-GAMELINKS
420f3ced4d Using is_ascii_string to check encoding 2022-06-17 12:02:50 +09:00
Alan Wu
714a4942fd
Remove unused and accidentally public rb_str_shared_root_p()
This function was added to a public header in [1] probably
unintentionally since it's not used anywhere, exposes implementation
details, and isn't related to the goals of that pull request.

[1]: 56cc3e99b6
2022-06-16 07:20:20 -04:00
Nobuyoshi Nakada
048f14221c
Add placeholder to let braces match 2022-06-14 10:21:55 +09:00
Matt Valentine-House
56cc3e99b6 Move String RVALUES between pools
And re-embed any strings that can now fit inside the slot they've been
moved to
2022-06-13 10:11:27 -07:00
Alexander Ilyin
adcfd69690
[DOC] Fix markup for String (#5984)
* Add missing space for `String#start_with?`.
* Add missing pluses for `String#tr` and
  `Methods for Converting to New String` label.
* Move quote into the tag for `Whitespace in Strings` label.
2022-06-09 13:40:21 -05:00
Yusuke Endoh
b9f030954a Revert "error.c: Let Exception#inspect inspect its message"
This reverts commit 9d927204e7.
2022-06-07 11:52:44 +09:00
Yusuke Endoh
9d927204e7 error.c: Let Exception#inspect inspect its message
... only when the message string has a newline.

`p StandardError.new("foo\nbar")` now prints `#<StandardError: "foo\nbar">`
instead of:

    #<StandardError: foo
    bar>

[Bug #18170]
2022-06-07 11:07:09 +09:00
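
To see which behaviour a given Ruby has, the example from the commit message can be run directly. The printed form depends on whether the build includes this change (it was reverted and later re-applied in this log), so no expected output is shown.

```ruby
err = StandardError.new("foo\nbar")

# With the change: one line, with the message shown via String#inspect.
# Without it: the raw message, which spans two lines.
puts err.inspect

p err.message == "foo\nbar"   # => true, the message itself is unchanged
```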
Jean Boussier
65122d09d5 [Feature #18595] Alias String#-@ as String#dedup 2022-05-20 11:31:59 -07:00
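
A short sketch of what the alias offers. String#-@ returns a frozen, deduplicated string, and String#dedup is the same operation under a readable name. The `respond_to?` guard is there only so the snippet also runs on Rubies that predate the alias.

```ruby
a = String.new("config")
b = String.new("config")
p a.equal?(b)            # => false, two distinct objects

da = -a
db = -b
p da.equal?(db)          # => true, both resolve to one interned string
p da.frozen?             # => true

if a.respond_to?(:dedup)
  p a.dedup.equal?(da)   # => true
end
```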
Nobuyoshi Nakada
5d45afdbbf
[DOC] Move the documentations of moved Symbol methods 2022-04-14 11:17:37 +09:00
Burdette Lamar
dfdc03248f
[DOC] Enhanced RDoc for Symbol (#5796)
Treats:
    #[]
    #length
    #empty?
    #upcase
    #downcase
    #capitalize
    #swapcase
    #start_with?
    #end_with?
    #encoding
    ::all_symbols
2022-04-13 13:45:18 -05:00
Nobuyoshi Nakada
7e97ebb6eb
Enforce literals on the second arguments 2022-04-13 18:33:34 +09:00
Burdette Lamar
b21026cb1a
Enhanced RDoc for Symbol (#5795)
Treats:

    #==
    #inspect
    #name
    #to_s
    #to_sym
    #to_proc
    #succ
    #<=>
    #casecmp
    #casecmp?
    #=~
    #match
    #match?
2022-04-12 17:27:18 -05:00
Burdette Lamar
70415071e8
Fix some RDoc links (#5778) 2022-04-08 14:25:38 -05:00
Burdette Lamar
9ca3d537b9
All-in-one RDoc for class String (#5777) 2022-04-07 14:29:04 -05:00
Burdette Lamar
717b20ee30
[DOC] Enhanced RDoc for string slices (#5769)
Creates file doc/string/slices.rdoc that the string slicing methods can link to.
2022-04-06 15:47:22 -05:00
Burdette Lamar
4a4485adbd
Enhanced RDoc for String#index (#5759) 2022-04-04 14:18:10 -05:00
Burdette Lamar
0b0ae583f4
[DOC] Enhanced RDoc for String (#5753)
Treats:
    #length
    #bytesize
2022-04-03 10:09:34 -05:00
Burdette Lamar
7be4d900f0
[DOC] Enhanced RDoc for String (#5751)
Adds to doc for String.new, also making it compliant with documentation_guide.rdoc.
    Fixes some broken links in io.c (that I failed to correct yesterday).
2022-04-02 14:26:49 -05:00
Burdette Lamar
056b7a8633
[DOC] Enhanced RDoc for String (#5742)
Treats:
    #force_encoding
    #b
    #valid_encoding?
    #ascii_only?
    #scrub
    #scrub!
    #unicode_normalized?
Plus a couple of minor tweaks.
2022-03-31 15:09:25 -05:00
Burdette Lamar
ffcdbedbfb
Repaired What's Here sections for Range, String, Symbol, Struct (#5735)
Repaired What's Here sections for Range, String, Symbol, Struct.
2022-03-30 13:46:24 -05:00
Burdette Lamar
b257034ae5
[DOC] Enhanced RDoc for String (#5730)
Treats:

    #start_with?
    #end_with?
    #delete_prefix
    #delete_prefix!
    #delete_suffix
    #delete_suffix!
2022-03-29 09:54:29 -05:00
Burdette Lamar
5525e47a0b
[DOC] Enhanced RDoc for String (#5726)
Treats:

    #ljust
    #rjust
    #center
    #partition
    #rpartition
2022-03-28 15:49:18 -05:00
Burdette Lamar
d52cf1013f
[DOC] Enhanced RDoc for String (#5724)
Treats:

    #scan
    #hex
    #oct
    #crypt
    #ord
    #sum
2022-03-27 14:45:14 -05:00
Nobuyoshi Nakada
1b0f05168d
[DOC] Fix references to unary operator 2022-03-27 11:24:06 +09:00
Burdette Lamar
e699e2d9bf
Enhanced RDoc for String (#5723)
Treats:

    #lstrip
    #lstrip!
    #rstrip
    #rstrip!
    #strip
    #strip!

Adds section Whitespace in Strings.
2022-03-26 12:42:44 -05:00
Nobuyoshi Nakada
300f4677c9
[DOC] Use simple references to operator methods
Method references can not only be marked up as code, but also
reflect the `--show-hash` option.
The bug that prevented the old rdoc from correctly parsing these
methods was fixed last month.
2022-03-26 21:13:16 +09:00
Burdette Lamar
465edb96f0
[DOC] Enhanced RDoc for String (#5707)
Treated:

    #chomp
    #chomp!
    #chop
    #chop!
2022-03-24 19:40:58 -05:00
Burdette Lamar
0140e6c41e
[DOC] Enhanced RDoc for String (#5685)
Treats:

    #chars
    #codepoints
    #each_char
    #each_codepoint
    #each_grapheme_cluster
    #grapheme_clusters

Also, corrects a passage in #unicode_normalize that mentioned module UnicodeNormalize, whose doc (:nodoc:, actually) says not to mention it.
2022-03-22 14:51:05 -05:00
Burdette Lamar
c129b6119d
[DOC] Use RDoc inclusions in string.c (#5683)
As @peterzhu2118 and @duerst have pointed out, putting a string method's RDoc into doc/ (which allows non-ASCII in examples) makes the "click to toggle source" feature not work for that method.

This PR moves the primary method doc back into string.c, then includes RDoc from doc/string/*.rdoc, and also removes doc/string.rdoc.

The affected methods are:

    ::new
    #bytes
    #each_byte
    #each_line
    #split

The call-seq is in string.c because it works there; it did not work when the call-seq is in doc/string/*.rdoc.

This PR also updates the relevant guidance in doc/documentation_guide.rdoc.
2022-03-21 14:58:00 -05:00