Commit graph

413 commits

Author SHA1 Message Date
Nobuyoshi Nakada
6179cc0118
[DOC] Fill undocumented documents 2025-08-04 02:23:43 +09:00
BurdetteLamar
54a578e72a [DOC] Tweaks for String#encoding 2025-07-23 17:09:33 -04:00
Jean Boussier
8541dec8c4 encoding.c: check for autoload before checking index
Otherwise we may be checking the index while the encoding
is being autoloaded by another ractor.
2025-07-22 12:58:49 +02:00
Jean Boussier
1fb4929ace Make rb_enc_autoload_p atomic
Using `encoding->max_enc_len` as a way to check if the encoding
has been loaded isn't atomic, because it's not atomically set
last.

Intead we can use a dedicated atomic value inside the encoding table.
2025-07-10 17:18:20 +02:00
Jean Boussier
482f4cad82 Autoload encodings on the main ractor
None of the datastructures involved in the require process are
safe to call on a secondary ractor, however when autoloading
encodings, we do so from the current ractor.

So all sorts of corruption can happen when using an autoloaded
encoding for the first time from a secondary ractor.
2025-07-07 12:44:21 +02:00
John Hawthorn
24ac9f11de Revert "Add locks around accesses/modifications to global encodings table"
This reverts commit cf4d37fbc0.
2025-07-03 21:39:10 -07:00
John Hawthorn
50704fe8e6 Revert "Make get/set default internal/external encoding lock-free"
This reverts commit dda5a04f2b.
2025-07-03 21:39:10 -07:00
Luke Gruber
dda5a04f2b Make get/set default internal/external encoding lock-free
Also, make sure autoloading of encodings is safe across ractors.
2025-07-03 13:33:10 -07:00
Luke Gruber
cf4d37fbc0 Add locks around accesses/modifications to global encodings table
This fixes segfaults and errors of the type "Encoding not found" when
using encoding-related methods and internal encoding c functions across
ractors.

Example of a possible segfault in release mode or assertion error in debug mode:

```ruby
rs = []
100.times do
  rs << Ractor.new do
    "abc".force_encoding(Encoding.list.shuffle.first)
  end
end
while rs.any?
  r, obj = Ractor.select(*rs)
  rs.delete(r)
end
```
2025-07-03 13:33:10 -07:00
Nobuyoshi Nakada
fc518fe1ff
Delimit the scopes using encoding/symbol tables 2025-05-25 15:22:43 +09:00
Nobuyoshi Nakada
b4417ff665 Add Encoding::UNICODE_VERSION constant 2025-04-23 14:14:36 +09:00
Jean Boussier
fae86a701e string.c: Directly create strings with the correct encoding
While profiling msgpack-ruby I noticed a very substantial amout of time
spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`.

On that benchmark, `rb_utf8_str_new` is 33% of the total runtime,
in big part because it cause GC to trigger often, but even then
`5.3%` of the total runtime is spent in `rb_enc_associate_index`
called by `rb_utf8_str_new`.

After closer inspection, it appears that it's performing a lot of
safety check we can assert we don't need, and other extra useless
operations, because strings are first created and filled as ASCII-8BIT
and then later reassociated to the desired encoding.

By directly allocating the string with the right encoding, it allow
to skip a lot of duplicated and useless operations.

After this change, the time spent in `rb_utf8_str_new` is down
to `28.4%` of total runtime, and most of that is GC.
2024-11-13 13:32:32 +01:00
Nobuyoshi Nakada
4b065bbe2b
Move common code to enc_compatible_latter 2024-10-05 16:06:54 +09:00
Peter Zhu
176c4bb3c7 Fix corruption of internal encoding string
[Bug #20598]

Just like [Bug #20595], Encoding#name_list and Encoding#aliases can have
their strings corrupted when Encoding.default_internal is set to nil.

Co-authored-by: Matthew Valentine-House <matt@eightbitraptor.com>
2024-06-27 14:06:40 -04:00
Peter Zhu
c6a0d03649 Fix corruption of encoding name string
[Bug #20595]

enc_set_default_encoding will free the C string if the encoding is nil,
but the C string can be used by the encoding name string. This will cause
the encoding name string to be corrupted.

Consider the following code:

    Encoding.default_internal = Encoding::ASCII_8BIT
    names = Encoding.default_internal.names
    p names
    Encoding.default_internal = nil
    p names

It outputs:

    ["ASCII-8BIT", "BINARY", "internal"]
    ["ASCII-8BIT", "BINARY", "\x00\x00\x00\x00\x00\x00\x00\x00"]

Co-authored-by: Matthew Valentine-House <matt@eightbitraptor.com>
2024-06-27 09:47:22 -04:00
Jean Boussier
3a7846b1aa Add a hint of ASCII-8BIT being BINARY
[Feature #18576]

Since outright renaming `ASCII-8BIT` is deemed to backward incompatible,
the next best thing would be to only change its `#inspect`, particularly
in exception messages.
2024-04-18 10:17:26 +02:00
Jean Boussier
d4f3dcf4df Refactor VM root modules
This `st_table` is used to both mark and pin classes
defined from the C API. But `vm->mark_object_ary` already
does both much more efficiently.

Currently a Ruby process starts with 252 rooted classes,
which uses `7224B` in an `st_table` or `2016B` in an `RArray`.

So a baseline of 5kB saved, but since `mark_object_ary` is
preallocated with `1024` slots but only use `405` of them,
it's a net `7kB` save.

`vm->mark_object_ary` is also being refactored.

Prior to this changes, `mark_object_ary` was a regular `RArray`, but
since this allows for references to be moved, it was marked a second
time from `rb_vm_mark()` to pin these objects.

This has the detrimental effect of marking these references on every
minors even though it's a mostly append only list.

But using a custom TypedData we can save from having to mark
all the references on minor GC runs.

Addtionally, immediate values are now ignored and not appended
to `vm->mark_object_ary` as it's just wasted space.
2024-03-06 15:33:43 -05:00
Peter Zhu
c7ce2f537f Fix memory leak in setting encodings
There is a memory leak in Encoding.default_external= and
Encoding.default_internal= because the duplicated name is not freed
when overwriting.

    10.times do
      1_000_000.times do
        Encoding.default_internal = nil
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

     25664
     41504
     57360
     73232
     89168
    105056
    120944
    136816
    152720
    168576

After:

    9648
    9648
    9648
    9680
    9680
    9680
    9680
    9680
    9680
    9680
2024-01-03 13:31:43 -05:00
Adam Hess
6816e8efcf Free everything at shutdown
when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown.

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Peter Zhu <peter@peterzhu.ca>
2023-12-07 15:52:35 -05:00
Jean Boussier
60c924770d Mark Encoding as Write Barrier protected
It doesn't even have a mark function.
It's only about a hundred objects, but not reason
to scan them every time.
2023-02-07 11:48:57 +01:00
Benoit Daloze
6abe20e87b Remove Encoding#replicate 2023-01-11 13:41:41 +01:00
Koichi Sasada
dbf77d420d surpress warning
now `enc_table->list` is not a pointer.
2022-12-16 11:12:37 +09:00
Koichi Sasada
ae19ac5b5b fixed encoding table
This reduces global lock acquiring for reading.
https://bugs.ruby-lang.org/issues/18949
2022-12-16 10:04:37 +09:00
Benoit Daloze
6525b6f760 Remove get_actual_encoding() and the dynamic endian detection for dummy UTF-16/UTF-32
* And simplify callers of get_actual_encoding().
* See [Feature #18949].
* See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474
2022-09-12 14:02:34 +02:00
Benoit Daloze
14bcf69c9c Deprecate Encoding#replicate
* See [Feature #18949].
2022-09-10 19:02:15 +02:00
Takashi Kokubun
5b21e94beb Expand tabs [ci skip]
[Misc #18891]
2022-07-21 09:42:04 -07:00
Jean Boussier
d084585f01 Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BIT
Otherwise it's way too easy to confuse it with US_ASCII.
2022-07-19 08:48:56 +02:00
Burdette Lamar
81741690a0
[DOC] Main doc for encodings moved from encoding.c to doc/encodings.rdoc (#5748)
Main doc for encodings moved from encoding.c to doc/encodings.rdoc
2022-04-01 20:41:04 -05:00
Nobuyoshi Nakada
7459a32af3 suppress warnings for probable NULL dererefences 2021-10-24 19:24:50 +09:00
卜部昌平
5112a54846 include/ruby/encoding.h: convert macros into inline functions
Less macros == huge win.
2021-10-05 14:18:23 +09:00
Jeremy Evans
3f7da458a7 Make encoding loading not issue warning
Instead of relying on setting an unsetting ruby_verbose, which is
not thread-safe, restructure require_internal and load_lock to
accept a warn argument for whether to warn, and add
rb_require_internal_silent to require without warnings.  Use
rb_require_internal_silent when loading encoding.

Note this does not modify ruby_debug and errinfo handling, those
remain thread-unsafe.

Also silent requires when loading transcoders.
2021-10-02 05:51:29 -09:00
S-H-GAMELINKS
18031f4102 Add rb_encoding_check function 2021-08-22 10:39:14 +09:00
S.H
378e8cdad6
Using RBOOL macro 2021-08-02 12:06:44 +09:00
Jean Boussier
7e8a9af9db rb_enc_interned_str: handle autoloaded encodings
If called with an autoloaded encoding that was not yet
initialized, `rb_enc_interned_str` would crash with
a NULL pointer exception.

See: https://github.com/ruby/ruby/pull/4119#issuecomment-800189841
2021-03-22 21:37:48 +09:00
Koichi Sasada
2a3324fcd2 No sync on ASCII/US_ASCCII/UTF-8
rb_enc_from_index(index) doesn't need locking if index specify
ASCII/US_ASCCII/UTF-8.
rb_enc_from_index() is called frequently so it has impact.

                               user     system      total        real
r_parallel/miniruby       174  0.000209   0.000000   5.559872 (  1.811501)
r_parallel/master_mini    175  0.000238   0.000000  12.664707 (  3.523641)

(repeat x1000 `s.split(/,/)` where s = '0,,' * 1000)
2020-12-17 03:44:23 +09:00
Lars Kanis
94b6933d1c
Set default for Encoding.default_external to UTF-8 on Windows (#2877)
* Use UTF-8 as default for Encoding.default_external on Windows
* Document UTF-8 change on Windows to Encoding.default_external

fix https://bugs.ruby-lang.org/issues/16604
2020-12-08 01:48:37 +09:00
Koichi Sasada
5e3259ea74 fix public interface
To make some kind of Ractor related extensions, some functions
should be exposed.

* include/ruby/thread_native.h
  * rb_native_mutex_*
  * rb_native_cond_*
* include/ruby/ractor.h
  * RB_OBJ_SHAREABLE_P(obj)
  * rb_ractor_shareable_p(obj)
  * rb_ractor_std*()
  * rb_cRactor

and rm ractor_pub.h
and rename srcdir/ractor.h to srcdir/ractor_core.h
    (to avoid conflict with include/ruby/ractor.h)
2020-11-18 03:52:41 +09:00
Stefan Stüben
8c2e5bbf58 Don't redefine #rb_intern over and over again 2020-10-21 12:45:18 +09:00
Koichi Sasada
a76a30724d Revert "reduce lock for encoding"
This reverts commit de17e2dea1.

This patch can introduce race condition because of conflicting
read/write access for enc_table::default_list. Maybe we need to
freeze default_list at the end of Init_encdb() in enc/encdb.c.
2020-10-20 01:34:17 +09:00
Koichi Sasada
de17e2dea1 reduce lock for encoding
To reduce the number of locking for encoding manipulation,
enc_table::list is splited to ::default_list and ::additional_list.
::default_list is pre-allocated and no need locking to access to
the ::default_list. If additional encoding space is needed, use
::additional_list and this list need to use locking.
However, most of case, ::default_list is enough.
2020-10-19 14:06:40 +09:00
Nobuyoshi Nakada
7ffd14a18c
Check encoding name to replicate
https://hackerone.com/reports/954433
2020-10-15 16:48:25 +09:00
Koichi Sasada
102c2ba65f freeze Encoding objects
Encoding objects can be accessed in multi-ractors and there is no
state to mutate. So we can mark it as frozen and shareable.
[Bug #17188]
2020-10-14 14:02:06 +09:00
Koichi Sasada
11c2f0f36c sync enc_table and rb_encoding_list
enc_table which manages Encoding information. rb_encoding_list
also manages Encoding objects. Both are accessed/modified by ractors
simultaneously so that they should be synchronized.

For enc_table, this patch introduced GLOBAL_ENC_TABLE_ENTER/LEAVE/EVAL
to access this table with VM lock. To make shortcut, three new global
variables global_enc_ascii, global_enc_utf_8, global_enc_us_ascii are
also introduced.

For rb_encoding_list, we split it to rb_default_encoding_list (256 entries)
and rb_additional_encoding_list. rb_default_encoding_list is fixed sized Array
so we don't need to synchronized (and most of apps only needs it). To manage
257 or more encoding objects, they are stored into rb_additional_encoding_list.
To access rb_additional_encoding_list., VM lock is needed.
2020-10-14 14:02:06 +09:00
Nobuyoshi Nakada
9e67a38fde
Fallback to built-in UTF-8 for miniruby
Source code encoding is defaulted to UTF-8 now too.
2020-05-16 17:36:30 +09:00
卜部昌平
9e41a75255 sed -i 's|ruby/impl|ruby/internal|'
To fix build failures.
2020-05-11 09:24:08 +09:00
卜部昌平
d7f4d732c1 sed -i s|ruby/3|ruby/impl|g
This shall fix compile errors.
2020-05-11 09:24:08 +09:00
Nobuyoshi Nakada
5d430c1b34
Added more NORETURN declarations 2020-05-11 00:40:14 +09:00
卜部昌平
9e6e39c351
Merge pull request #2991 from shyouhei/ruby.h
Split ruby.h
2020-04-08 13:28:13 +09:00
Nobuyoshi Nakada
fce667ed08
Get rid of warnings/exceptions at cleanup
After the encoding index instance variable is removed when all
instance variables are removed in `obj_free`, then `rb_str_free`
causes uninitialized instance variable warning and nil-to-integer
conversion exception.  Both cases result in object allocation
during GC, and crashes.
2020-02-13 12:46:48 +09:00
卜部昌平
f83781c8c1 rb_enc_str_asciionly_p expects T_STRING
This `str2` variable can be non-string (regexp etc.) but the previous
code passed it directly to rb_enc_str_asciionly_p(), which expects its
argument be a string.  Let's enforce that constraint.
2020-02-10 12:19:30 +09:00