archive/ruby - Eplg Git: Free And Private Git Hosting

mirror of https://github.com/ruby/ruby.git synced 2025-08-15 13:39:04 +02:00

Author	SHA1	Message	Date
Satoshi Tagomori	382645d440	namespace on read	2025-05-11 23:32:50 +09:00
Jean Boussier	7116b0a7f1	Extract `rb_shape_free_all`	2025-05-09 10:22:51 +02:00
Jean Boussier	0ea210d1ea	Rename `ivptr` -> `fields`, `next_iv_index` -> `next_field_index` Ivars will longer be the only thing stored inline via shapes, so keeping the `iv_index` and `ivptr` names would be confusing. Instance variables won't be the only thing stored inline via shapes, so keeping the `ivptr` name would be confusing. `field` encompass anything that can be stored in a VALUE array. Similarly, `gen_ivtbl` becomes `gen_fields_tbl`.	2025-05-08 07:58:05 +02:00
Jean Boussier	3ec7bfff2e	Use a `set_table` for `rb_vm_struct.unused_block_warning_table` Now that we have a hash-set implementation we can use that instead of a hash-table with a static value.	2025-04-27 11:59:28 +02:00
刘皓	45e814d116	Fix jump buffer leak in WASI builds	2025-04-27 15:47:30 +09:00
Takashi Kokubun	8b72e07359	Disable ZJIT profiling at call-threshold (https://github.com/Shopify/zjit/pull/99 ) * Disable ZJIT profiling at call-threshold * Stop referencing ZJIT instructions in codegen	2025-04-18 21:53:01 +09:00
Takashi Kokubun	2915806820	Add --zjit-num-profiles option (https://github.com/Shopify/zjit/pull/98 ) * Add --zjit-profile-interval option * Fix min to max * Avoid rewriting instructions for --zjit-call-threshold=1 * Rename the option to --zjit-num-profiles	2025-04-18 21:53:01 +09:00
Takashi Kokubun	bb46bb781c	Stub Init_builtin_zjit for --disable-zjit	2025-04-18 21:53:00 +09:00
Takashi Kokubun	14253e7d12	Implement Insn::Param using the SP register (https://github.com/Shopify/zjit/pull/39 )	2025-04-18 21:52:59 +09:00
Takashi Kokubun	22c73f1ccb	Implement FixnumAdd and stub PatchPoint/GuardType (https://github.com/Shopify/zjit/pull/30 ) * Implement FixnumAdd and stub PatchPoint/GuardType Co-authored-by: Max Bernstein <max.bernstein@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> * Clone Target for arm64 * Use $create instead of use create Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> * Fix misindentation from suggested changes * Drop an unneeded variable for mut * Load operand into a register only if necessary --------- Co-authored-by: Max Bernstein <max.bernstein@shopify.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>	2025-04-18 21:52:59 +09:00
Takashi Kokubun	0a543daf15	Add zjit_* instructions to profile the interpreter (https://github.com/Shopify/zjit/pull/16 ) * Add zjit_* instructions to profile the interpreter * Rename FixnumPlus to FixnumAdd * Update a comment about Invalidate * Rename Guard to GuardType * Rename Invalidate to PatchPoint * Drop unneeded debug!() * Plan on profiling the types * Use the output of GuardType as type refined outputs	2025-04-18 21:52:59 +09:00
Takashi Kokubun	53bee25068	Implement --zjit-call-threshold As a preparation for introducing a profiling layer, we need to be able to raise the threshold to run a few cycles for profiling.	2025-04-18 21:52:58 +09:00
Takashi Kokubun	06d875b979	Backport the latest jit_compile()	2025-04-18 21:52:58 +09:00
Alan Wu	1d95139bf6	`miniruby --zjit -e nil` runs through iseq_to_ssa	2025-04-18 21:52:56 +09:00
Takashi Kokubun	0bb709718b	Hook ZJIT compilation	2025-04-18 21:52:56 +09:00
John Hawthorn	57b6a7503f	Lock-free hash set for fstrings [Feature #21268 ] This implements a hash set which is wait-free for lookup and lock-free for insert (unless resizing) to use for fstring de-duplication. As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of fstrings (frozen interned strings) can significantly reduce the parallelism of Ractors. I tried a few other approaches first: using an RWLock, striping a series of RWlocks (partitioning the hash N-ways to reduce lock contention), and putting a cache in front of it. All of these improved the situation, but were unsatisfying as all still required locks for writes (and granular locks are awkward, since we run the risk of needing to reach a vm barrier) and this table is somewhat write-heavy. My main reference for this was Cliff Click's talk on a lock free hash-table for java https://www.youtube.com/watch?v=HJ-719EGIts. It turns out this lock-free hash set is made easier to implement by a few properties: * We only need a hash set rather than a hash table (we only need keys, not values), and so the full entry can be written as a single VALUE * As a set we only need lookup/insert/delete, no update * Delete is only run inside GC so does not need to be atomic (It could be made concurrent) * I use rb_vm_barrier for the (rare) table rebuilds (It could be made concurrent) We VM lock (but don't require other threads to stop) for table rebuilds, as those are rare * The conservative garbage collector makes deferred replication easy, using a T_DATA object Another benefits of having a table specific to fstrings is that we compare by value on lookup/insert, but by identity on delete, as we only want to remove the exact string which is being freed. This is faster and provides a second way to avoid the race condition in https://bugs.ruby-lang.org/issues/21172. This is a pretty standard open-addressing hash table with quadratic probing. Similar to our existing st_table or id_table. Deletes (which happen on GC) replace existing keys with a tombstone, which is the only type of update which can occur. Tombstones are only cleared out on resize. Unlike st_table, the VALUEs are stored in the hash table itself (st_table's bins) rather than as a compact index. This avoids an extra pointer dereference and is possible because we don't need to preserve insertion order. The table targets a load factor of 2 (it is enlarged once it is half full).	2025-04-18 13:03:54 +09:00
Aaron Patterson	3628e9e30d	Remove unused field on Thread struct It looks like stat_insn_usage was introduced with YARV, but as far as I can tell the field has never been used. I think we should remove the field since we don't use it.	2025-04-11 10:28:26 -07:00
lukeg	d80f3a287c	Ractor.make_shareable(proc_obj) makes inner structure shareable Proc objects are now traversed like other objects when making them shareable. Fixes [Bug #19372] Fixes [Bug #19374]	2025-03-26 16:05:02 -07:00
Alan Wu	08b3a45bc9	Push a real iseq in rb_vm_push_frame_fname() Previously, vm_make_env_each() (used during proc creation and for the debug inspector C API) picked up the non-GC-allocated iseq that rb_vm_push_frame_fname() creates, which led to a SEGV when the GC tried to mark the non GC object. Put a real iseq imemo instead. Speed should be about the same since the old code also did a imemo allocation and a malloc allocation. Real iseq allows ironing out the special-casing of dummy frames in rb_execution_context_mark() and rb_execution_context_update(). A check is added to RubyVM::ISeq#eval, though, to stop attempts to run dummy iseqs. [Bug #21180] Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2025-03-12 15:00:26 -04:00
Yusuke Endoh	993fd96ce6	reject numbered parameters from Binding#local_variables Also, Binding#local_variable_get and #local_variable_set rejects an access to numbered parameters. [Bug #20965] [Bug #21049]	2025-02-18 16:23:24 +09:00
Takashi Kokubun	c1ce3d719d	Streamline YJIT checks on jit_compile()	2025-02-14 10:40:10 -08:00
Nobuyoshi Nakada	4a67ef09cc	[Feature #21116 ] Extract RJIT as a third-party gem	2025-02-13 18:01:03 +09:00
Aaron Patterson	d680a13ad0	Always return jit_entry even if NULL We can just always return the jit_entry since it will be initialized to NULL. There is no reason to specifically return NULL if yjit / rjit are disabled	2025-02-10 15:50:23 -05:00
Peter Zhu	5032791330	Fix conversion of RubyVM::FrozenCore to T_ICLASS We shouldn't directly set the flags of an object because there could be other flags set that would be erased. Instead, we can unset T_MASK and set T_ICLASS isntead.	2025-01-30 10:10:48 -05:00
Peter Zhu	98b36f6f36	Use rb_gc_vm_weak_table_foreach for reference updating We can use rb_gc_vm_weak_table_foreach for reference updating of weak tables in the default GC.	2025-01-27 10:28:36 -05:00
Nobuyoshi Nakada	f7059af50a	Use no-inline version `rb_current_ec` on Arm64 The TLS across .so issue seems related to Arm64, but not Darwin.	2025-01-17 22:48:10 +09:00
Peter Zhu	707c6420b1	Don't reference update frames with VM_FRAME_MAGIC_DUMMY Frames with VM_FRAME_MAGIC_DUMMY pushed by rb_vm_push_frame_fname have allocated iseq, so we should not reference update it.	2024-12-17 11:03:38 -05:00
Peter Zhu	92dd9734a9	Fix use-after-free in ep in Proc#dup for ifunc procs [Bug #20950] ifunc proc has the ep allocated in the cfunc_proc_t which is the data of the TypedData object. If an ifunc proc is duplicated, the ep points to the ep of the source object. If the source object is freed, then the ep of the duplicated object now points to a freed memory region. If we try to use the ep we could crash. For example, the following script crashes: p = { a: 1 }.to_proc 100.times do p = p.dup GC.start p.call rescue ArgumentError end This commit changes ifunc proc to also duplicate the ep when it is duplicated.	2024-12-13 10:10:03 -05:00
Randy Stauner	b021f6f8a7	Use symbol.h in vm.c to get macro for faster ID to sym (#12272 ) The macro provided by symbol.h uses STATIC_ID2SYM when it can which speeds up methods that declare keyword args. Co-authored-by: Alan Wu <XrXr@users.noreply.github.com> Co-authored-by: Takashi Kokubun (k0kubun) <takashikkbn@gmail.com> Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2024-12-05 17:51:32 -05:00
Yusuke Endoh	59f7a5d336	Remove meaningless NULL checks In this context, `th` must not be NULL	2024-12-04 12:15:23 +09:00
Randy Stauner	1dd40ec18a	Optimize instructions when creating an array just to call `include?` (#12123 ) * Add opt_duparray_send insn to skip the allocation on `#include?` If the method isn't going to modify the array we don't need to copy it. This avoids the allocation / array copy for things like `[:a, :b].include?(x)`. This adds a BOP for include? and tracks redefinition for it on Array. Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * YJIT: Implement opt_duparray_send include_p Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * Update opt_newarray_send to support simple forms of include?(arg) Similar to opt_duparray_send but for non-static arrays. * YJIT: Implement opt_newarray_send include_p --------- Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com>	2024-11-26 14:31:08 -05:00
Nobuyoshi Nakada	c129e01125	Make `rb_ec_set_vm_stack` conformant to the C++11 requirement Https://learn.microsoft.com/en-us/cpp/build/reference/zc-inline-remove-unreferenced-comdat?view=msvc-140 > If `/Zc:inline` is specified, the compiler enforces the C++11 > requirement that all functions declared inline must have a definition > available in the same translation unit if they're used.	2024-11-17 19:56:02 +09:00
Peter Zhu	1d1c80e644	Fix false-positive memory leak using Valgrind in YJIT (#12057 ) When we run with RUBY_FREE_AT_EXIT, there's a false-positive memory leak reported in YJIT because the METHOD_CODEGEN_TABLE is never freed. This commit adds rb_yjit_free_at_exit that is called at shutdown when RUBY_FREE_AT_EXIT is set. Reported memory leak: ==699816== 1,104 bytes in 1 blocks are possibly lost in loss record 1 of 1 ==699816== at 0x484680F: malloc (vg_replace_malloc.c:446) ==699816== by 0x155B3E: UnknownInlinedFun (unix.rs:14) ==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:36) ==699816== by 0x155B3E: UnknownInlinedFun (stats.rs:27) ==699816== by 0x155B3E: alloc (alloc.rs:98) ==699816== by 0x155B3E: alloc_impl (alloc.rs:181) ==699816== by 0x155B3E: allocate (alloc.rs:241) ==699816== by 0x155B3E: do_alloc<alloc::alloc::Global> (alloc.rs:15) ==699816== by 0x155B3E: new_uninitialized<alloc::alloc::Global> (mod.rs:1750) ==699816== by 0x155B3E: fallible_with_capacity<alloc::alloc::Global> (mod.rs:1788) ==699816== by 0x155B3E: prepare_resize<alloc::alloc::Global> (mod.rs:2864) ==699816== by 0x155B3E: resize_inner<alloc::alloc::Global> (mod.rs:3060) ==699816== by 0x155B3E: reserve_rehash_inner<alloc::alloc::Global> (mod.rs:2950) ==699816== by 0x155B3E: hashbrown::raw::RawTable<T,A>::reserve_rehash (mod.rs:1231) ==699816== by 0x5BC39F: UnknownInlinedFun (mod.rs:1179) ==699816== by 0x5BC39F: find_or_find_insert_slot<(usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, const yjit::cruby::autogened::rb_callinfo, const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool), alloc::alloc::Global, hashbrown::map::equivalent_key::{closure_env#0}<usize, usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, const yjit::cruby::autogened::rb_callinfo, const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool>, hashbrown::map::make_hasher::{closure_env#0}<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, const yjit::cruby::autogened::rb_callinfo, const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std:#️⃣:random::RandomState>> (mod.rs:1413) ==699816== by 0x5BC39F: hashbrown::map::HashMap<K,V,S,A>::insert (map.rs:1754) ==699816== by 0x57C5C6: insert<usize, fn(&mut yjit::codegen::JITState, &mut yjit::backend::ir::Assembler, const yjit::cruby::autogened::rb_callinfo, const yjit::cruby::autogened::rb_callable_method_entry_struct, core::option::Option<yjit::codegen::BlockHandler>, i32, core::option::Option<yjit::cruby::VALUE>) -> bool, std:#️⃣:random::RandomState> (map.rs:1104) ==699816== by 0x57C5C6: yjit::codegen::reg_method_codegen (codegen.rs:10521) ==699816== by 0x57C295: yjit::codegen::yjit_reg_method_codegen_fns (codegen.rs:10464) ==699816== by 0x5C6B07: rb_yjit_init (yjit.rs:40) ==699816== by 0x393723: ruby_opt_init (ruby.c:1820) ==699816== by 0x393723: ruby_opt_init (ruby.c:1767) ==699816== by 0x3957D4: prism_script (ruby.c:2215) ==699816== by 0x3957D4: process_options (ruby.c:2538) ==699816== by 0x396065: ruby_process_options (ruby.c:3166) ==699816== by 0x236E56: ruby_options (eval.c:117) ==699816== by 0x15BAED: rb_main (main.c:43) ==699816== by 0x15BAED: main (main.c:62) After this patch, there are no more memory leaks reported when running RUBY_FREE_AT_EXIT with Valgrind on an empty Ruby script: $ RUBY_FREE_AT_EXIT=1 valgrind --leak-check=full ruby -e "" ... ==700357== HEAP SUMMARY: ==700357== in use at exit: 0 bytes in 0 blocks ==700357== total heap usage: 36,559 allocs, 36,559 frees, 6,064,783 bytes allocated ==700357== ==700357== All heap blocks were freed -- no leaks are possible	2024-11-11 20:45:11 +00:00
Koichi Sasada	aa63699d10	support `require` in non-main Ractors Many libraries should be loaded on the main ractor because of setting constants with unshareable objects and so on. This patch allows to call `requore` on non-main Ractors by asking the main ractor to call `require` on it. The calling ractor waits for the result of `require` from the main ractor. If the `require` call failed with some reasons, an exception objects will be deliverred from the main ractor to the calling ractor if it is copy-able. Same on `require_relative` and `require` by `autoload`. Now `Ractor.new{pp obj}` works well (the first call of `pp` requires `pp` library implicitly). [Feature #20627]	2024-11-08 18:02:46 +09:00
Koichi Sasada	c8297c3eed	`interrupt_exec` introduce - rb_threadptr_interrupt_exec - rb_ractor_interrupt_exec to intercept the thread/ractor execution.	2024-11-08 18:02:46 +09:00
Koichi Sasada	ab7ab9e450	`Warning[:strict_unused_block]` to show unused block warning strictly. ```ruby class C def f = nil end class D def f = yield end [C.new, D.new].each{\|obj\| obj.f{}} ``` In this case, `D#f` accepts a block. However `C#f` doesn't accept a block. There are some cases passing a block with `obj.f{}` where `obj` is `C` or `D`. To avoid warnings on such cases, "unused block warning" will be warned only if there is not same name which accepts a block. On the above example, `C.new.f{}` doesn't show any warnings because there is a same name `D#f` which accepts a block. We call this default behavior as "relax mode". `strict_unused_block` new warning category changes from "relax mode" to "strict mode", we don't check same name methods and `C.new.f{}` will be warned. [Feature #15554]	2024-11-06 11:06:18 +09:00
Takashi Kokubun	478e0fc710	YJIT: Replace Array#each only when YJIT is enabled (#11955 ) * YJIT: Replace Array#each only when YJIT is enabled * Add comments about BUILTIN_ATTR_C_TRACE * Make Ruby Array#each available with --yjit as well * Fix all paths that expect a C location * Use method_basic_definition_p to detect patches * Copy a comment about C_TRACE flag to compilers * Rephrase a comment about add_yjit_hook * Give METHOD_ENTRY_BASIC flag to Array#each * Add --yjit-c-builtin option * Allow inconsistent source_location in test-spec * Refactor a check of BUILTIN_ATTR_C_TRACE * Set METHOD_ENTRY_BASIC without touching vm->running	2024-11-04 11:14:28 -05:00
Peter Zhu	645a0c9ea7	Remove vm_assert_env	2024-10-31 13:52:24 -04:00
Nobuyoshi Nakada	abfefd8e0c	Define `VM_ASSERT_TYPE` macros	2024-10-31 22:12:16 +09:00
Takashi Kokubun	9838c443c4	Make builtin init ifdefs consistent	2024-10-25 17:46:49 -07:00
Nobuyoshi Nakada	3e1021b144	Make default parser enum and define getter/setter	2024-10-02 20:43:40 +09:00
Étienne Barrié	bf9879791a	Optimized instruction for Hash#freeze If a Hash which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_hash_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-09-05 12:46:02 +02:00
Étienne Barrié	a99707cd9c	Optimized instruction for Array#freeze If an Array which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_ary_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org>	2024-09-05 12:46:02 +02:00
Peter Zhu	3c63a01295	Move responsibility of heap walking into Ruby This commit removes the need for the GC implementation to implement heap walking and instead Ruby will implement it.	2024-09-03 10:05:38 -04:00
Alan Wu	f2ac013009	Add RB_DEFAULT_PARSER preprocessor macro This way there is one place to change for switching the default. This also allows for building the same commit with different cppflags.	2024-08-27 23:15:37 +00:00
Alan Wu	057c53f771	Make rb_vm_invoke_bmethod() static	2024-08-07 19:17:31 -04:00
Your Name	34715bdd91	Tune codegen for rb_yield() calls landing in ISeqs Unlike in older revisions in the year, GCC 11 isn't inlining the call to vm_push_frame() inside invoke_iseq_block_from_c() anymore. We do want it to be inlined since rb_yield() speed is fairly important. Logs from -fopt-info-optimized-inline reveal that GCC was blowing its code size budget inlining invoke_block_from_c_bh() into its various callers, leaving suboptimal code for its body. Take away some uses of the `inline` keyword and merge a common tail call to vm_exec() for overall better code. This tweak gives about 18% on a micro benchmark and 1% on the chunky-png benchmark from yjit-bench. I tested on a Skylake server. ``` $ cat c-to-ruby-call.yml benchmark: - 0.upto(10_000_000) {} $ benchmark-driver --chruby '+patch;master' c-to-ruby-call.yml Warming up -------------------------------------- 0.upto(10_000_000) {} 2.299 i/s - 3.000 times in 1.304689s (434.90ms/i) Calculating ------------------------------------- +patch master 0.upto(10_000_000) {} 2.299 1.943 i/s - 6.000 times in 2.609393s 3.088353s Comparison: 0.upto(10_000_000) {} +patch: 2.3 i/s master: 1.9 i/s - 1.18x slower $ ruby run_benchmarks.rb --chruby 'master;+patch' chunky-png <snip> ---------- ----------- ---------- ----------- ---------- -------------- ------------- bench master (ms) stddev (%) +patch (ms) stddev (%) +patch 1st itr master/+patch chunky-png 1156.1 0.1 1142.2 0.2 1.01 1.01 ---------- ----------- ---------- ----------- ---------- -------------- ------------- ```	2024-08-07 18:49:20 -04:00
Yusuke Endoh	ac5ac48a36	Revert `28a1c4f33e` `28a1c4f33e` seems to call an improper ensure clause. [Bug #20655] Than fixing it properly, I bet it would be much better to simply revert that commit. It reduces the unneeded complexity. Jumping into a block called by a C function like Hash#each with callcc is user's fault. It does not need serious support.	2024-07-30 15:31:24 +09:00
Aaron Patterson	cdf33ed5f3	Optimized forwarding callers and callees This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls. Calls it optimizes look like this: ```ruby def bar(a) = a def foo(...) = bar(...) # optimized foo(123) ``` ```ruby def bar(a) = a def foo(...) = bar(1, 2, ...) # optimized foo(123) ``` ```ruby def bar(a) = a def foo(...) list = [1, 2] bar(list, ...) # optimized end foo(123) ``` All variants of the above but using `super` are also optimized, including a bare super like this: ```ruby def foo(...) super end ``` This patch eliminates intermediate allocations made when calling methods that accept `...`. We can observe allocation elimination like this: ```ruby def m x = GC.stat(:total_allocated_objects) yield GC.stat(:total_allocated_objects) - x end def bar(a) = a def foo(...) = bar(...) def test m { foo(123) } end test p test # allocates 1 object on master, but 0 objects with this patch ``` ```ruby def bar(a, b:) = a + b def foo(...) = bar(...) def test m { foo(1, b: 2) } end test p test # allocates 2 objects on master, but 0 objects with this patch ``` How does it work? ----------------- This patch works by using a dynamic stack size when passing forwarded parameters to callees. The caller's info object (known as the "CI") contains the stack size of the parameters, so we pass the CI object itself as a parameter to the callee. When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee. The CI at the forwarded call site is adjusted using information from the caller's CI. I think this description is kind of confusing, so let's walk through an example with code. ```ruby def delegatee(a, b) = a + b def delegator(...) delegatee(...) # CI2 (FORWARDING) end def caller delegator(1, 2) # CI1 (argc: 2) end ``` Before we call the delegator method, the stack looks like this: ``` Executing Line \| Code \| Stack ---------------+---------------------------------------+-------- 1\| def delegatee(a, b) = a + b \| self 2\| \| 1 3\| def delegator(...) \| 2 4\| # \| 5\| delegatee(...) # CI2 (FORWARDING) \| 6\| end \| 7\| \| 8\| def caller \| -> 9\| delegator(1, 2) # CI1 (argc: 2) \| 10\| end \| ``` The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in to `delegator`, it writes `CI1` on to the stack as a local variable for the `delegator` method. The `delegator` method has a special local called `...` that holds the caller's CI object. Here is the ISeq disasm fo `delegator`: ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL\|FORWARDING>, nil 0006 leave [Re] ``` The local called `...` will contain the caller's CI: CI1. Here is the stack when we enter `delegator`: ``` Executing Line \| Code \| Stack ---------------+---------------------------------------+-------- 1\| def delegatee(a, b) = a + b \| self 2\| \| 1 3\| def delegator(...) \| 2 -> 4\| # \| CI1 (argc: 2) 5\| delegatee(...) # CI2 (FORWARDING) \| cref_or_me 6\| end \| specval 7\| \| type 8\| def caller \| 9\| delegator(1, 2) # CI1 (argc: 2) \| 10\| end \| ``` The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to memcopy the caller's stack before calling `delegatee`. In this case, it will memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much memory to copy from the caller because `CI1` contains stack size information (argc: 2). Before executing the `send` instruction, we push `...` on the stack. The `send` instruction pops `...`, and because it is tagged with `FORWARDING`, it knows to memcopy (using the information in the CI it just popped): ``` == disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)> local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1]) [ 1] "..."@0 0000 putself ( 1)[LiCa] 0001 getlocal_WC_0 "..."@0 0003 send <calldata!mid:delegatee, argc:0, FCALL\|FORWARDING>, nil 0006 leave [Re] ``` Instruction 001 puts the caller's CI on the stack. `send` is tagged with FORWARDING, so it reads the CI and _copies_ the callers stack to this stack: ``` Executing Line \| Code \| Stack ---------------+---------------------------------------+-------- 1\| def delegatee(a, b) = a + b \| self 2\| \| 1 3\| def delegator(...) \| 2 4\| # \| CI1 (argc: 2) -> 5\| delegatee(...) # CI2 (FORWARDING) \| cref_or_me 6\| end \| specval 7\| \| type 8\| def caller \| self 9\| delegator(1, 2) # CI1 (argc: 2) \| 1 10\| end \| 2 ``` The "FORWARDING" call site combines information from CI1 with CI2 in order to support passing other values in addition to the `...` value, as well as perfectly forward splat args, kwargs, etc. Since we're able to copy the stack from `caller` in to `delegator`'s stack, we can avoid allocating objects. I want to do this to eliminate object allocations for delegate methods. My long term goal is to implement `Class#new` in Ruby and it uses `...`. I was able to implement `Class#new` in Ruby [here](https://github.com/ruby/ruby/pull/9289). If we adopt the technique in this patch, then we can optimize allocating objects that take keyword parameters for `initialize`. For example, this code will allocate 2 objects: one for `SomeObject`, and one for the kwargs: ```ruby SomeObject.new(foo: 1) ``` If we combine this technique, plus implement `Class#new` in Ruby, then we can reduce allocations for this common operation. Co-Authored-By: John Hawthorn <john@hawthorn.email> Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>	2024-06-18 09:28:25 -07:00
Nobuyoshi Nakada	49fcd33e13	Introduce a specialize instruction for Array#pack Instructions for this code: ```ruby # frozen_string_literal: true [a].pack("C") ``` Before this commit: ``` == disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)> 0000 putself ( 3)[Li] 0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL\|VCALL\|ARGS_SIMPLE> 0003 newarray 1 0005 putobject "C" 0007 opt_send_without_block <calldata!mid:pack, argc:1, ARGS_SIMPLE> 0009 leave ``` After this commit: ``` == disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)> 0000 putself ( 3)[Li] 0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL\|VCALL\|ARGS_SIMPLE> 0003 putobject "C" 0005 opt_newarray_send 2, :pack 0008 leave ``` Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com> Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>	2024-05-23 12:11:50 -07:00

1 2 3 4 5 ...

1232 commits