This changeset reduces the generated binary of
vm_call_method_missing_body from 604 bytes to 532 bytes on my machine.
Should reduce GC pressure as well.
VM stack could overflow here. The condition is when a symbol is passed
to a block-taking method via &variable, and that symbol has never been
used for actual method names (thus yielding that results in calling
method_missing), and the VM stack is full (no single word left). This
is a once-in-a-blue-moon event. Yet there is a very tiny room of stack
overflow. We need to check that.
This commit changes the number of calls of MEMCPY from...
| send | &:sym
-------------------------|-------|-------
Symbol already interned | once | twice
Symbol not pinned yet | none | once
to:
| send | &:sym
-------------------------|-------|-------
Symbol already interned | once | none
Symbol not pinned yet | twice | once
So it sacrifices exceptional situation for normal path.
Symbol#to_proc and Object#send are closely related each other. Why not
share their implementations. By doing so we can skip recursive call of
vm_exec(), which could benefit for speed.
This changeset slightly speeds up on my machine.
Calculating -------------------------------------
before after
Optcarrot Lan_Master.nes 38.33488426546287 40.89825082589147 fps
40.91288557922081 41.48687465359386
40.96591995270991 41.98499064664184
41.20461943032173 43.67314690779162
42.38344888176518 44.02777536251875
43.43563728880915 44.88695892714136
43.88082889062643 45.11226186242523
This makes it possible for vm_invoke_block to pass its passed arguments
verbatimly to calling functions. Because they are tail-called the
function calls can be strength-recuced into indirect jumps, which is a
huge win.
These two function were almost identical, except in case of
T_STRING/T_FLOAT. Why not merge them into one, and let the difference be
handled in normal method calls (slowpath). This does not improve
runtime performance for me, but at least reduces for instance rb_eql_opt
from 653 bytes to 86 bytes on my machine, according to nm(1).
If a module has an origin, and that module is included in another
module or class, previously the iclass created for the module had
an origin pointer to the module's origin instead of the iclass's
origin.
Setting the origin pointer correctly requires using a stack, since
the origin iclass is not created until after the iclass itself.
Use a hidden ruby array to implement that stack.
Correctly assigning the origin pointers in the iclass caused a
use-after-free in GC. If a module with an origin is included
in a class, the iclass shares a method table with the module
and the iclass origin shares a method table with module origin.
Mark iclass origin with a flag that notes that even though the
iclass is an origin, it shares a method table, so the method table
should not be garbage collected. The shared method table will be
garbage collected when the module origin is garbage collected.
I've tested that this does not introduce a memory leak.
This change caused a VM assertion failure, which was traced to callable
method entries using the incorrect defined_class. Update
rb_vm_check_redefinition_opt_method and find_defined_class_by_owner
to treat iclass origins different than class origins to avoid this
issue.
This also includes a fix for Module#included_modules to skip
iclasses with origins.
Fixes [Bug #16736]
rb_callable_method_entry() creates ccs entry in cc_tbl, but this
code overwrite by insert newly created ccs and overwrote ccs never
freed.
[Bug #16900]
We started to use fastpath on invokesuper when a method is not refinements
since 5c27681813, but we shouldn't have used fastpath for attr_writer either.
`cc->aux_.attr_index` is for an actual receiver class, while we store
its superclass in `cc->klass` and therefore there's no way to properly
invalidate attr_writer's inline cache when it's called by super.
[Bug #16785]
I suspect the same bug also exists in attr_reader. I'll address that in
another commit.
Fastpath has not been used for invokesuper since it has set vm_call_super_method on every invocation.
Because it seems to be blocked only by refinements, try enabling fastpath on invokesuper when cme is not for refinements.
While this patch itself should be helpful for VM performance, a part of this patch's motivation is to unblock inlining invokesuper on JIT.
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm2_super.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-11T15:19:58Z master a01bda5949) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-12T02:00:08Z invokesuper-fastpath c171984ee3) [x86_64-linux]
Calculating -------------------------------------
before after
vm2_super 20.031M 32.860M i/s - 6.000M times in 0.299534s 0.182593s
Comparison:
vm2_super
after: 32859885.2 i/s
before: 20031097.3 i/s - 1.64x slower
This changes the following warnings:
* warning: class variable access from toplevel
* warning: class variable @foo of D is overtaken by C
into RuntimeErrors. Handle defined?(@@foo) at toplevel
by returning nil instead of raising an exception (the previous
behavior warned before returning nil when defined? was used).
Refactor the specs to avoid the warnings even in older versions.
The specs were checking for the warnings, but the purpose of
the related specs as evidenced from their description is to
test for behavior, not for warnings.
Fixes [Bug #14541]
Previously, passing a keyword splat to a method always allocated
a hash on the caller side, and accepting arbitrary keywords in
a method allocated a separate hash on the callee side. Passing
explicit keywords to a method that accepted a keyword splat
did not allocate a hash on the caller side, but resulted in two
hashes allocated on the callee side.
This commit makes passing a single keyword splat to a method not
allocate a hash on the caller side. Passing multiple keyword
splats or a mix of explicit keywords and a keyword splat still
generates a hash on the caller side. On the callee side,
if arbitrary keywords are not accepted, it does not allocate a
hash. If arbitrary keywords are accepted, it will allocate a
hash, but this commit uses a callinfo flag to indicate whether
the caller already allocated a hash, and if so, the callee can
use the passed hash without duplicating it. So this commit
should make it so that a maximum of a single hash is allocated
during method calls.
To set the callinfo flag appropriately, method call argument
compilation checks if only a single keyword splat is given.
If only one keyword splat is given, the VM_CALL_KW_SPLAT_MUT
callinfo flag is not set, since in that case the keyword
splat is passed directly and not mutable. If more than one
splat is used, a new hash needs to be generated on the caller
side, and in that case the callinfo flag is set, indicating
the keyword splat is mutable by the callee.
In compile_hash, used for both hash and keyword argument
compilation, if compiling keyword arguments and only a
single keyword splat is used, pass the argument directly.
On the caller side, in vm_args.c, the callinfo flag needs to
be recognized and handled. Because the keyword splat
argument may not be a hash, it needs to be converted to a
hash first if not. Then, unless the callinfo flag is set,
the hash needs to be duplicated. The temporary copy of the
callinfo flag, kw_flag, is updated if a hash was duplicated,
to prevent the need to duplicate it again. If we are
converting to a hash or duplicating a hash, we need to update
the argument array, which can including duplicating the
positional splat array if one was passed. CALLER_SETUP_ARG
and a couple other places needs to be modified to handle
similar issues for other types of calls.
This includes fairly comprehensive tests for different ways
keywords are handled internally, checking that you get equal
results but that keyword splats on the caller side result in
distinct objects for keyword rest parameters.
Included are benchmarks for keyword argument calls.
Brief results when compiled without optimization:
def kw(a: 1) a end
def kws(**kw) kw end
h = {a: 1}
kw(a: 1) # about same
kw(**h) # 2.37x faster
kws(a: 1) # 1.30x faster
kws(**h) # 2.19x faster
kw(a: 1, **h) # 1.03x slower
kw(**h, **h) # about same
kws(a: 1, **h) # 1.16x faster
kws(**h, **h) # 1.14x faster
send() has special method launcher in VM and it has special
method_missing caller. This path doesn't set
ec->method_missing_reason which is used at exception creation,
so setup this information. Without this setting, NoMethodError
exception becomes NameError.
This patch will fix:
http://ci.rvm.jp/results/trunk-random1@phosphorus-docker/2761643
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.