Combine call info and cache to speed up method invocation

To perform a regular method call, the VM needs two structs,
`rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
structures in separate buffers. Because each structure can straddle a
cache line boundary, in the worst case the CPU needs to read 4 cache lines
to complete a method call. Putting the two structures together reduces the
maximum number of cache line reads to 2.
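
As a rough illustration of the cache-line arithmetic, this sketch counts how many cache lines an object can touch in the worst case. The 64-byte line size and the 16- and 48-byte struct sizes are assumptions chosen for the example, not the real sizes of `rb_call_info` and `rb_call_cache`; `lines_spanned` and `worst_case_lines` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, common on current x86-64 and ARM cores. */
#define CACHE_LINE 64u

/* Number of cache lines touched by an object of `size` bytes at `addr`. */
static unsigned lines_spanned(uintptr_t addr, size_t size)
{
    assert(size > 0); /* a zero-sized object touches no line */
    uintptr_t first = addr / CACHE_LINE;
    uintptr_t last = (addr + size - 1) / CACHE_LINE;
    return (unsigned)(last - first + 1);
}

/* Worst case over every possible starting offset within a cache line. */
static unsigned worst_case_lines(size_t size)
{
    unsigned worst = 0;
    for (uintptr_t off = 0; off < CACHE_LINE; off++) {
        unsigned n = lines_spanned(off, size);
        if (n > worst)
            worst = n;
    }
    return worst;
}
```

With those assumed sizes, two separate structs give `worst_case_lines(16) + worst_case_lines(48)` = 4, while a single 64-byte combined struct gives `worst_case_lines(64)` = 2.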

Combining the structures also saves 8 bytes per call site, as the current
layout uses two separate pointers for the call info and the call cache.
In total, this saves about 2 MiB on Discourse.
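
A minimal sketch of the operand layout change, assuming a 64-bit build where each operand slot holds one pointer. The struct and function names are hypothetical stand-ins, not the VM's definitions. Dividing the reported 2 MiB by 8 bytes suggests roughly 262,144 call sites; that is only a back-of-envelope figure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; real fields are omitted because only the
 * operand layout matters here. */
struct call_info  { int unused; };
struct call_cache { int unused; };
struct call_data  { struct call_cache cc; struct call_info ci; };

/* Before: a send-like instruction carried two pointer operands. */
struct send_operands_before {
    const struct call_info *ci;
    struct call_cache *cc;
};

/* After: a single pointer operand to the combined struct. */
struct send_operands_after {
    struct call_data *cd;
};

static size_t bytes_saved_per_call_site(void)
{
    return sizeof(struct send_operands_before) - sizeof(struct send_operands_after);
}

/* How many call sites a given total saving corresponds to. */
static size_t implied_call_sites(size_t total_bytes_saved)
{
    size_t saved = bytes_saved_per_call_site();
    assert(saved > 0);
    return total_bytes_saved / saved;
}
```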

This change improves the Optcarrot benchmark by at least 3%. For more
details, see the attached bugs.ruby-lang.org ticket.

Complications:
 - A new instruction attribute `comptime_sp_inc` is introduced to
 calculate the SP increase at compile time without using call caches. At
 compile time, a `TS_CALLDATA` operand points to a call info struct, but
 at runtime, the same operand points to a call data struct. Instructions
 that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
 - MJIT code for copying the call cache becomes slightly more complicated.
 - This changes the bytecode format, which might break existing tools.

[Misc #16258]
Alan Wu 2019-07-30 21:36:05 -04:00 committed by 卜部昌平
parent 38e931fa2c
commit 89e7997622
Notes: git 2019-10-24 18:04:08 +09:00
17 changed files with 322 additions and 264 deletions

@@ -166,7 +166,7 @@ enum vm_regan_acttype {
 #ifndef MJIT_HEADER
 #define CALL_SIMPLE_METHOD() do { \
     rb_snum_t x = leaf ? INSN_ATTR(width) : 0; \
-    rb_snum_t y = attr_width_opt_send_without_block(0, 0); \
+    rb_snum_t y = attr_width_opt_send_without_block(0); \
     rb_snum_t z = x - y; \
     ADD_PC(z); \
     DISPATCH_ORIGINAL_INSN(opt_send_without_block); \
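
The diff changes `attr_width_opt_send_without_block(0, 0)` to `attr_width_opt_send_without_block(0)` because `opt_send_without_block` now takes one `TS_CALLDATA` operand instead of separate call info and call cache operands. As a hedged sketch, assuming an instruction's width is one slot for the opcode plus one per operand, and that the generated width helper accepts the operands without using their values, the instruction shrinks from 3 slots to 2. The types and function names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for the VM's operand types. */
typedef const void *call_info_ptr;
typedef void *call_cache_ptr;
typedef void *call_data_ptr;

/* Width in iseq slots: one slot for the opcode plus one per operand.
 * The operands are accepted only so the signature mirrors the operand
 * list; the width does not depend on their values. */
static long width_send_before(call_info_ptr ci, call_cache_ptr cc)
{
    (void)ci;
    (void)cc;
    return 1 + 2;
}

static long width_send_after(call_data_ptr cd)
{
    (void)cd;
    return 1 + 1;
}
```

Since `CALL_SIMPLE_METHOD` uses this width to compute the PC adjustment `z = x - y`, its call has to match the new one-operand signature.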