232 commits

Author SHA1 Message Date
Alan Wu
6e3790b17f YJIT: Fix mismatched_lifetime_syntaxes, new in Rust 1.89.0 2025-08-11 15:49:14 -04:00
Alan Wu
e5c7f1695e YJIT: x86: Fix panic writing 32-bit number with top bit set
Previously, `asm.mov(m32, imm32)` panicked when `imm32 >= 0x80000000`. It
attempted to split imm32 into a register before doing the store, but
then the register size didn't match the destination size.

Instead of splitting, use the `MOV r/m32, imm32` form which works for
all 32-bit values. Adjust asserts that assumed that all forms undergo
sign extension, which is not true for this case.

See: 54edc930f9
2025-06-11 19:49:49 +09:00
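The sign-extension point in e5c7f1695e can be illustrated with a small sketch. On x86-64, `MOV r/m64, imm32` sign-extends its immediate, while `MOV r/m32, imm32` stores the 32 bits as written. The two helper functions below are hypothetical names chosen for illustration and are not taken from YJIT's source:

```rust
/// True when `imm` can be encoded as an imm32 that is later sign-extended
/// to 64 bits (the `MOV r/m64, imm32` situation).
/// Hypothetical helper, not YJIT's actual API.
fn fits_sign_extended_imm32(imm: u64) -> bool {
    // Truncate to 32 bits, sign-extend back, and compare with the original.
    imm as i64 == (imm as i32) as i64
}

/// True when `imm` can be written verbatim by a 32-bit store
/// (the `MOV r/m32, imm32` situation): any value that fits in 32 bits works.
/// Hypothetical helper, not YJIT's actual API.
fn fits_plain_imm32(imm: u64) -> bool {
    imm <= u32::MAX as u64
}
```

Under these definitions, `0x8000_0000` fails the sign-extended check but passes the plain 32-bit check, which is the case the commit message describes.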
Takashi Kokubun
bb91c303ba
YJIT: Rename get_temp_regs2() back to get_temp_regs() (#12866) 2025-03-06 10:52:49 -05:00
Alan Wu
5a7089fc03 YJIT: A64: Remove assert that trips when OOM at page boundary
With a well-timed OOM around a page switch in the backend, it can return
RetryOnNextPage twice and crash due to the assert. (More places can
signal OOM now since VirtualMem tracks Rust malloc heap size for
--yjit-mem-size.)

Return error in these cases instead of crashing.

Fixes: https://github.com/Shopify/ruby/issues/566
2025-01-29 19:09:39 -05:00
Takashi Kokubun
cff031253f
YJIT: Spill/load argument registers to reuse blocks (#12287)
* YJIT: Spill/load argument registers to reuse blocks

* Mention the immediate function name

* Explain the context behind spill/load operations
2024-12-09 10:02:40 -08:00
Takashi Kokubun
5b129c899a
YJIT: Pass method arguments using registers (#11280)
* YJIT: Pass method arguments using registers

* s/at_current_insn/at_compile_target/

* Implement register shuffle
2024-08-27 17:04:43 -07:00
Takashi Kokubun
2de8b5b805
YJIT: Allow dev_nodebug to disasm release-mode code (#11198)
* YJIT: Allow dev_nodebug to disasm release-mode code

* Revert "YJIT: Squash canary before falling back"

This reverts commit f05ad373d8.
The stray canary issue should have been solved by
def7023ee4, alleviating this codegen
accommodation.

* s/runtime_assertions/runtime_checks/

---------

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2024-07-18 13:01:47 -07:00
Takashi Kokubun
ec773e15f4
YJIT: Local variable register allocation (#11157)
* YJIT: Local variable register allocation

* locals are not stack temps

* Rename RegTemps to RegMappings

* Rename RegMapping to RegOpnd

* Rename local_size to num_locals

* s/stack value/operand/

* Rename spill_temps() to spill_regs()

* Clarify when num_locals becomes None

* Mention that InsnOut uses different registers

* Rename get_reg_mapping to get_reg_opnd

* Resurrect --yjit-temp-regs capability

* Use MAX_CTX_TEMPS and MAX_CTX_LOCALS
2024-07-15 10:56:57 -04:00
Alan Wu
3be9ce3cf6
YJIT: dump-disasm: Print comments and bytes in release builds
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.

While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.

Sample output on A64:

```
  # regenerate_branch
  # Insn: 0001 opt_send_without_block (stack_size: 1)
  # guard known object with singleton class
  0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
  0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
  # RUBY_VM_CHECK_INTS(ec)
  0x11f7e0050: 8b 02 42 b8 cb 07 01 35
  # stack overflow check
  0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
  # save PC to CFP
  0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
  0x11f7e0073: f8 ab 82 00 91
```

To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact to compile time and memory usage with the `compile_time_ns` and
`yjit_alloc_size` entries in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistical summary was done with the `summary` function in R.

Compile time, sample size of 60, lower is better:

```
       Before              After
 Min.   :2.054e+09   Min.   :2.028e+09
 1st Qu.:2.069e+09   1st Qu.:2.044e+09
 Median :2.081e+09   Median :2.060e+09
 Mean   :2.089e+09   Mean   :2.066e+09
 3rd Qu.:2.109e+09   3rd Qu.:2.085e+09
 Max.   :2.146e+09   Max.   :2.144e+09
```

Allocation size, sample size of 20, lower is better:

```
       Before             After
 Min.   :21804742   Min.   :21794082
 1st Qu.:21826682   1st Qu.:21816282
 Median :21844042   Median :21826814
 Mean   :21960664   Mean   :22026291
 3rd Qu.:21861228   3rd Qu.:22040439
 Max.   :22587426   Max.   :22930614
```

The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.
2024-07-08 20:02:30 +00:00
Alan Wu
8b81301536
YJIT: A64: Use CBZ/CBNZ to check for zero
* YJIT: A64: Add CBZ and CBNZ encoding functions

* YJIT: A64: Use CBZ/CBNZ to check for zero

Instead of emitting `cmp x0, #0` plus `b.z #target`, A64 offers Compare
and Branch on Zero for us to just do `cbz x0, #target`. This commit
utilizes that and the related CBNZ instruction when appropriate.

We check for zero most commonly in interrupt checks:

```diff
  # Insn: 0003 leave (stack_size: 1)
  # RUBY_VM_CHECK_INTS(ec)
  ldur w11, [x20, #0x20]
  -tst w11, w11
  -b.ne #0x109002164
  +cbnz w11, #0x1049021d0
```

* fix copy paste error

Co-authored-by: Randy Stauner <randy@r4s6.net>

---------

Co-authored-by: Randy Stauner <randy@r4s6.net>
2024-04-17 21:48:38 +00:00
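As a rough sketch of what a CBZ/CBNZ encoding function like the one mentioned in 8b81301536 might look like, the following packs the fields of the A64 "compare and branch (immediate)" format: `sf`, the fixed bits `011010`, `op`, a 19-bit word offset, and `Rt`. The function name and signature are assumptions made for this example and do not come from YJIT:

```rust
/// Encode CBZ or CBNZ. Hypothetical helper, not YJIT's actual function.
/// Returns None when the register number or branch offset is not encodable.
fn encode_cbz_cbnz(rt: u8, byte_offset: i32, branch_if_nonzero: bool, is_64bit: bool) -> Option<u32> {
    // The target must be 4-byte aligned and the register number must fit in 5 bits.
    if rt > 31 || byte_offset % 4 != 0 {
        return None;
    }
    // The offset is stored in units of instructions as a signed 19-bit field.
    let imm19 = byte_offset / 4;
    if !(-(1 << 18)..(1 << 18)).contains(&imm19) {
        return None;
    }
    let sf = is_64bit as u32; // 1 selects the X register form, 0 the W form
    let op = branch_if_nonzero as u32; // 0 = CBZ, 1 = CBNZ
    Some((sf << 31) | (0b011010 << 25) | (op << 24) | (((imm19 as u32) & 0x7ffff) << 5) | rt as u32)
}
```

For instance, `encode_cbz_cbnz(0, 8, false, true)` corresponds to `cbz x0, #8`.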
Alan Wu
2eafed0f3b
YJIT: A64: Avoid intermediate register in opt_and and friends (#10509)
Same idea as the x64 equivalent in c2622b5253, removing the register
shuffle coming from the pop two, push one stack motion these VM
instructions perform.

```
  # Insn: 0004 opt_or (stack_size: 2)
  - orr x11, x1, x9
  - mov x1, x11
  + orr x1, x1, x9
```
2024-04-15 11:59:45 -04:00
Alan Wu
c2622b5253
YJIT: x64: Remove register shuffle with opt_and and friends (#10498)
This is best understood by looking at the change to the output:

```diff
  # Insn: 0002 opt_and (stack_size: 2)
  - mov rax, rsi
  - and rax, rdi
  - mov rsi, rax
  + and rsi, rdi
```

It's a bit awkward to match against due to how stack operands are
lowered, but hey, it's nice to save the 2 unnecessary MOVs.
2024-04-11 10:37:56 -04:00
Alan Wu
3c4de946c9
YJIT: A64: Use ADDS/SUBS/CMP (immediate) when possible (#10402)
* YJIT: A64: Use ADDS/SUBS/CMP (immediate) when possible

We were loading 1 into a register and then doing ADDS/SUBS previously.
That was particularly bad since those come up in fixnum operations.

```diff
   # integer left shift with rhs=1
-  mov x11, #1
-  subs x11, x1, x11
+  subs x11, x1, #1
   lsl x12, x11, #1
   asr x13, x12, #1
   cmp x13, x11
-  b.ne #0x106ab60f8
-  mov x11, #1
-  adds x12, x12, x11
+  b.ne #0x10903a0f8
+  adds x12, x12, #1
   mov x1, x12
```

Note that it's fine to cast between i64 and u64 since the bit pattern is
preserved, and the add/sub themselves don't care about the signedness of
the operands.

CMP is just another mnemonic for SUBS.

* YJIT: A64: Split asm.mul() with immediates properly

There is in fact no MUL on A64 that takes an immediate, so this
instruction was using the wrong split method. No current usages of this
form in YJIT.

---------

Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
2024-04-02 12:29:14 -04:00
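The note in 3c4de946c9 about casting between i64 and u64 can be checked with a short sketch. In Rust, `as` casts between same-width integers keep the bit pattern, and two's-complement wrapping add/sub give the same bits whether the operands are viewed as signed or unsigned. The function names below are made up for this example:

```rust
/// Add two signed values by routing them through u64.
/// Hypothetical helper for illustration only.
fn add_via_u64(a: i64, b: i64) -> i64 {
    // The casts reinterpret the bits; the wrapping add is sign-agnostic.
    (a as u64).wrapping_add(b as u64) as i64
}

/// Subtract two signed values by routing them through u64.
/// Hypothetical helper for illustration only.
fn sub_via_u64(a: i64, b: i64) -> i64 {
    (a as u64).wrapping_sub(b as u64) as i64
}
```

Both functions agree with `i64::wrapping_add` and `i64::wrapping_sub` for every input, including negative operands.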
Takashi Kokubun
35f68e7dff
YJIT: Assert Opnd::Stack's SP expectation (#10061) 2024-02-21 23:09:17 +00:00
Takashi Kokubun
577d07cfc6
YJIT: Allow non-leaf calls on opt_* insns (#10033)
* YJIT: Allow non-leaf calls on opt_* insns

* s/on_send_insn/is_sendish/

* Repeat known_cfunc_codegen
2024-02-21 20:24:18 +00:00
Takashi Kokubun
9216a2ac43
YJIT: Verify the assumption of leaf C calls (#10002) 2024-02-20 13:42:29 -08:00
Takashi Kokubun
5cbca9110c
YJIT: Allow tracing a counted exit (#9890)
* YJIT: Allow tracing a counted exit

* Avoid clobbering caller-saved registers
2024-02-08 15:47:02 -08:00
Maxime Chevalier-Boisvert
5a87e9e2b5
YJIT: add missing jge comparison instruction (#9819)
I ran into this while trying to implement setbyte, and was surprised
to find out we hadn't implemented it yet.
2024-02-02 17:09:31 -05:00
Maxime Chevalier-Boisvert
adf29c9a98
YJIT: add asm comment when we clear local types (#9713)
Small PR to add a comment when we clear local variable types,
so we can be aware that it's happening when looking at the disasm.
2024-01-29 15:36:34 +00:00
Takashi Kokubun
e0f7cee8c5
YJIT: Avoid doubly splitting Opnd::Value on CSel (#9617)
YJIT: Avoid doubly splitting Opnd::Value
2024-01-19 11:51:35 -08:00
Takashi Kokubun
33306a08d1
YJIT: Stop incrementing chain_depth on defer_compilation (#9597) 2024-01-18 11:40:11 -08:00
Hiroshi SHIBATA
863ded45a1
Typofix under bootstraptest, spec and yjit directories 2023-12-25 13:50:23 +09:00
Takashi Kokubun
476a231e7e
YJIT: Assert no patch overlap on pos_marker (#9048) 2023-11-28 10:41:14 -05:00
Takashi Kokubun
95369ac0a3
YJIT: Fix jmp_ptr_bytes on x86_64 (#9016) 2023-11-23 10:50:42 -05:00
Takashi Kokubun
926bfc3bc0
YJIT: Avoid a register spill on arm64 (#9014) 2023-11-22 15:13:32 -08:00
Alan Wu
0a93ea4808 YJIT: Auto fix for clippy::clone_on_copy 2023-11-10 16:55:56 -05:00
Alan Wu
38fe710e08 YJIT: Invoke PosMarker callbacks only with solid positions
Previously, PosMarker callbacks ran even when the assembler failed to
assemble its contents due to insufficient space. This was problematic
because when Assembler::compile() failed, the callbacks were given
positions that have no valid code, contrary to general expectation.

For example, we use a PosMarker callback to record VM instruction
boundaries and patch in jumps to exits in case the guest program starts
tracing. Previously, however, we could record a location near the end of
the code block, where there is no space to patch in jumps. I suspect
this is the cause of the recent occurrences of rare random failures on
GitHub Actions with the invariants.rs:529 "can rewrite existing code"
message. `--yjit-perf` also uses PosMarker and had a similar issue.

Buffer the list of callbacks to fire, and only fire them when all code
in the assembler is written out successfully. It's more intuitive this
way.
2023-11-10 11:51:05 -05:00
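The buffering approach described in 38fe710e08 can be sketched as follows. This is a simplified model under stated assumptions, not YJIT's implementation: `SketchInsn` and `compile_buffered` are hypothetical names, and the "code block" is just a byte vector with a capacity limit.

```rust
/// A toy instruction stream: raw bytes, or a marker that wants to learn
/// the position it ends up at. Hypothetical type for illustration.
enum SketchInsn {
    Bytes(Vec<u8>),
    PosMarker(Box<dyn FnOnce(usize)>),
}

/// Assemble into at most `capacity` bytes. Marker callbacks are buffered
/// and fired only after every instruction has been written successfully.
#[must_use]
fn compile_buffered(insns: Vec<SketchInsn>, capacity: usize) -> Option<Vec<u8>> {
    let mut code = Vec::new();
    let mut pending: Vec<(usize, Box<dyn FnOnce(usize)>)> = Vec::new();
    for insn in insns {
        match insn {
            SketchInsn::Bytes(bytes) => {
                if code.len() + bytes.len() > capacity {
                    // Out of space: return without firing any buffered callback.
                    return None;
                }
                code.extend_from_slice(&bytes);
            }
            // Remember the position, but do not call back yet.
            SketchInsn::PosMarker(callback) => pending.push((code.len(), callback)),
        }
    }
    // Everything fit, so the recorded positions refer to valid code.
    for (pos, callback) in pending {
        callback(pos);
    }
    Some(code)
}
```

With this shape, a callback never observes a position from an assembly attempt that later ran out of space.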
Alan Wu
a1c61f0ae5 YJIT: Use u32 for CodePtr to save 4 bytes each
We've long had a size restriction on the code memory region such that a
u32 could refer to everything. This commit capitalizes on this
restriction by shrinking the size of `CodePtr` to be 4 bytes from 8.

To derive a full raw pointer from a `CodePtr`, one needs a base pointer.
Both `CodeBlock` and `VirtualMemory` can be used for this purpose. The
base pointer is readily available everywhere, except for in the case of
the `jit_return` "branch". Generalize lea_label() to lea_jump_target()
in the IR to delay deriving the `jit_return` address until `compile()`,
when the base pointer is available.

On railsbench, this yields roughly a 1% reduction to `yjit_alloc_size`
(58,397,765 to 57,742,248).
2023-11-07 17:43:43 -05:00
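The idea in a1c61f0ae5 of storing a 4-byte offset and rebuilding the full pointer from a base can be sketched like this. `CompactCodePtr` and its methods are hypothetical names for this example; the real `CodePtr` API may differ.

```rust
/// A code pointer stored as a 32-bit offset from the start of the code region.
/// Hypothetical type for illustration, not YJIT's `CodePtr`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CompactCodePtr(u32);

impl CompactCodePtr {
    /// Build from a raw pointer, given the region's base pointer.
    /// Returns None if `ptr` is below `base` or too far away for a u32.
    fn from_raw(base: *const u8, ptr: *const u8) -> Option<Self> {
        let offset = (ptr as usize).checked_sub(base as usize)?;
        if offset > u32::MAX as usize {
            None
        } else {
            Some(CompactCodePtr(offset as u32))
        }
    }

    /// Recover the full raw pointer. The caller must supply the same base.
    fn raw_ptr(self, base: *const u8) -> *const u8 {
        base.wrapping_add(self.0 as usize)
    }
}
```

The trade-off is that every conversion back to a raw pointer needs the base in hand, which is why the commit defers deriving the `jit_return` address until `compile()`.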
Alan Wu
38bdb9d0da YJIT: Delete some dead code and enable lints 2023-11-03 18:47:41 +00:00
Alan Wu
cdc2a18541 YJIT: Return Option from asm.compile() for has_dropped_bytes()
So that we get a reminder to check CodeBlock::has_dropped_bytes().
Internally, asm.compile() already checks it, and this patch just
propagates it out to the caller with a `#[must_use]`.

Code GC logic moved out one level in entry_stub_hit(), so the body
can freely use `?`
2023-10-19 14:56:35 -04:00
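A minimal sketch of the pattern in cdc2a18541, assuming a toy code block that records when writes were dropped. `SketchCodeBlock`, `compile_checked`, and `gen_two_parts` are hypothetical names; only `has_dropped_bytes()` and `#[must_use]` are taken from the commit message.

```rust
/// Toy code block that silently drops writes once full and remembers that it did.
/// Hypothetical type for illustration.
struct SketchCodeBlock {
    bytes: Vec<u8>,
    capacity: usize,
    dropped_bytes: bool,
}

impl SketchCodeBlock {
    fn new(capacity: usize) -> Self {
        SketchCodeBlock { bytes: Vec::new(), capacity, dropped_bytes: false }
    }

    fn write_byte(&mut self, byte: u8) {
        if self.bytes.len() < self.capacity {
            self.bytes.push(byte);
        } else {
            self.dropped_bytes = true;
        }
    }

    fn has_dropped_bytes(&self) -> bool {
        self.dropped_bytes
    }
}

/// Write `code` and return its start offset, or None if any byte was dropped.
/// `#[must_use]` makes the compiler warn when a caller ignores the result.
#[must_use]
fn compile_checked(cb: &mut SketchCodeBlock, code: &[u8]) -> Option<usize> {
    let start = cb.bytes.len();
    for &byte in code {
        cb.write_byte(byte);
    }
    if cb.has_dropped_bytes() { None } else { Some(start) }
}

/// A caller that returns Option can bail out early with `?`.
fn gen_two_parts(cb: &mut SketchCodeBlock) -> Option<(usize, usize)> {
    let first = compile_checked(cb, &[0x90, 0x90])?;
    let second = compile_checked(cb, &[0xc3])?;
    Some((first, second))
}
```

Handling the out-of-memory case (code GC in the commit's wording) one level above such a function is what lets the body use `?` freely.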
Alan Wu
9d9aa63e82 YJIT: Enable the dead_code lint and delete some dead code 2023-10-19 11:50:36 -04:00
Takashi Kokubun
f51b92fe23
YJIT: Add --yjit-perf (#8697)
Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2023-10-18 21:07:03 +00:00
Alan Wu
36ee5d8ca8 YJIT: Fix clippy::redundant_locals
> note: `#[deny(clippy::redundant_locals)]` on by default

On Rust 1.73.0.
2023-10-17 18:36:23 -04:00
Alan Wu
41a6e4bdf9 YJIT: Avoid writing return value to memory in leave
Previously, at the end of `leave` we did
`*caller_cfp->sp = return_value`, like the interpreter.
With future changes that leave the SP field uninitialized for C frames,
this will become problematic. For cases like returning from
`rb_funcall()`, the return value was written above the stack and
never read anyway (callers use the copy in the return register).

Leave the return value in a register at the end of `leave` and have the
code at `cfp->jit_return` decide what to do with it. This avoids the
unnecessary memory write mentioned above. For JIT-to-JIT returns, it goes
through `asm.stack_push()` and benefits from register allocation for
stack temporaries.

Mostly flat on benchmarks, with maybe some marginal speed improvements.

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-10-05 15:53:05 -04:00
Takashi Kokubun
0b67e3fd3e
YJIT: Chain-guard opt_mult overflow (#8554)
* YJIT: Chain-guard opt_mult overflow

* YJIT: Support regenerating Jo after Mul
2023-09-29 21:55:48 -04:00
Takashi Kokubun
9aeb6e72db
YJIT: Avoid creating a vector in get_temp_regs() (#8446)
* YJIT: Avoid creating a vector in get_temp_regs()

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

* Remove unused import

---------

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2023-09-15 21:41:00 -04:00
Alan Wu
0996cf5593 YJIT: Fix and enable the unused_imports warning 2023-09-15 16:15:15 -04:00
Takashi Kokubun
982d6503b9
YJIT: Skip Insn::Comment and format! if disasm is disabled (#8441)
* YJIT: Skip Insn::Comment and format! if disasm is disabled

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>

* YJIT: Get rid of asm.comment

---------

Co-authored-by: Alan Wu <alansi.xingwu@shopify.com>
2023-09-14 15:49:40 -07:00
Takashi Kokubun
fcc1699162
YJIT: Initialize Vec with capacity for iterators (#8439) 2023-09-14 10:55:00 -07:00
Takashi Kokubun
cdc69da9e5
YJIT: Initialize Assembler vectors with capacity (#8437) 2023-09-14 10:10:31 -04:00
Alan Wu
ff55238913
YJIT: x64: Split mem-to-mem Insn::Store like Insn::Mov
The ARM backend allows for this so let's make x64 consistent.
2023-08-22 18:43:56 -04:00
Maxime Chevalier-Boisvert
314eed8a5e
YJIT: implement fast path for integer multiplication in opt_mult (#8204)
* YJIT: implement fast path for integer multiplication in opt_mult

* Update yjit/src/codegen.rs

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>

* Implement mul with overflow checking on arm64

* Fix missing semicolon

* Add arm splitting for lshift, rshift, urshift

---------

Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
2023-08-18 10:05:32 -04:00
Maxime Chevalier-Boisvert
a8cd18f08d
YJIT: implement codegen for rb_int_lshift (#8201)
* YJIT: implement codegen for rb_int_lshift

* Update yjit/src/asm/x86_64/mod.rs

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-08-11 11:01:16 -04:00
Maxime Chevalier-Boisvert
b5b34c1f84
YJIT: add mul() instruction to backend IR (#8195) 2023-08-10 14:47:03 -04:00
Maxime Chevalier-Boisvert
fc0b2a8df2
YJIT: guard for array_len >= num in expandarray (#8169)
Avoid generating long dispatch chains for all array lengths seen.
2023-08-04 10:09:43 -04:00
Maxime Chevalier-Boisvert
4f99240b2e
YJIT: add jb (unsigned less-than) instruction to backend (#8168) 2023-08-03 16:14:44 -04:00
Maxime Chevalier-Boisvert
98b4256aa7
YJIT: handle expandarray_rhs_too_small case (#8161)
* YJIT: handle expandarray_rhs_too_small case

YJIT: fix csel bug in x86 backend, add test

Remove commented out lines

Refactor expandarray to use chain guards

Propagate Type::Nil when known

Update yjit/src/codegen.rs

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>

* Add missing counter, use get_array_ptr() in expandarray

* Make change suggested by Kokubun to reuse loop

---------

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
2023-08-03 16:09:18 -04:00
Hiroshi SHIBATA
fd782dcd1e
Revert "YJIT: implement expandarray_rhs_too_small case (#8153)"
This reverts commit 3b88a0bee8.

  This commit breaks the aarch64 platform and Apple Silicon
2023-08-02 14:25:16 +09:00
Maxime Chevalier-Boisvert
3b88a0bee8
YJIT: implement expandarray_rhs_too_small case (#8153)
* YJIT: handle expandarray_rhs_too_small case

* YJIT: fix csel bug in x86 backend, add test

* Remove commented out lines
2023-08-01 15:58:00 -04:00
Takashi Kokubun
bde4080b39
YJIT: Drop Copy trait from Context (#8138) 2023-07-29 21:14:04 -04:00