How MJIT Generates C From Ruby - A Deep Dive

You probably already know the basics of JIT in Ruby. CRuby’s JIT implementation, called MJIT, is a really interesting beast.

But what does the C code actually look like? How is it generated? What are all the specifics?

If you’re afraid of looking at C code, this may be a good week to skip this blog. I’m just sayin’.

How Ruby Runs Your Code

I’ll give you the short version here: Ruby parses your code. It turns it into an Abstract Syntax Tree, which is just a tree-data-structure version of the operations you asked it to do. Before Ruby 1.9, Ruby would directly interpret the tree structure to run your code. Current Ruby (1.9 through 2.6-ish) translates it into buffers of bytecodes. These buffers are called ISEQs, for “Instruction SEQuences.” There are various tools like yomikomu that will let you dump, load and generally examine ISEQs. BootSnap, the now-standard tool to optimize startup for large Rails apps, works partly by loading dumped ISEQs instead of parsing all your code from .rb files.
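
By the way, if you want to poke at ISEQs yourself without any extra gems, CRuby ships with RubyVM::InstructionSequence, which is roughly what those tools are built on. Here’s a minimal sketch - the exact disassembly output varies by Ruby version:

# Compile a snippet to an ISEQ, look at the bytecode, then dump and reload it
iseq = RubyVM::InstructionSequence.compile("a = 7.0; b = 10.0; a * b")
puts iseq.disasm                        # human-readable bytecode listing

binary = iseq.to_binary                 # dump the ISEQ to a binary String
reloaded = RubyVM::InstructionSequence.load_from_binary(binary)
puts reloaded.eval                      # => 70.0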

Also, have I talked up Pat Shaughnessy’s Ruby Under a Microscope lately? He explains all of this in massive detail. If you’re a Ruby-internals geek (guilty!) this is an amazing book. It’s also surprising how little Ruby’s internals have changed since he wrote it.

In the Ruby source code, there’s a file full of definitions for all the instructions that go into ISEQs. You can look up trivial examples like the optimized plus operator and see how they work. Ruby actually doesn’t call these directly - the source file is written in a weird, not-exactly-C syntax that gets taken apart and used in multiple ways. You can think of it as a C DSL if you like. For the “normal” Ruby interpreter, they all wind up in a giant loop which looks up the next operation in the ISEQ, runs the C code for that operation, and then loops back around to look up the next instruction (and so on).

A Ruby build script generates the interpreter’s giant loop as a C source file when you build Ruby. It winds up built into your normal ruby binary.
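
If it helps to see the shape of that loop, here’s a purely conceptual sketch in Ruby rather than the real generated C - the hand-rolled “ISEQ” below is invented for illustration and isn’t Ruby’s actual bytecode format:

# Conceptual sketch of the interpreter's dispatch loop, with a toy ISEQ
# standing in for the real bytecode of "a * b"
iseq   = [[:getlocal, 0], [:getlocal, 1], [:opt_mult], [:leave]]
locals = [7.0, 10.0]
stack  = []

pc = 0
result = loop do
  insn, operand = iseq[pc]
  case insn
  when :getlocal then stack.push(locals[operand])   # read a local variable
  when :opt_mult then b, a = stack.pop, stack.pop; stack.push(a * b)
  when :leave    then break stack.pop               # return the top of the stack
  end
  pc += 1
end
puts result  # => 70.0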

Ruby’s MJIT uses the same file of definitions to generate C code from Ruby. MJIT can take an ISEQ and generate all the lines of C it would run in that loop without actually needing the loop or the instruction lookup. If you’re a compiler geek, yeah, this is a bit like loop unrolling since we already know the instruction sequence that the loop would be operating on. So we can just “spell out” the loop explicitly. That also lets the C compiler see where operations would be useless or cancel each other out and just skip them. That’s hard to do in an interpreter!
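
In the same toy terms, here’s what the “unrolled” version looks like - no loop and no instruction lookup, just the work each instruction would have done, spelled out in order:

# The same toy ISEQ, "unrolled": each instruction's work appears in sequence,
# so there's no dispatch loop left for the compiler to look through
locals = [7.0, 10.0]
stack  = []

stack.push(locals[0])             # getlocal a
stack.push(locals[1])             # getlocal b
b, a = stack.pop, stack.pop
stack.push(a * b)                 # opt_mult
puts stack.pop                    # leave => 70.0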

So what does all this actually look like when Ruby does it?

MJIT Options and Seeing Inside

It turns out that MJIT has some options that let us see behind the curtain. If you have Ruby 2.6 or higher, then you have JIT available. Run “ruby --help” and you can see MJIT’s extra options on the command line. Here’s what I see in 2.6.2 (note that some options are changing for not-yet-released 2.7):

JIT options (experimental):
  --jit-warnings  Enable printing JIT warnings
  --jit-debug     Enable JIT debugging (very slow)
  --jit-wait      Wait until JIT compilation is finished everytime (for testing)
  --jit-save-temps
                  Save JIT temporary files in $TMP or /tmp (for testing)
  --jit-verbose=num
                  Print JIT logs of level num or less to stderr (default: 0)
  --jit-max-cache=num
                  Max number of methods to be JIT-ed in a cache (default: 1000)
  --jit-min-calls=num
                  Number of calls to trigger JIT (for testing, default: 5)

Most of these aren’t a big deal. Debugging and warnings can be useful, but they’re not thrilling. But “--jit-save-temps” there may look intriguing to you… I know it did to me!

That will actually save the C source files that Ruby is using and we can see inside them!

If you do this, you may want to set the environment variables TMP or TMPDIR to a directory where you want them - OS X often puts temp files in weird places. I added an extra print statement to mjit_worker.c in the function “convert_unit_to_func” right after “sprint_uniq_filename” so that I could see when it created a new file… But that means messing around in your Ruby source, so you do you.

Multiplication and Combinatorics

# multiply.rb
def multiply(a, b)
  a * b
end

1_000_000.times do
  multiply(7.0, 10.0)
end

I decided to start with really simple Ruby code. MJIT will only JIT a method, so you need a method. And then you need to call it, preferably a lot of times. So the code above is what I came up with. It is intentionally not complicated.

The “multiply” method multiplies two numbers and does nothing else. It gets JITted because it’s called many, many times. I ran this code with “ruby --jit --jit-save-temps multiply.rb”, which worked fine for me once I figured out where macOS was putting its temp files.

The resulting .c file generated by Ruby is 236 lines. Whether you find this astoundingly big or pretty darn small depends a lot on your background. Let me show you a few of the highlights from that file.

Here is a (very) cut-down and modified version:

// Generated by MJIT from multiply.rb
ALWAYS_INLINE(static VALUE _mjit_inlined_6(...));
static inline VALUE
_mjit_inlined_6(rb_execution_context_t *ec, rb_control_frame_t *reg_cfp, const VALUE orig_self, const rb_iseq_t *original_iseq)
{
    // ...
}

VALUE
_mjit0(...)
{
    // ...
    label_6: /* opt_send_without_block */
    {
        // ...
        stack[0] = _mjit_inlined_6(ec, reg_cfp, orig_self, original_iseq);
    }
}

What I’m showing here is that there is an inlined _mjit_inlined_6 method (C calls them “functions”) that gets called by a top-level _mjit0 function, which is the MJITted version of the “multiply” method in Ruby. “Inlined” means the C compiler effectively rewrites the code so that it’s not a called method - instead, the whole method’s code gets pasted in at the one place it would have been called. That’s a bit faster than a normal function call, and it lets the compiler optimize the pasted-in code just for that one call site, since nothing else will ever call it.

If you look at the full code, you’ll also see that each method is full of “labels” and comments like the one above (“opt_send_without_block”). Below is basically all of the code of that inlined function. If you ignore the dubious indentation (generated code is generated), you have a chunk of C for each bytecode instruction and some setup, cleanup and stack-handling in between. The large “cancel” block at the end is the fallback path: if one of the optimized instructions can’t do its job - say, because somebody redefined an operator - MJIT gives up on this compiled version, marks it for recompilation, and hands control back to the regular interpreter.

The chunks of code at each label, by the way, are what the interpreter loop would normally do.

And if you examine these specific opcodes, you’ll discover that this is taking two local variables and multiplying them - this is the actual multiply method from the Ruby code above.

static inline VALUE
_mjit_inlined_6(rb_execution_context_t *ec, rb_control_frame_t *reg_cfp, const VALUE orig_self, const rb_iseq_t *original_iseq)
{
    const VALUE *orig_pc = reg_cfp->pc;
    const VALUE *orig_sp = reg_cfp->sp;
    VALUE stack[2];
    static const VALUE *const original_body_iseq = (VALUE *)0x7ff4cd51a080;

label_0: /* getlocal_WC_0 */
{
    MAYBE_UNUSED(VALUE) val;
    MAYBE_UNUSED(lindex_t) idx;
    MAYBE_UNUSED(rb_num_t) level;
    level = 0;
    idx = (lindex_t)0x4;
    {
        val = *(vm_get_ep(GET_EP(), level) - idx);
        RB_DEBUG_COUNTER_INC(lvar_get);
        (void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
    }
    stack[0] = val;
}

label_2: /* getlocal_WC_0 */
{
    MAYBE_UNUSED(VALUE) val;
    MAYBE_UNUSED(lindex_t) idx;
    MAYBE_UNUSED(rb_num_t) level;
    level = 0;
    idx = (lindex_t)0x3;
    {
        val = *(vm_get_ep(GET_EP(), level) - idx);
        RB_DEBUG_COUNTER_INC(lvar_get);
        (void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
    }
    stack[1] = val;
}

label_4: /* opt_mult */
{
    MAYBE_UNUSED(CALL_CACHE) cc;
    MAYBE_UNUSED(CALL_INFO) ci;
    MAYBE_UNUSED(VALUE) obj, recv, val;
    ci = (CALL_INFO)0x7ff4cd52b400;
    cc = (CALL_CACHE)0x7ff4cd5192e0;
    recv = stack[0];
    obj = stack[1];
    {
        val = vm_opt_mult(recv, obj);

        if (val == Qundef) {
            reg_cfp->sp = vm_base_ptr(reg_cfp) + 2;
            reg_cfp->pc = original_body_iseq + 4;
            RB_DEBUG_COUNTER_INC(mjit_cancel_opt_insn);
            goto cancel;
        }
    }
    stack[0] = val;
}

label_7: /* leave */
    return stack[0];

cancel:
    RB_DEBUG_COUNTER_INC(mjit_cancel);
    rb_mjit_iseq_compile_info(original_iseq->body)->disable_inlining = true;
    rb_mjit_recompile_iseq(original_iseq);
    const VALUE current_pc = reg_cfp->pc;
    const VALUE current_sp = reg_cfp->sp;
    reg_cfp->pc = orig_pc;
    reg_cfp->sp = orig_sp;

    struct rb_calling_info calling;
    calling.block_handler = VM_BLOCK_HANDLER_NONE;
    calling.argc = 2;
    calling.recv = reg_cfp->self;
    reg_cfp->self = orig_self;
    vm_call_iseq_setup_normal(ec, reg_cfp, &calling, (const rb_callable_method_entry_t *)0x7ff4cd930958, 0, 2, 2);

    reg_cfp = ec->cfp;
    reg_cfp->pc = current_pc;
    reg_cfp->sp = current_sp;
    *(vm_base_ptr(reg_cfp) + 0) = stack[0];
    *(vm_base_ptr(reg_cfp) + 1) = stack[1];
    return vm_exec(ec, ec->cfp);

} /* end of _mjit_inlined_6 */

The labels mark where a particular bytecode instruction in the ISEQ starts, and the name is the name of that bytecode instruction. This is doing nearly exactly what the Ruby interpreter would, including lots of Ruby bookkeeping for things like call stacks.
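
If you want to compare against the original bytecode yourself, you can dump the multiply method’s ISEQ with RubyVM::InstructionSequence. The exact listing varies by Ruby version, but you should see the same instruction names as the label comments above:

# Disassemble the multiply method's ISEQ; the instruction names should line up
# with the labels in the generated C: getlocal_WC_0, getlocal_WC_0, opt_mult, leave
def multiply(a, b)
  a * b
end

puts RubyVM::InstructionSequence.of(method(:multiply)).disasm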

What Changes?

Okay. We’ve multiplied two numbers together. This is a single, small operation.

What changes if we do more?

Well… This is already a fairly long blog post. But first, I’ll link a repository of the output I got when multiplying more than two numbers.

And then after you clone that repo, you can start doing interesting things yourself to see what changes over time. For instance:

# See what's different between multiplying 2 Floats and multiplying 3 Floats
diff -c multiply_2_version_0.c multiply_3_version_0.c

And in fact, if we multiply three or more Floats, MJIT will realize it can improve some things over time. When multiplying three (or four!) Floats, it will produce three different chunks of C code, not just one, as it continues to iterate. So:

# See what's different between the first and second way to multiply three Floats
diff -c multiply_3_version_0.c multiply_3_version_1.c

I’ll let you have a look. When looking at diffs, keep in mind that the big hexadecimal numbers in the CALL_INFO and CALL_CACHE lines will change for every run, both in my output and in any output you make for yourself — they’re literally hardcoded memory addresses in Ruby, so they’re different for every run. But the other changes are often interesting and substantive, as MJIT figures out how to optimize things.

What Did We Learn?

I like to give you interesting insights, not just raw code dumps. So what’s interesting here?

Here’s one interesting thing: you don’t see any checks for whether operations like multiply are redefined. But that’s not because of excellent JIT optimization - it’s because that all lives inside the vm_opt_mult function call up above. At best, they might be recognized as a repeat check and the compiler might be able to tell that it doesn’t need to check them again. But that’s actually hard — there’s a lot of code here, and it’s hard to verify that none of it could possibly ever redefine an operation… Especially in Ruby!

So: MJIT is going to have a lot of trouble skipping those checks, given the way it structures this code.

And if it can’t skip those checks, it’s going to have a lot of trouble doing optimizations like constant folding, where it multiplies two numbers at compile time instead of every time through the loop. You and I both know that 7.0 * 10.0 will always be 70.0, every time through the loop, because nobody is redefining Float multiplication halfway through. But MJIT can’t really know that - what if there were a trace_func that redefined operations constantly? Or a background thread that redefined the operation halfway through the run? Ruby allows it!
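
Here’s a contrived but perfectly legal sketch of the kind of thing MJIT has to worry about - nothing stops code like this from running halfway through the loop:

# Legal (if horrifying) Ruby: redefine Float#* partway through the run.
# After this, 7.0 * 10.0 no longer returns 70.0, so any compile-time
# constant folding of that multiplication would now give the wrong answer.
class Float
  def *(other)
    42.0
  end
end

puts 7.0 * 10.0  # => 42.0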

To put it another way, MJIT isn’t doing a lot of interesting language-level optimization here. Mostly it’s just optimizing simple bookkeeping like the call stack and C-level function calls. Most of the Ruby operations, including overhead like checking whether you redefined a method, stay basically the same.

That should make sense. Remember how MJIT got written and merged in record time? It’s very hard to make language-level optimizations without a chance of breaking something, and MJIT tries not to change the language semantics at all. So it doesn’t make many changes or assumptions - mostly, it’s a simple mechanical transform of what the interpreter was already doing.

If you didn’t already know what the Ruby interpreter was doing under the hood, this is also a fun look into that.