Thursday, November 7, 2019

Instant Builds

One of the goals I am aiming for in Objective-Smalltalk is instant builds and effective live programming.

A month ago, I got a package from an old school friend: my old Apple ][+, which I thought I had given as a gift, but he insisted had been a long-term loan. That machine featured 48KB of DRAM and a 1 MHz, 8 bit 6502 processor that took multiple cycles for even the simplest instructions, had no multiply instructions and almost no registers. Yet, when I turn it on it becomes interactive faster than the CRT warms up, and the programming experience remains fully interactive after that. I type something in, it executes. I change the program, type "RUN" and off it goes.

Of course, you can also get that experience with more complex systems, Smalltalk comes to mind, but the point is that it doesn't take the most advanced technology or heroic effort to make systems interactive, what it takes is making it a priority.


But here we are indeed.

Now Swift is only one example of this, it's a current trend, and of course these systems do claim that they provide benefits that are worth the wait. From optimizations to static type-checking with type-inference, so that "once it compiles, it works". This is deemed to be (a) 100% worthwhile despite the fact that there is no scientific evidence backing up these claims (a paper which claimed that it had the evidence was just shredded at this year's OOPSLA) and (b) essentially cost-free. But of course it isn't cost free:

So when everyone zigs, I zag, it's my contrarian nature. Where Swift's message was, essentially "there is too much Smalltalk in Objective-C", my contention is that there is too little Smalltalk in Objective-C (and also that there is too little "Objective" in Smalltalk, but that's a different topic).

Smalltalk was perfectly interactive in its own environment on high end late 70s and early 80s hardware. With today's monsters of computation, there is no good reason, or excuse for that matter, to not be interactive even when taken into the slightly more demanding Unix/macOS/iOS development world. That doesn't mean there aren't loads of reasons, they're just not any good.

So Objective-Smalltalk will be fast, it will be live or near-live at all times, and it will have instant builds. This isn't going to be rocket science, mostly, the ingredients are as follows:

  1. An interpreter
  2. Late binding
  3. Separate compilation
  4. A fast and simple native compiler
Let's look at these in detail.

An interpreter

The basic implementation of Objective-Smalltalk is an AST-walking interpreter. No JIT, not even a simple bytecode interpreter. Which is about as pessimal as possible, but our machines are so incredibly fast, and a lot of our tasks simple enough or computational steering enough that it actually does a decent enough job for many of those tasks. (For more on this dynamic, see The Death of Optimizing Compilers by Daniel J. Bernstein)

And because it is just an interpreter, it has no problems doing its thing on iOS:

(Yes, this is in the simulator, but it works the same on an actual device)

Late Binding

Late binding nicely decouples the parts of our software. This means that the compiler has very little information about what happens and can't help a lot in terms of optimization or checking, something that always drove the compiler folks a little nuts ("but we want to help and there's so much we could do"). It enables strong modularity and separate compilation. Objective-Smalltalk is as late-bound in its messaging as Objective-C or Smalltalk are, but goes beyond them by also late-binding identifiers, storage and dataflow with Polymorphic Identifiers (ACM, pdf), Storage Combinators (ACM, pdf) and Polymorphic Write Streams (ACM, pdf).

Allowing this level of flexibility while still not requiring a Graal-level Helden-JIT to burn away all the abstractions at runtime will require careful design of the meta-level boundaries, but I think the technically desirable boundaries align very well with the conceptually desirable boundaries: use meta-level facilities to define the language you want to program in, then write your program.

It's not making these boundaries clear and freely mixing meta-level and base-level programming that gets us in not just conceptual trouble, but also into the kinds of technical trouble that the Heldencompilers and Helden-JITs have to bail us out of.

Separate Compilation

When you have good module boundaries, you can get separate compilation, meaning a change in file (or other code-containing entity if you don't like files) does not require changes to other files. Smalltalk had this. Unix-style C programming had this, and the concept of binary libraries (with the generalization to frameworks on macOS etc.). For some reason, this has taken more and more of a back-seat in macOS and iOS development, with full source inclusion and full builds becoming the norm in the community (see CocoaPods) and for a long time being enforced by Apple by not allowing user-define dynamic libraries on iOS.

While Swift allows separate compilation, this can have such severe negative effects on both performance and compile times that compiling everything on any change has become a "best practice". In fact, we now have a build option "whole module optimization with optimizations turned off" for debugging. I kid you not.

Objective-Smalltalk is designed to enable "Framework-oriented-programming", so separate compilation is and will remain a top priority.

A fast and simple native compiler

However, even with an interpreter for interactive adjustments, separate compilation due to good modularity and late binding, you sometimes want to do a full build, or need to rebuild a large part of the codebase.

Even that shouldn't take forever, and in fact it doesn't need to. I am totally with Jonathan Blow on this subject when he says that compiling a medium size project shouldn't really more than a second or so.

My current approach for getting there is using TinyCC's backend as the starting point of the backend for Objective-Smalltalk. After all, the semantics are (mostly) Objective-C and Objective-C's semantics are just C. What I really like about tcc is that it goes so brutally directly to outputting CPU opcode as binary bytes.


static void gcall_or_jmp(int is_jmp)
{
    int r;
    if ((vtop->r & (VT_VALMASK | VT_LVAL)) == VT_CONST &&
	((vtop->r & VT_SYM) && (vtop->c.i-4) == (int)(vtop->c.i-4))) {
        /* constant symbolic case -> simple relocation */
        greloca(cur_text_section, vtop->sym, ind + 1, R_X86_64_PLT32, (int)(vtop->c.i-4));
        oad(0xe8 + is_jmp, 0); /* call/jmp im */
    } else {
        /* otherwise, indirect call */
        r = TREG_R11;
        load(r, vtop);
        o(0x41); /* REX */
        o(0xff); /* call/jmp *r */
        o(0xd0 + REG_VALUE(r) + (is_jmp << 4));
    }
}

No layers of malloc()ed intermediate representations here! This aligns very nicely with the streaming/messaging approach to high-performance I've taken elsewhere with Polymorphic Write Streams (see above), so I am pretty confident I can make this (a) work and (b) simple/elegant while keeping it (c) fast.

How fast? I obviously don't know yet, but tcc is a fantastic starting point. The following is the current (=wrong) ObjectiveTcc code to drive tcc to build a function that sends a single message:


-(void)generateMessageSendTestFunctionWithName:(char*)name
{
    SEL flagMsg=@selector(setMsgFlag);
    [self functionOnlyWithName:name returnType:VT_INT argTypes:"" body:^{
        [self pushFunctionPointer:objc_msgSend];
        [self pushObject:self];
        [self pushPointer:flagMsg];
        [self call:2];
    }];
}

How often can I do this in one second? On my 2018 high spec but 13" MBP: 300,000 times. Including in-memory linking (though not much of that happening in this example), not including Mach-O generation as that's not implemented yet and writing the whole shebang to disk. I don't anticipate either of these taking appreciably additional time.

If we consider this 2 "lines" of code, one for the function/method header and one for the message, then we can generate binary for 600KLOC/s. So having a medium size program compile and link in about a second or so seems eminently doable, even if I manage to slow the raw Tcc performance down by about an order of magnitude.

(For comparison: the Swift code base that motivated the Rome caching system for Carthage was clocking in at around 60 lines per second with the then Swift compiler. So even with an anticipated order of magnitude slowdown we'd still be 1000x faster. 1000x is good enough, it's the difference between 3 seconds and an hour.)

What's the downside? Tcc doesn't do a lot of optimization. But that's OK as (a) the sorts of optimizations C compilers and backends like LLVM do aren't much use for highly polymorphic and late-bound code and (b) the basics get you around 80% of the way (c) most code doesn't need that much optimization (see above) and (d) machines have become really fast.

And it helps that we aren't doing crazy things like initially allocating function-local variables on the heap or doing function argument copying via vtables that require require leaning on the optimizer to get adequate performance (as in: not 100x slower..).

Defense in Depth

While any of these techniques might be adequate some of the time, it's the combination that I think will make the Objective-Smalltalk tooling a refreshing, pleasant and highly productive alternative to existing toolchains, because it will be reliably fast under all circumstances.

And it doesn't really take (much) rocket science, just a willingness to make this aspect a priority.

1 comment:

Anonymous said...

thank you there, i find your articles about objective smalltalk highly interesting.

you are not the only one left cold by swift. for every step forward it takes two steps back. and many of the so-called improvements actually aren't useful, if you've been doing objective-c for 15 years. maybe it helps if you are new, or coming from java...