My Apple compiler started as an experiment in what I would implement as a compiler for an array system. Several parts did not pan out, so I offer my warnings and advice to other array-language implementers.

Typed Array Programming

Apple combines a REPL with type inference:

> :ty (⊲) (a → (Arr (i `Cons` Nil) a → Arr ((i + 1) `Cons` Nil) a))

This turns out to be quite good for the style of exploratory programming that J encourages. Such a type system is hardly new: the Apple type system is based on Repa and Accelerate, but they do not emphasize interactivity or terseness.

APL descendants look peculiar in light of their lack of static typing; a dialect that added shape types would immediately earn its place.

Compiler-as-a-Library

I wanted to have a function:

compile_src : String -> Bytes

which took source code to machine code.

With such a function one could compile to machine code on one machine and distribute it to others for execution, modify a text editor to execute code directly, or write a compiler plugin to extend GCC (say) with the ability to compile another language.

#include <stdio.h>
#include <stdlib.h>

double ff(double) {apple|
  λz.
    { ffact ← [(*)/ₒ 1 (𝒻 1 x (⌊x))];
      Σ ← λN.λa. (+)/ₒ 0 (a'(⍳ 0 N 1));
      (2%√𝜋)*Σ 30 (λn. {nf⟜ℝn; ((_1^n)*z^(2*n+1))%((ffact nf)*(2*nf+1))})
    }
|};

int main(int argc, char *argv[]) {
    printf("%f\n", ff(2.0));
}

However, this is not so easy: foreign function calls cannot be compiled to self-contained machine code; they must bundle context information as well.

There appears to be little precedent for writing compilers that can be threaded through one another, but an array language would be a good candidate: one can write a full compiler whose output needs foreign calls only to malloc and free.

Embedding

When the compiler is structured as a library function, one can easily embed it in other languages. The hope was that this would make Apple a replacement for NumPy.

An embedded language can perform deforestation that NumPy cannot; consider:

np.sum(np.arange(0,10000,1.0,dtype=np.int64))

This builds up an array in-memory and then folds over it. In Apple, the compiler can analyze the whole expression

(+)/ₒ 0 (irange 0 9999 1)

and compile it to a loop with no allocations:

> :asm (+)/ₒ 0 (irange 0 9999 1)

        sub rsp, 8
        xor r10, r10
        xor rax, rax
        mov rdx, 9999
apple_0:
        cmp r10, rdx
        jg apple_1
        mov rcx, r10
        add rax, rcx
        add r10, 1
        jmp apple_0
apple_1:
        add rsp, 8
        ret

Jupyter Notebooks

Jupyter notebooks are quite nice for calculations; the combination of interactivity and record-keeping is missing from the ML side of things (Futhark, Accelerate, Repa). Such languages have found great success in building systems but perhaps calculation has been forgotten.

JIT

Writing a full compiler backend was half the effort of writing the interpreter, at once perfunctory and fickle.

Unfortunately this was necessary: other array languages have found great success in tree-walking interpreters written in C, but this approach would not allow us to compile down to object files.

There is one advantage, which is that the compiler can be used from the REPL; one can inspect the assembly to see if SIMD instructions are used:

> :disasm [(⋉)Λₒ 0 (x::Arr (i `Cons` Nil) float)]
⋮
        xor r10, r10                         4d 31 d2
        cmp r10, rcx                         49 39 ca
        jge apple_1                          0f 8d 1b 00 00 00
apple_0:
        movq [rax+8*r10+16], xmm3            66 4a 0f 7e 5c d0 10
        vmaxsd xmm3, xmm3, [rdx+8*r10+16]    c4 a1 e3 5f 5c d2 10
        add r10, 1                           49 83 c2 01
        cmp r10, rcx                         49 39 ca
        jl apple_0                           0f 8c e5 ff ff ff
apple_1:
        add rsp, 8                           48 83 c4 08
        ret                                  c3

A backend-as-a-library would suit array languages well: calculations and array operations map nicely to assembly, and the array language community cares about SIMD instructions.

Unfortunately, LLVM is poorly documented and fast-moving, and GCC is simply not structured suitably.

IR

The Apple compiler has only one IR, which goes from expressions all the way down to jumps and addresses.

Other array languages have written successful interpreters in C; an IR like C (with for loops) would make translating rank less tortuous.