A response to 'A decade of developing a programming language'

I recently read the blog post A decade of developing a programming language by Yorick Peterse (found via Steve Klabnik). I thought it was an interesting blog post which got me thinking, and I have opinions on programming language design from Rust (it is almost exactly a decade since I got involved with Rust too), so I have written a response of sorts. This is all unsubstantiated opinion, so don't hold me to all this too hard, it just felt like a fun thing to write.

Avoid gradual typing

Yes. Back in my PhD days when gradual typing was an emerging thing in academia, I really bought into the hype. But I think Yorick is spot-on with his observation that by using gradual typing, you lose the benefits of static typing and you don't really get the benefits of dynamic typing.

The motivation for gradual typing is that you can use a language to prototype your project (or to sketch initial architectures, etc.), then evolve that using the same language into high-quality code with static types. But this is flawed in three ways:

  • There is a huge benefit in throwing away the prototype, rather than evolving it into production code.
  • When writing 'quick and dirty' code, it's mostly not the types which slow you down (assuming the programmer is experienced, and the language has good inference and tooling). You move fast by leaving out edge cases, error handling, UI design, integrations, etc., etc. In fact, types can speed you up by facilitating better tooling and type-based design.
  • Static typing is not just bureaucratic error checking; types help define the 'culture' and flavour of the language. If you're writing code in a language minus its type system, either you follow the flavour of the language (where you're effectively constrained by its type system, you just don't get the automatic checking), or you ignore it, in which case you're effectively writing in a different language and adding static types is going to require rewriting your code.

I also think that using different languages for different tasks is not a bad thing, and nobody should expect to use just one language any more. However, there are benefits to using fewer languages, so this isn't a strong counter-argument.

I think there is a kernel of truth to the vision of gradual typing, which is that writing and thinking about types can be a pain (either because they are not expressive enough or because they are too complex). I think 80% of the solution is good type inference (and other ways to elide types), and type inference should be table stakes for any new language. The other 20% comes down to good language design: balancing simplicity and expressivity as design goals of your type system.
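As a rough illustration of what I mean by inference taking the pain away, here is a small Rust sketch. The function name word_counts is made up for this example. Types are written once at the function boundary and elided everywhere inside the body.

```rust
use std::collections::HashMap;

/// Hypothetical example function: counts how often each word occurs.
/// The signature is the only place where types are spelled out.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    // No annotation here: the map's type is inferred from the return type.
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // The literals 0 and 1 are inferred to be usize.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("a b a");
    // `Vec<_>` lets the compiler fill in the element type.
    let mut pairs: Vec<_> = counts.into_iter().collect();
    pairs.sort();
    println!("{:?}", pairs);
}
```

The body reads much like it would in a dynamically typed language, but everything is still statically checked.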

The other element of truth from gradual typing is that different users of a language will have different constraints and requirements, and thus will use your language in different registers. E.g., in a Rust-like language, some users will want the level of detail we currently have, others will want clone to be implicit for Rc (and similar cases), still others will want more precision, e.g., guarantees that code doesn't panic or about memory usage. There are a few possible solutions:

  • you try to find a solution which makes everyone happy,
  • you accept that your language has a small niche and do what makes sense in that niche,
  • you support different registers/dialects either explicitly or by using some kind of gradual typing,
  • you fork the language into several dialects.

I don't think any of these are good solutions. The first is nice if it works, but it is limited. The others suck. I don't have an answer. I think it is one of the hard challenges in language design.
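To make the Rc example concrete, this is roughly what the explicit register looks like in today's Rust. The helper name share is hypothetical, invented for this sketch. Every new owner of the shared value is created by a visible call to Rc::clone, which is exactly the detail some users value and others would like to see elided.

```rust
use std::rc::Rc;

/// Hypothetical helper: hands out `n` extra handles to one shared value.
fn share(value: &Rc<String>, n: usize) -> Vec<Rc<String>> {
    // Each clone only bumps the reference count; the String is not copied.
    (0..n).map(|_| Rc::clone(value)).collect()
}

fn main() {
    let config = Rc::new(String::from("settings"));
    let handles = share(&config, 2);
    // One original owner plus two explicit clones.
    println!("owners: {}", Rc::strong_count(&config));
    drop(handles);
    // Dropping the handles releases their share of ownership.
    println!("owners: {}", Rc::strong_count(&config));
}
```

With implicit clones, the call to Rc::clone would disappear from the source, which is more convenient but hides where reference counts change.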

The killer app for gradual typing is something like TypeScript - adding static types to a language which is dynamically typed. This is great, but I don't think this means gradual typing is good for a new language (Yorick also makes this point).

Avoid self-hosting your compiler

Yes! From a technical perspective, writing your compiler in your target language is just plain bad (Yorick covers this). The real selling point of self-hosting is social: people who want to work on your new language will want to write that language and they will get frustrated if they have to write the compiler in an older, worse language. It is also a bit of a milestone in the PL community when you can self-host, but this strikes me as unjustified nerdery.

The big downside I see with self-hosting is that it incentivises you to design a language optimised for writing compilers. Most code isn't compilers, so this incentive is probably not well-aligned with your goals.

Avoid writing your own code generator, linker, etc

I think this is mostly true. But I think that the bigger point is to know your vision and goals for your language and prioritise those. Perhaps a new linking model is one of the core goals and the primary selling point of your language, if so then you should definitely write your own linker. But for most languages, that won't be the case. Focus on your core features, and use existing tools for peripheral stuff, basically.

Avoid bike shedding about syntax

Disagree. This is a common sentiment among language designers, but I think it is wrong. It's tempting because the semantics are the really deep, interesting things, and the syntax is easy to argue about because it is fairly subjective and there is a lower barrier to entry in terms of required knowledge. But this doesn't mean syntax isn't important, it just means it's hard to have a good discussion about it. Syntax is the interface between the user and your language, and like all user interfaces, good design is super, super important.

As a language designer, you need to figure out the syntax. It's hard work which is very different from designing the semantics; it's more like product design than compiler hacking. As a community leader, you need to figure out how to have good discussions about syntax, which is probably even harder!

Cross-platform support is a challenge

Yeeeeessss, but! This one is very true, cross-platform is hard and often frustrating. However, my experience from Rust is that supporting multiple platforms often helps you make good design decisions (it helps you to determine what is essential vs what is accidental about a concept), and that adding platforms later is much, much harder than supporting them from the start.

For example, if we'd only supported 64-bit platforms we might have used u64 instead of usize for things like array length. But making that separation was a good design decision from the perspective of types as documentation. On the other hand, usize conflates several concepts: addresses, data size, etc. This is more apparent now when considering platforms like CHERI.
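Here is a small sketch of the 'types as documentation' point. The helper names to_index and sum_prefix are invented for the example. A u64 that arrives from outside (say, a count stored in a file format) has to be converted explicitly before it can index a slice, and that conversion can fail on a platform where usize is narrower than 64 bits.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts an externally supplied count into an index.
/// Returns None if the value does not fit in this platform's usize.
fn to_index(count: u64) -> Option<usize> {
    usize::try_from(count).ok()
}

/// Hypothetical helper: sums at most the first `n` elements of `data`.
fn sum_prefix(data: &[u32], n: usize) -> u32 {
    // Clamp `n` to the slice length so the slicing cannot panic.
    data[..n.min(data.len())].iter().sum()
}

fn main() {
    let data: Vec<u32> = vec![10, 20, 30];
    // Lengths and indices are usize, so no conversion is needed here.
    let len: usize = data.len();
    // A fixed-width integer needs an explicit, fallible conversion.
    let n = to_index(2).expect("count does not fit in usize on this platform");
    println!("len = {}, prefix sum = {}", len, sum_prefix(&data, n));
    // The width of usize depends on the compilation target.
    println!("usize is {} bits wide here", usize::BITS);
}
```

If array lengths had been u64, the conversion step would not exist and nothing in the types would say which numbers are platform-sized and which are fixed-width.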

Compiler books aren't worth the money

Mostly true. They certainly do seem to focus too much on parsing and not enough on the important stuff, especially difficult engineering concepts like error handling/recovery. My recommendation is 'Engineering a Compiler' by Cooper and Torczon; it covers the usual parsing stuff, but also type checking, intermediate representations, optimisation, etc. (not much on error handling, though).

Growing a language is hard

Yeah, really hard! Nothing to add to this one.

The best test suite is a real application

Sort of. One of the best things about working on the Rust compiler is its excellent test suite. I can't imagine writing a compiler without such a thing. BUT, that is a test suite for the compiler, not the language. The unit tests do help add some clarity in the details, but in terms of driving the language design, a large application is more useful. However, you do need to be careful about over-matching on a single application. Ideally you want several large applications and a huge test suite, but that is a lot to ask for a language which is in active development.

Don't prioritize performance over functionality

This is good advice, but I think only as far as the usual 'avoid premature optimisation' goes. Performance is a feature; having a fast compiler is nice and having a slow one sucks. Making a slow compiler fast is really, really hard. So this is a trade-off rather than a black-and-white rule.

Building a language takes time

Yep, so much time!

Some of my own lessons

I think it would be fun and interesting to write down some of my own lessons learnt from Rust. That deserves a fair bit of thought and its own post. But just off the top of my head, here are a few presented without detail:

  • Community is important and difficult.
  • Be very clear about your audience (potential users) and goals, use these to drive a strong vision (including saying 'no' a lot).
  • BUT these will change over time and that is ok.
  • A language is a dynamic project, not a journey toward a static goal. Your language and tooling must have mechanisms to evolve and you must design with backwards and forwards compatibility in mind.
  • The entropy of languages is towards complexity. You must dedicate effort to minimising complexity as your language evolves.
  • There's a lot more existing code in existing languages than new code in your new language: it's important to consider how the two can interact (FFI, sharing VMs, etc.).
  • The culture of your language is important and must be built earlier than you expect (e.g., things like writing docs or tests, or how much to focus on performance). Doubly so for the culture of your community (being welcoming, valuing diversity, etc., but this comes back to the first point: community is hard).
  • Libraries and tooling matter just as much as the design of the language.
  • Don't try and do too much. Limit the number of areas where you innovate and rely mostly on ideas proven in existing languages (but that includes research/academic languages, not just widely used ones).
  • There's a lot of research out there and it can be really useful. You often have to do a lot of work to apply research ideas to your context though.

I could come up with more of these, and probably the above aren't the most useful or important, but they're what came to mind this evening.