Adding Error Productions to the Lox Compiler

Reading Time: 11 minutes

I’m working my way through Crafting Interpreters. The project guides programmers through building their own interpreters for the Lox programming language. I started writing a blog series about my progress. You can see all the posts so far right here.

Lately I’ve been doing some of the book’s exercises. I wrote here about its exercise to bring an object-based pattern to a functional language. But most of the exercises focus on making changes to the Lox compiler. You can check out my solutions to those in pull requests against my fork of the repo. As of this writing you can see my solutions for adding block comment syntax, ternary syntax, and comma operator syntax to Lox.

You can check out each of my exercise tests and solutions at github.com/chelseatroy/craftinginterpreters/pulls.

Though the exercises are meant to be done on the partially complete compiler that the reader has at each point in the book, I have so far exhibited the gall (or maybe the foolishness) to implement them on the finished compiler, so you can see how they’d work in a complete programming language! In case you’d like to try any of these exercises yourself, I’ve attempted to make each feature’s automated tests a separate branch from the solution, so you can check out the tests branch and code against that :).

In one case so far I have failed to separate the tests and implementation. That case is the implementation of error productions.

What is an error production?

Sometimes a compiler’s conversion of frontend code to an abstract syntax tree works perfectly. More often it doesn’t, because the frontend code contains syntax errors or unfinished code. The ideal parse error should happen, among other things:

  • soon enough that it’s easy to spot and fix
  • not so soon (read: constantly) as to annoy the programmer
  • with enough information to fix the problem
  • without rendering all the downstream code un-parseable such that the editor window becomes a living hell of red highlighting (looking at you, Swift/XCode).

That last one is a real drag. So far we’ve avoided such an event in Lox with a good ol’ exception: we raise the first parse error we run into, which effectively stops the parsing process, so the parseability of the remaining code isn’t even checked.

Another option, which I’ll show you today, is to use an error production: that is, account for common syntax errors as part of the parse tree such that the parser doesn’t choke on them. The nice thing about this is that, since the parser didn’t choke on the first one, we can present (ideally) all of the issues to the programmer at once rather than forcing them to fix them one by one and wait in anticipation after each fix to figure out what else they did wrong.

The error production challenge in Crafting Interpreters reads:

3. Add error productions to handle each binary operator appearing without a left-hand operand. In other words, detect a binary operator appearing at the beginning of an expression. Report that as an error, but also parse and discard a right-hand operand with the appropriate precedence.

Chapter 6, Exercise 3, Crafting Interpreters

As I am wont to do from several years of Labs training, I began my endeavors with a test:

The syntax is a bit cryptic, but what happens is that the test runner runs the Lox program and then checks for the error described in that line above the code that looks like a comment.

To make error productions, I added two constructs.

One is in the Lox grammar: Expr.Nothing represents the absence of an expression where there is supposed to be one.

We can use this to parse lines of code like * 4;, which would multiply the left operand by four except that there is no left operand. It takes in a string, but that’s not because it needs one; it’s because the GenerateAST class in the Lox compiler doesn’t like generating a node that takes no arguments, and I wanted to stay focused on these error productions rather than go mess with the AST generator. So when I initialize that expression…well…

I told you the argument was a hack, okay?

Once I had replaced the exception thrown in primary() with my shiny new Expr.Nothing expression at the point where it would catch a missing operand in my binary expressions, I hopped over to a spot in the code where the operands of a binary expression get parsed—term()—and checked the type of the return value for the left operand.

I extracted this helper function after the fact: for a while, I just had the behavior right on this line. Here’s what checkForMissingExpression looks like:

So when the parsing step has hands me an Expr.Nothing, I create my other new construct: HandledParseError. This complements the existing ParseError, which I renamed to UnahandledParseError.

HandledParseError takes a reference to the token where the error happened (to later report where it was) and a message for the programmer (so they know what’s wrong). I add that shiny new instance to a list of handledParseErrors that lives on the parser object itself.

Once parsing is complete, the Parser reports an error for each item in this list:

It looks beautiful, right?

So at this point I run my tests. They pass!

A whoooole bunch of other tests fail.

Why? Because it turns out that the Parser was leaning pretty heavily on that “Expect expression” exception.

So we have two options: one option, perhaps the sensible option, is to stop here and call this a proof of concept.

I didn’t choose that option.

Bring on the errors!!

To be candid, the errors that a lot of these failing tests were highlighting turned out to be not all that specific to begin with. For example, this lox code also surfaced “Error: Expect expression”:

It’s telling us that .123 is not an expression. Specifically, you can’t start an expression with a period like this in Lox. But if you don’t know that and you think you’ve written a perfectly valid decimal expression here, this error message is a little…uh…

What if it were this instead?

So let’s add an error production for that. The first step is to identify where this error comes from, and it turns out that this expression gets parsed in primary() because it’s a number. So we add a check for a dot at the beginning of the expression and surface a helpful message about it.

This is another little helper function that I pulled out. Like the other one, I didn’t do this until later on in the effort, and for a while the implementation I’m about to show you was just lying here in primary() in all its verbose glory.

It’s not that verbose; I just like to rib people who rag on Java for its verbosity.

This function, instead of checking for Expr.Nothing, checks whether a token you pass in matches a specific type that you also pass in, and generates an error production at that token with a passed-in message if it does.

I ended up using it in a few other places throughout the Parser to surface more helpful error messages for other parse errors that failed tests once I started messing with the binary operators. For example, these error messages needed a face lift:

You’ll also notice that the updated tests check for additional error messages that the original ones don’t check for. This is because error productions allow parsing to continue past a parse error, such that if the code has additional parse errors, the parser will catch those, too.

Et voilà, helper function to the rescue:

So far, we are missing some low-hanging fruits.

Specifically, we implemented an error production to capture the absence of a left-hand operand for addition and subtraction operations, but not for multiplication, division, or any of the comparison operators like >, <=, or ==, or the logical operators or and and.

But we’ve done the hard part, so now we can step through the parser and add a line to the function that parses each of those types of expressions. I won’t put them all here as that would get repetitive, but here’s the one for inequality comparators:

And here’s one for a logical operator:

Our hard work from earlier is paying off. Have an apple, as a treat!

I added one more fun error production.

Lox has a print keyword like Python 2.X had, and it gets mad at you if you invoke that keyword without giving it something to print:

For the pedantic among you: yes, this is technically a statement and not an expression. Lovingly, shut up. We’re about to change this error message anyway.

Is that better? You prefer that? Good.

Implementing for this test offers us a second opportunity to use our helper function that checks for missing values:

I’ve done my level best to make this code approachable for you to try out some error productions of your own!

Feel free to use my examples to write your own tests, and feel free, of course, to use my helper functions. What common syntactic errors would you like to see Lox surface, and what would you like the error message to include?

Here is the branch to get you started. Here are also some ideas for getting your feet wet, in increasing order of difficulty based on what’s on the branch today:

  1. Add an error production to handle mathematical, comparison, or logical operators appearing without a right-hand operand.
  2. Add an error production to explain to a programmer that their variable name cannot be just numbers, or cannot start with a number.
  3. Add an error production to handle a typo in a keyword like pirnt, retrun, or flase. To do this, you’ll have to add a token type for your typo of choice!

For all of these, I recommend starting by writing a test and running it (make test_jlox, from the craftinginterpreters directory) to help you understand how the compiler handles these situations today.

That’s all I’ve got for you today. Hopefully you enjoyed!

If you liked this post, you might also like…

The rest of the Crafting Interpreters series, and specifically this piece from last year about handling errors with grace

This post about what pride means to me in practice (if you’re prepared to watch me talk about error productions for 2,000 words, you can survive 16 minutes of hearing why any of this matters to me)

This post about why use, or not use, an interface (specifically in Python because of the way Python does, or rather doesn’t exactly do, interfaces)

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.