Summing Integer Ranges With dc

January 10, 2018

A while back I learned a curious mathematical fact: 666 is the sum of the first 36 natural numbers. Formally,

$$\sum_{i=1}^{37} i = 666$$

I'm a dutiful servant of the Dark Lord, so I made my Twitter bio the Python equivalent:

sum(range(37))

This code is straightforward and easily understood. The range(37) expression creates a generator yielding the numbers from 0 to 36, and sum() of course sums them. But, it's a tad banal for my taste.

Somehow I got the idea to reproduce the series as a program written for [dc](https://en.wikipedia.org/wiki/Dc_(computer_program)). If you're fortunate enough not to be familiar with dc, it's a "desk calculator" program that's old enough to predate Unix and C. It's also programmable, by way of an esoteric macro language.

Here's the simplest dc program I could come up with to compute the series:

# Sum the numbers 1 to 36 (inclusive).
dc -e '36[d1-d1<F+]dsFxp'

How It Works

The dc -e invocation just means that the code to evaluate is passed as the next argument (instead of reading input interactively from stdin). So the only part we need to worry about is the actual dc code itself, which is:

36[d1-d1<F+]dsFxp

Since dc is a reverse-polish calculator, the program works with a stack of values, and operators that manipulate the stack. As you might guess, the first expression (literally 36) pushes 36 onto the stack. Also as you might guess, this is pushed onto the stack because we intend to count down from that value when accumulating.

The next part of the expression is the bracketed term [d1-d1<F+]. Anything between square brackets is considered to be a string literal. As is often the case in dc programs, the string literal is itself a dc expression that will be evaluated later as a macro; we'll revisit this in a moment. For now, just know that a bracketed expression pushes that string value onto the stack. After parsing this term, the stack has two values: the integer 36 followed by the string d1-d1<F+.

Next is the character d. This simply duplicates the value on the top of the stack. After encountering this term, the stack contains 36 and then the string d1-d1<F+ twice.

dc has 256 registers, each named by a single character. The s command pops the top of the stack into a register named by the next character, which is F in this case. So after evaluating this term, our string d1-d1<F+ will be in the F register (and also still on top of the stack). Note that by convention I've chosen to use a register whose name doesn't collide with a dc command, but that's not a requirement. Mercifully, I didn't use d register, which would have yielded the equivalent (but even more obfuscated) program 36[d1-d1<d+]dsdxp.

The x term is next, and it evaluates the macro that is currently on the top of the stack. More on this in a moment, since the final term is simple enough: p just prints the top value on the stack. Which brings us back to x: what happens when it's evaluated?

The Macro

The macro, which as you should recall has also been stored into the F register, is the heart of the program. It looks arcane, but it's pretty simple.

We've already encountered d before: it duplicates the value on top of the stack. The next expression is the literal 1, which simply pushes the value 1 on top of the stack. Therefore the first time the macro is evaluated after reaching the first d1 the stack will contain: 36, 36, 1.

The - term pops the top two values from the stack, calculates the subtraction, and pushes that value onto the stack. In this case that will be 36 - 1, so after evaluating - the stack will contain: 36, 35.

Now we get another d1 term, which once again duplicates the top value and then pushes 1. The stack will now contain: 36, 35, 35, 1.

The term <F pops the two values at the top of the stack and then evaluates the macro if the less-than condition is satisfied. In this case 1 is less than 35, so F will be invoked recursively. When F is invoked this time, the stack contains: 36, 35.

The final term is + which adds two top two stack values. But before the addition can happen, F has to be evaluated again.

The first time through the loop we started with 36 on the stack and ended with 36, 35. As you can imagine, the next time through the stack ends up as 36, 35, 34. This continues until the base condition is hit, at which case the stack will contain the values from the full range, with 36 at the bottom of the stack and 1 at the top.

After the base condition is hit the stack unwinds, with + being evaluated for each pair of terms. This is a "right" fold through the stack, somewhat like foldr in Haskell or reduce() in Python 2 (which is actually a "left" fold).

Observations

After stumbling across arcane code, one hopes that there's some purpose behind it. Sometimes code is obtuse because it has inherent complexity. Other times there's a performance argument for writing code in a strange way (or using a strange language).

None of those apply here. dc is just senseless violence. In fact, the dc program here is less efficient than a straightforward loop written in basically any other programming language. Consider the equivalent C program written using a for loop:

int sum = 0;
for (int i = 1; i < 37; i++)
    sum += i;

An expression like this has $O(n)$ running time, but only $O(1)$ memory/storage. Likewise, in Python 3 the expression sum(range(37)) evaluates the sum of a generator expression, which also takes a fixed amount of storage. But that's not the case with our dc program, which has to build up the stack before evaluating it. The dc program is strictly worse, as it takes $O(n)$ running time and $O(n)$ memory.

Exercise for the reader: Rewrite the program to use a fixed amount of stack space. There are a couple of ways to do this, but unfortunately at the cost of some human complexity.