Use WARMED to evaluate software engineering practices


In a lecture titled “Where Does Bad Code Come From?”, Casey Muratori introduced the acronym WARMED for categorizing the costs of code:

Write
Agree
Read
Modify
Execute
Debug

These categories broadly group the activities in the lifetime of code. Agree represents the time that your team spends reviewing code and debating coding practices. (Casey joked it might more accurately be labelled Argue.)

You can evaluate any software engineering practice in terms of its costs and benefits in these categories. This clarifies tradeoffs and guides you towards techniques for building better programs.

WARMED isn’t ordered by importance; it’s arranged only to spell a memorable word. Which costs matter most to minimize depends on the constraints of your project. But there are some generalities.

I agree with Casey that Execute is usually the most important. Users are concerned with execution time (and correctness), and they tend to vastly outnumber software engineers.

Also, code is Read, Debugged, and Modified much more than it is Written. The cost of writing code is normally the least important to optimize.

You can use WARMED to guide larger decisions, like “should we use optional type checking?”, as well as smaller ones, like “is this library worth it?”. For the rest of this post, let’s evaluate a couple of common practices.

Linting

Linters catch mistakes early, saving time on code review and debugging, at the cost of some extra writing. But you do need to pick lint rules that find real issues, rather than ones that highlight trivial or automatically-fixable problems.
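As a sketch of what a “real issue” looks like, here’s a classic bug that rules like flake8-bugbear’s B006 catch: a mutable default argument, which Python creates once and shares between all calls:

```python
# Bug: the default list is created once, at definition time,
# and shared between every call that omits `items`.
def append_flagged(item, items=[]):
    items.append(item)
    return items


# The fix that the lint rule pushes you towards:
def append_fixed(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items


print(append_flagged("a"))  # ['a']
print(append_flagged("b"))  # ['a', 'b'] - state leaked between calls!
print(append_fixed("a"))    # ['a']
print(append_fixed("b"))    # ['b']
```

A reviewer might miss this on a busy day; a linter never does.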

Breaking this down:

Write
Linters impose a small-to-medium cost when writing code. If you’re new to a given language or project, you can spend a lot of time understanding and fixing lint errors. But this cost decreases as you become familiar with the “right ways” of doing things, tending towards zero.
Agree

Linters should reduce the cost of reviewing code. When an automated check highlights small issues, reviewers don’t need to, and can instead focus on higher-level concerns.

Lint rules also act as a form of knowledge sharing. Rather than having to tell other developers to “always do X this way”, you can encode it as a lint rule.

Read
Linted code is easier to read, because it’s more standardized. Also, you aren’t so likely to be distracted by small potential issues.
Modify
Linted code is generally no more costly to modify. Adding new lint rules does cost time, though, since existing code must be updated to comply.
Execute
Most lint rules don’t affect execution time. But some do guard against common performance anti-patterns, which can make your code faster. Such increases tend to be small, as linters can only spot localized micro-optimizations—they can’t fix a bad design.
Debug
Linting removes common issues from code, reducing the amount of time you need to spend debugging. Some lint rules also aim to make code easier to debug.
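To make the knowledge-sharing point from “Agree” concrete: a team convention like “never use a bare except clause” can be encoded as a small AST-based check. This is a minimal sketch, not a real plugin; in practice you’d write it as a flake8 or Ruff rule:

```python
import ast


def find_bare_excepts(source):
    """Return line numbers of `except:` clauses with no exception type."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        # An ExceptHandler with type None is a bare `except:`.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


code = """
try:
    risky()
except:
    pass
"""
print(find_bare_excepts(code))  # [4]
```

Once the rule runs in CI, nobody has to repeat the convention in review comments.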

For most projects, using a (high-quality) linter is a no-brainer.

Garbage collection

Managing memory can be tiresome and hard to get right, so many languages do it for you, like Python, JavaScript, and Golang. These languages use garbage collection (GC) to find and free no-longer-used memory.
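The classic case a collector handles that plain reference counting can’t is a reference cycle. In Python, which combines reference counting with a cycle-detecting collector, you can watch this happen via the gc module:

```python
import gc


class Node:
    def __init__(self):
        self.ref = None


# Build a two-node cycle, then drop our references to it.
a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b

# The cycle is now unreachable, but reference counts alone can't
# free it - each node still holds a reference to the other.
collected = gc.collect()  # force a full collection
print(collected)  # > 0: the collector found and freed the cycle
```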

GC makes writing, reading, and modifying the code cheaper. But this comes at the cost of increased execution time.

Breaking it down:

Write

Code using GC is certainly easier to write, since there is less to think about, and less code to write.

Indeed, most programmers today learn using a GC-based language, and may not ever learn direct memory management. Of the most commonly used languages in the Stack Overflow 2021 survey, it’s only #10, C++, that doesn’t have built-in GC.

Agree
GC should reduce code review time, since there is less memory management to consider.
Read
Since GC leads to shorter programs, they should be easier to read.
Modify
As per “Write”, GC should reduce modification costs. GC guards against introducing certain classes of memory management bugs, like use-after-free, which often occur when modifying code that manages its own memory.
Execute

GC adds a significant runtime cost, although exactly how much seems context-dependent. A recent paper studying GC found it increased runtime 7-82%. Instagram saw a 10% speed boost by simply disabling Python’s garbage collection.

GC also requires more memory, both for its own bookkeeping data, and for “garbage” data being held longer than necessary.

Depending on the garbage collector, it may impose “stop the world” stalls at runtime. This is where the collector pauses the whole program for a significant amount of time, seconds or tens of seconds, whilst it finds garbage. Such stalls are completely unacceptable in certain applications—imagine if an aeroplane became unresponsive whilst the control software ran GC.

Debug
Garbage collection should reduce the amount of debugging, since it prevents several kinds of memory mismanagement. But it can also hamper attempts to dig into memory and performance issues, since some details are hidden. And since collection can trigger unpredictably, it might lead to random-looking heisenbugs.
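A minimal way to observe these runtime effects yourself in Python: gc.callbacks lets you watch each collection happen, and gc.disable() is the blunt switch behind Instagram-style tuning. A rough sketch:

```python
import gc

events = []


def on_gc(phase, info):
    # Called at the "start" and "stop" of each collection,
    # with the generation being collected.
    events.append((phase, info["generation"]))


gc.callbacks.append(on_gc)
gc.collect()  # force a full (generation 2) collection
gc.callbacks.remove(on_gc)

print(events)  # includes ('start', 2) and ('stop', 2) from our collection

# The blunt tool: turn the cycle collector off entirely.
# Reference counting still frees acyclic objects immediately;
# only cyclic garbage accumulates.
gc.disable()
assert not gc.isenabled()
gc.enable()
```

Timing the gap between the “start” and “stop” callbacks is one way to measure your program’s actual GC pauses.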

So, GC is a double-edged sword. It boosts development speed whilst imposing notable runtime costs. But if paying those costs is what allows the code to exist in the first place, it may be worth it.

Fin

May this simple framework guide you towards better practices,

—Adam

