You are reading the Speedshop newsletter on Ruby on Rails performance, by me, Nate Berkopec.

How does performance (and other software-quality work) change at startups?

Last week, on the Complete Guide to Rails Performance chat, member Haroon asked:

"How do you manage the expectation of a code base/pull request when faced with the scenario of speed vs quality -- especially small startups where you need to move fast."

It's a classic engineering tradeoff. Anyone who tells you there's a simple answer is not thinking very hard.

Startup engineering is a special case of this tradeoff. This is because startups are "default-dead": if they did nothing or only maintained their current state, they would run out of money and die. The startup's trajectory is death.

This means that any engineering work which does not directly and immediately add value to the company potentially endangers it. If your company is currently dying, doing things which don't have a chance of fixing that is a bad use of your time.

In aviation, pilots have a ladder of priorities: first, keep your speed high enough that you don't fall out of the sky. Second, point the airplane at the horizon, not at the ground or another airplane. If you fail at either of these two things, you are "default-dead". So both have to be handled before you can do anything else in the cockpit.

At a startup, your version of getting the wings level and speed up is finding a repeatable and profitable business model. This means shipping features. Until then, the business is stalled and pointing straight at the ground.

Your job becomes shipping as many features as possible until the startup finds something that sticks. In other words, your job is to help the startup get through the Lean Startup discovery loop quickly.

These are the decisions engineering makes when evaluating a speed versus quality tradeoff:

How many tests should we write? Should we write tests?
Should we enforce style rules?
How should we prioritize bug fixing, performance, security and feature work?
How fancy should this feature be? Can I implement it in a simpler way?
How extensible does this code need to be?
If code works and fulfills the product side, but I know it could be better, should we stop and make it better? How much?
Should we educate our developers and build their skills, or only let them work on features and bugs?

You could let everything go that doesn't help you ship as many new features as possible today. Let's call this the "maximum debt" strategy. This may get the startup through the next few months, but by the time the next capital raise comes, the startup may be so mired in technical debt that progress slows to a crawl. High velocity now, low velocity later.

The other extreme is that code should be as high-quality as possible, all the time. This strategy may not even get the company to the next capital raise, because it wasn't able to experiment and try enough new things to find a successful business model. Instead, the startup dies, and its perfect code becomes property of its creditors. Low velocity now, nothing later.

One of these strategies is guaranteed to fail. One might work, for a while. Guess why the one that might work gets picked more often?

Technical debt, like real debt, is a lever. Use it correctly, and you can achieve things that you couldn't without it. Use it incorrectly, and you'll go bankrupt.

One thing that many teams do not do in this early startup stage is measure their technical debt. Perhaps they think that if they don't look, it will go away. But debt doesn't work like that.

Skill level matters here, too. If a team is mostly filled with low-experience developers, they may be making decisions whose future consequences they don't understand. Senior engineers understand when they're incurring debt and when they're not.

Debt is extremely quantifiable. We can measure our performance debt by monitoring average and 95th percentile response times. Are they better or worse than last month? Feature velocity is easy enough to measure if you spend a day or two working out a process for it. CodeClimate is a thing now. Quantifying bug reports and time spent fixing bugs isn't rocket science either.
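
As a rough sketch of that month-over-month check, assuming you can export a list of response times in milliseconds from your logs or monitoring tool: the method names below (mean, percentile, summarize, month_over_month) are made up for illustration and are not from any particular library, and the percentile uses the simple nearest-rank method rather than whatever interpolation your monitoring vendor applies.

```ruby
# Average of a list of response times (milliseconds).
def mean(times_ms)
  raise ArgumentError, "no samples" if times_ms.empty?
  times_ms.sum.to_f / times_ms.size
end

# Nearest-rank percentile: the smallest sample such that at least
# pct percent of all samples are less than or equal to it.
# pct is an integer from 1 to 100.
def percentile(times_ms, pct)
  raise ArgumentError, "no samples" if times_ms.empty?
  sorted = times_ms.sort
  # Integer ceiling of (pct * n / 100), avoiding float rounding.
  rank = (pct * sorted.size + 99) / 100
  sorted[rank - 1]
end

# The two numbers worth tracking for performance debt.
def summarize(times_ms)
  { mean: mean(times_ms), p95: percentile(times_ms, 95) }
end

# Change in each metric from the previous month to the current one.
# Positive numbers mean responses got slower.
def month_over_month(previous_ms, current_ms)
  before = summarize(previous_ms)
  after = summarize(current_ms)
  before.keys.map { |metric| [metric, after[metric] - before[metric]] }.to_h
end
```

Calling month_over_month(last_month_times, this_month_times) returns a hash with a :mean and a :p95 entry. If either number is positive, you took on performance debt this month, and at least now you know how much.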

Let's take out technical debt where appropriate - but know how much we're taking out. Are things getting better or worse? Is feature velocity suffering as a result?

However, most legacy applications are dumpster fires. I'm a consultant. I've seen a few. Does that mean that most people are making the wrong decision on this tradeoff, and are sacrificing too much quality?

Imagine you're an airplane designer in World War II. You see a lot of planes come back from the front lines with a lot of bullet holes in them. You think you should add some armor to the plane to stop the planes from getting shot down.

Do you add the armor where you see the bullet holes? Makes sense, right? Add more armor where they get shot the most often.

But you're not seeing the planes that didn't make it back.

Bugs and slow response times typically don't kill startups. Startups die by not making something people want. Startups that prioritize feature velocity over quality tend to survive, so the dumpster-fire legacy applications I see are the planes that made it back.

This is deeply offensive to many software developers.

Many software developers have internalized a self-image of a "software craftsman". Like a seasoned artisan, these developers believe their job is to create perfect artifacts, lacking in any flaws. They derive their self-value from their perception of the quality of their work. At a startup, though, this is not what they're getting paid for.

If you're a startup engineer and find the "software craftsman" identity appealing, I encourage you to get away from your desk a bit more. Talk to the salespeople and listen to what their problems are. Ask the CEO what they think the most important things are for the business for the next six months. Do these things align with software perfection? Or even software of moderately high quality?

This all changes drastically when you're no longer default-dead. If the business has a stable, profitable business model, you can and should plan for the future. Features are going to stick around for at least a few years, probably more like 5 to 10. The difference between "default-dead" and "default-alive" is huge.

At big companies, the tradeoff starts to swing towards quality. Really big companies are actually more concerned about maintaining the revenue streams they already have and protecting what already exists. In that kind of engineering situation, software quality becomes much more important.

Know what the ultimate engineering objective is at your organization (survival, steady growth, or maintenance) and you'll know how to make the hundreds of daily quality/speed tradeoffs you're asked to make.

Until next week,
-Nate

Copyright © 2020 Nate Berkopec, All rights reserved.
