You're reading the Rails Performance Newsletter by Nate Berkopec of Speedshop.

A while back I hosted my first ever open-source office hours. They went really well!

What does 10% faster really mean?

We need to talk about contextualizing benchmarks.

Benchmark results are often presented publicly in the format of "Y piece of software is now X% faster". This is true, in a sense, but benchmarks, like statistics, require context; otherwise, you can make them say whatever you want. There are lies, damned lies, and benchmarks.

I don't think this is the fault of the benchmark author. It usually makes sense to them, because they wrote the software and the benchmark, they have the context. They understand what being Y% faster on a particular benchmark will actually mean for their users. But most users don't have that same context, and instead use headlines like these to make logical leaps. "Y piece of software is X% faster" turns into "if you upgrade to this version or switch to this software, your _application_ will become X% faster".

So, what I really want to talk about is benchmarks for frameworks, application servers, and other libraries whose job it is to _help you_ run your code. Not benchmarks for software that does its work entirely internally (for example, a JSON parser).

I've talked before about how expressing benchmarks as relative percentages ("10% faster/15% slower") obscures a crucial fact: how much time is 10%? How much faster have you made a single iteration of the benchmark, in terms of milliseconds or seconds?

The reason we use relative percentages when bragging about our benchmark results is that if we talked in absolute units of time, everyone would realize we were usually microbenchmarking. "10% faster" sounds much more impressive than "10 nanoseconds faster per request/job/unit-of-work".
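To see the absolute number for yourself, you can time a benchmark loop and divide by the iteration count. A minimal sketch (the workload and iteration count here are invented for illustration):

```ruby
require "benchmark"

# Hypothetical stand-in for the code under test.
def unit_of_work
  "hello world".upcase
end

ITERATIONS = 100_000

# Benchmark.realtime returns elapsed wall-clock seconds.
total_seconds = Benchmark.realtime do
  ITERATIONS.times { unit_of_work }
end

# Report the absolute cost per iteration, not a relative percentage.
per_iteration_ms = (total_seconds / ITERATIONS) * 1000.0
puts format("%.6f ms per iteration", per_iteration_ms)
```

Seeing the per-iteration cost in milliseconds (or microseconds) makes it obvious whether a "10% improvement" is a meaningful amount of time for your workload.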

And, of course, the other question is "10% faster _doing what_"? Most benchmark results like this that get talked about online are not benchmarking real-world scenarios - they're benchmarking "hello world" workloads or, worse, workloads that are specifically crafted to make themselves look good and make their competitors look worse. "Hello world" benchmarks for frameworks (like Rails, for example) obscure the fact that most of the time in a real-world application is spent in the application code, not in the framework itself.

Let's take Mike Perham's recent announcement for Sidekiq 6.0.1. I'm picking on Mike because I like Mike and I know he won't take my little rant the wrong way, and he didn't do anything wrong, but I do think the announcement was misinterpreted a bit.

The release notes for Sidekiq 6.0.1 said it "should be 10-15% faster now". This was posted to Reddit and got a lot of upvotes.

Does this release note mean that when you switch to Sidekiq 6.0.1, your jobs will be 10-15% faster? Does it mean that when you upgrade to 6.0.1, your jobs' reported latency in Scout or New Relic or whatever will go down 10-15%?

It emphatically _does not_ mean that. In fact, I would guess that this change will make an impact of 1% or less on most applications. How do I know this?

Mike's Sidekiq benchmark enqueues 100,000 jobs and tries to process them as fast as possible. The jobs it enqueues are basically no-ops (they log the time). Does this look like a typical Sidekiq job to you?

Most Sidekiq jobs spend 200-300 milliseconds doing stuff in ActiveRecord or calling out to external services, like an email service. That won't get any faster when you switch to Sidekiq 6.0.1 - it will take exactly the same amount of time that it did before.
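The contrast is easy to see side by side. This sketch uses invented class names and simulates the I/O of a realistic job with `sleep` (the real thing would be database queries or API calls), rather than requiring Sidekiq itself:

```ruby
require "benchmark"

# Roughly the shape of a no-op benchmark job: it just records the time.
class BenchmarkStyleJob
  def perform
    Time.now
  end
end

# A more realistic job: most of its time is spent waiting on I/O.
class RealisticJob
  def perform
    sleep 0.25 # stand-in for ~250 ms of ActiveRecord work or API calls
  end
end

noop_ms = Benchmark.realtime { BenchmarkStyleJob.new.perform } * 1000.0
real_ms = Benchmark.realtime { RealisticJob.new.perform } * 1000.0
puts format("no-op job: %.3f ms, realistic job: %.1f ms", noop_ms, real_ms)
```

Shaving framework overhead changes the first number; your 250 milliseconds of actual work is untouched.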

This benchmark measures the amount of framework overhead that Sidekiq adds to every single Sidekiq job you process. From Mike's perspective, this actually makes a lot of sense to measure, and it's good that he measures it! It would be pretty bad if Sidekiq added 10 milliseconds or more of overhead per job, for example. Actually, that's about how much latency it added in Sidekiq 4 - so it's a massive achievement that it's less now, and this benchmark obviously helped Mike to figure that out!

But how much overhead does it add today, in terms of absolute time? We can figure it out from the result posted in Sidekiq's README for 6.0, which Mike helpfully points out for us.

Sidekiq 6.0 adds 3 milliseconds of latency per job. So 10% faster than that (what Sidekiq 6.0.1 achieved) would be about 2.7 milliseconds - a savings of roughly 0.3 milliseconds per job.
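Here's the back-of-envelope math, using an assumed 250 milliseconds of real work per job (a value I'm picking from the typical range above, not from Mike's benchmark):

```ruby
# Assumed values: ~250 ms of real work per job, ~3 ms of Sidekiq 6.0
# framework overhead, and a 10% reduction in that overhead in 6.0.1.
job_work_ms        = 250.0
overhead_before_ms = 3.0
overhead_after_ms  = overhead_before_ms * 0.90 # 2.7 ms

total_before_ms = job_work_ms + overhead_before_ms # 253.0 ms
total_after_ms  = job_work_ms + overhead_after_ms  # 252.7 ms

# The whole-job speedup is the 0.3 ms saved, relative to the full job.
speedup_pct = (total_before_ms - total_after_ms) / total_before_ms * 100.0
puts format("Whole-job speedup: %.3f%%", speedup_pct) # → 0.119%
```

A "10-15% faster" release translates to roughly a tenth of a percent on a typical real-world job, which is why I'd guess the impact on most applications is 1% or less.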

See how much less interesting of a headline that is? "Sidekiq 6.0.1 is 300 microseconds faster per job". But, in my opinion, it's a more informative one, and less likely to cause the reader to draw an incorrect conclusion ("my application will be 10% faster after this upgrade!").

Relative benchmarking still makes a lot of sense from the perspective of a library maintainer. We'll be adding some benchmarks to Puma soon, and I'll be paying attention to the relative numbers as well as the absolute latency we add per-request.

But, next time you see a benchmark of a framework or server in relative terms, I want you to think: "Was this a hello-world benchmark, and what is the absolute difference in latency that this benchmark reflects?"

-Nate
 
You can share this email with this permalink: https://mailchi.mp/railsspeed/benchmarks-relative-changes-and-games?e=[UNIQID]

Copyright © 2019 Nate Berkopec, All rights reserved.

