You're reading the Speedshop newsletter on Ruby and Rails performance by me, Nate Berkopec.

Last week, I walked through Josh Pigford's new Rails app and did a live perf teardown. We talked mostly about caching, manipulating large collections in memory, and Sidekiq concurrency settings. Check out the recording here.

I also recorded a video on how to help maintain any open-source project, even if you have no experience or maintainer permissions.

Is caching a necessary step when scaling a web application?

I've talked before about the two reasons to work on performance - to improve the customer experience, and to scale more easily and for less money.

Where does caching fit into those two reasons? What problem is caching best at solving? When I'm talking about caching here, I mean caching with an external server, like Redis or Memcached, not memoization with instance variables (@my_var ||=).
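To make that distinction concrete: memoization just stores a result on the object itself, in-process, with no network involved. A quick illustrative sketch (the class and method names are made up):

```ruby
class Report
  # Memoization: cache the result in an instance variable for the
  # lifetime of this one object. No external server, no network
  # round-trip -- which is why it's out of scope for this discussion.
  def total
    @total ||= expensive_calculation
  end

  private

  def expensive_calculation
    sleep 0.01 # stand-in for real work
    42
  end
end

report = Report.new
report.total # computed once...
report.total # ...then served from @total on every later call
```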

To answer the headline question: no, I don't think it is. Caching is far better at making particular pages faster than it is at solving your scaling problems.

I thought about this recently when I read this article about scaling, which trended very highly on Hacker News. I had a few problems with the approach outlined there, but my largest was that the author portrays caching as a necessary step that makes scaling a web service easier.

Caching can make a transaction (that is, a complete web response or a background job execution) faster by pre-computing the result and serving that result, rather than computing the same result many times. It does this by adding a new component to your infrastructure (this is important!).
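Mechanically, every fetch-style cache (including Rails' `Rails.cache.fetch`) works the same way: check the store, and on a miss, compute the result and write it back. A toy in-memory sketch (the real thing does each check and write over the network against Redis or Memcached):

```ruby
# A toy fetch-style cache: a plain Hash standing in for Redis/Memcached.
# Rails.cache.fetch behaves the same way, except every check and write
# here is a network round-trip to an external server.
STORE = {}

def fetch(key)
  return STORE[key] if STORE.key?(key) # hit: serve the precomputed result
  STORE[key] = yield                   # miss: compute once, store, return
end

expensive_calls = 0
result = fetch("user/42/orders") { expensive_calls += 1; "rendered orders" }
result = fetch("user/42/orders") { expensive_calls += 1; "rendered orders" }
# The block ran only once; the second call was a cache hit.
```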

From the article:

"The cache comes in handy when the service is making lots of repeated calls to the database for the same information. Essentially we hit the database once, save the information in the cache, and never have to touch the database again."

"Making lots of repeated calls to the database for the same information" - wait just a god damned minute. Isn't that what the database is for? Getting information from it, repeatedly?

Caching is not magical speed or scale juice, nor is it always appropriate.

Adding new components to our infrastructure always imposes a cost on our ability to scale. It literally makes our application more expensive, and it also creates a new service that we must maintain (change plan sizes, connection pools, etc). It adds another spot where things can go wrong. It will always be simpler (and more scalable) to not cache.

Unnecessary or poorly implemented cache schemes impose huge scaling costs on teams. They get buried in running out of connections, multi-threading issues where cache connections are managed poorly, and scaling the cache datastore itself. And in the meantime, their caching strategy may not even be necessary or very effective.

So, we use caches to make particular parts of our applications faster, not as a blanket strategy to improve our requests-per-second capacity.

How can we cache in a smarter way?

First, don't. The most common poor cache usage I see is teams caching the results of a complex or slow SQL query/ActiveRecord call. Their APM points them to the slow call, and someone says "cache it!", because no one on the team knows how to optimize SQL. But if you can fix the SQL, you should do that: it will always be faster in the long run to fix the underlying problem rather than to leave it broken and cache the result. And no one in this scenario asks: "what will the hitrate of this cache key be?"

An external cache's hitrate is its most important performance metric. If you have a hitrate of 90%, 90% of cache accesses result in a "hot" cache and the pre-calculated result is served. 10% of the cache accesses are "cold", which means there was no pre-calculated result, and you have to calculate it and store it.
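Measuring hitrate is just hits divided by total lookups. Redis, for example, exposes cumulative keyspace_hits and keyspace_misses counters in its INFO stats output that you can plug in. A small helper (the counter values below are made up for illustration):

```ruby
# Compute a cache hitrate from hit/miss counters.
# With Redis, these counters come from INFO stats:
# keyspace_hits and keyspace_misses.
def hitrate(hits, misses)
  total = hits + misses
  return 0.0 if total.zero?
  hits.to_f / total
end

hitrate(900, 100) # => 0.9 -- 90% of lookups were served from cache
```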

So, the average amount of time it takes to calculate a cached fragment will be:

(cache hit rate * network round-trip time to cache)
+ (cache miss rate * (2 * network round-trip time to cache + average time to render this section))

Note how in the miss case we impose two round-trips to the cache - one to check for the fragment and miss, and then one to store the newly calculated result.
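Turning that equation into code makes it easy to sanity-check a cache before you add it. A back-of-the-envelope sketch, assuming the miss path always pays exactly two round-trips (check, then write); the numbers in the calls below are illustrative:

```ruby
# Expected time (in ms) for a cached fragment, per the equation above:
#   hit:  one round-trip to the cache server
#   miss: two round-trips (check + write) plus the render itself
def expected_ms(hit_rate:, rtt_ms:, render_ms:)
  hit_rate * rtt_ms + (1 - hit_rate) * (2 * rtt_ms + render_ms)
end

expected_ms(hit_rate: 0.5, rtt_ms: 10, render_ms: 500)  # => 265.0
expected_ms(hit_rate: 0.5, rtt_ms: 10, render_ms: 20)   # => 25.0, worse than no cache
expected_ms(hit_rate: 0.95, rtt_ms: 10, render_ms: 500) # roughly 35ms -- a much better deal
```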

Using this equation, we can come up with many scenarios in which caching actually makes a piece of code or fragment slower than it was without caching.

Consider a slow ActiveRecord query that takes 500ms. You wrap a cache block around it. The hitrate in production is 50%, and the round-trip time is 10 milliseconds. These are extremely common numbers, unfortunately, as people often do not consider how often a piece of code will actually be able to re-use the cached result.

Without the cache, the time to render this section will be 500ms. With a cache involved, it will be:

0.5 * 10 + 0.5 * (20 + 500) = 265ms.

Still a win for this 500ms query, but only about half the benefit people imagine when they "just cache the slow query". Now run the same numbers on a fragment that takes only 20ms to render: 0.5 * 10 + 0.5 * (20 + 20) = 25ms. You actually made it slower by caching it!

Work through the equation and you'll find that, at a 50% hitrate and a 10ms round-trip, the break-even render time is about 30ms: anything faster than that gets slower when you cache it, and lower hitrates push that break-even point even higher. Code fragments whose execution speed approaches the cache round-trip time are generally a net loss to cache, especially at low hitrates.

In general, overall cache hitrates of less than 90% make me nervous, because I can be sure there are a few keys whose hitrates are very low and are dragging down the average. Most recommendations say aiming for 95% or more is ideal. Certainly, higher hitrates make the operations cost of adding a new cache much more worth it.

Finally, let's talk about "the database is slow but cache is fast".

First of all, many organizations are running a significant portion of their database in memory these days. It's not like we're getting data off spinning rust anymore, either. So, both the database and the cache are reading data from memory most of the time - so what's the difference?

The difference between our SQL databases and our key-value external caches is the access pattern. SQL databases use, well, SQL, and caches are essentially giant hashes. One access pattern (SQL) is extremely complex, and one is not.

People start caching SQL results because they don't know how to optimize SQL.

SQL operations can often be just as fast as cache operations if you just learn enough about SQL to optimize it. This isn't something I cover in my materials (I'm not a DBA), but the resources are out there. Don't use caches because you don't know how to optimize a particular SQL query.
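For example, a query that filters and sorts a large table is often slow simply because it has no supporting index. A hypothetical sketch (the `orders` table, its columns, and the migration name are all made up for illustration); adding a composite index is frequently all the "caching" you need:

```ruby
# Hypothetical Rails migration -- a sketch, assuming a large `orders`
# table that is frequently filtered by user_id and sorted by created_at,
# e.g. Order.where(user_id: id).order(created_at: :desc)
class AddUserCreatedAtIndexToOrders < ActiveRecord::Migration[6.0]
  def change
    # A composite index lets the database satisfy both the WHERE and the
    # ORDER BY without scanning the whole table.
    add_index :orders, [:user_id, :created_at]
  end
end
```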

If you can get speed without adding something to your infrastructure, you should do that instead.

Anyway, for more about caching in Rails, you can see my old blog article about it.

Until next time,

-Nate
 

Copyright © 2020 Nate Berkopec, All rights reserved.

