Cache strategies in Redis

Redis is an in-memory NoSQL data store. Memory is faster than disk, but there are still ways to improve performance. In this article, Camilo Reyes compares caching strategies.

Redis is a cache database that stores documents in memory. The data store has a key-value pair lookup with O(1) time complexity. This makes the cache fast and convenient because it does not have to deal with complex execution plans to get data. The cache service can be trusted to find a cache entry with a value in almost no time.

When datasets in cache begin to grow, it can be surprising to realize that the latency is often not a Redis issue. In this take, I will show you several strategies for caching Redis data structures, then show what you can do to pick the best approach.

The sample code can be found on GitHub, and it is written in C# via .NET 5. If you already have LINQPad up and running, feel free to copy-paste code and follow along.

There are a few dependencies. You will need a Redis Windows service when running this on Windows 10. I recommend grabbing the binaries from GitHub and setting up the server on your local machine. You can quickly have a server running by executing redis-server.exe. I will be hammering the service with lots of cache data, so I recommend setting maxmemory to something high, like 50MB. A full Redis installation on Windows should take no more than 6MB because it is lightweight. For the C# code, you will need two NuGet packages, protobuf-net and StackExchange.Redis, along with using directives for ProtoBuf and StackExchange.Redis.
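For reference, the setup boils down to a few commands; the maxmemory override shown here is one way to apply that setting, assuming the Windows port accepts config options on the command line the way stock Redis does:

```
redis-server.exe --maxmemory 50mb

dotnet add package protobuf-net
dotnet add package StackExchange.Redis
```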

These are the caching strategies:

  • Binary
  • XML
  • JSON
  • ProtoBuf

.NET serializers allow you to work with document data cached in Redis. On the server-side, all Redis does is store and retrieve arbitrary blob data. The key-value pair can be as big as it needs to be, assuming it can fit in-memory. It is up to the C# code to decide what to do with this data stream and determine which serialization strategy to use. Any performance issues you may stumble upon have to do with how long this serialization process takes. It is the client code, not the cache server, that does the actual work.

Connecting to Redis

First, put this skeleton app in place. It will need to instantiate a Redis connection, grab a Stopwatch, and declare DTOs.
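A minimal sketch of that skeleton, assuming a Redis server on localhost; the DTO shapes and the PropertyA property are illustrative stand-ins for what is in the GitHub sample:

```csharp
using System;
using System.Diagnostics;
using System.Xml.Serialization;
using ProtoBuf;
using StackExchange.Redis;

const int N = 50000;

// One multiplexer for the whole app; assumes Redis on localhost:6379
var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();
var stopwatch = new Stopwatch();

// Binary needs the Serializable attribute
[Serializable]
public record BinaryDto(int Id, string PropertyA);

// XML needs a default public constructor and property attributes;
// Id lands on the parent node as an attribute, the rest become children
public record XmlDto
{
    [XmlAttribute] public int Id { get; set; }
    public string PropertyA { get; set; }
}

// JSON needs no ceremonial code
public record JsonDto(int Id, string PropertyA);

// ProtoBuf needs unique positional ordinals; SkipConstructor makes records work
[ProtoContract(SkipConstructor = true)]
public record ProtoDto(
    [property: ProtoMember(1)] int Id,
    [property: ProtoMember(2)] string PropertyA);
```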

The key code to look at here is the DTO records. I have declared a separate DTO per strategy. To serialize data in binary, it needs the Serializable attribute. For XML, it needs a default public constructor and property attributes. This is how the XML document takes shape; for example, the id property will go on the parent node as an attribute, and the rest of the XML properties will be declared as children. The JSON data object does not have any ceremonial code. ProtoBuf needs positional ordinals, which are set via an integer. It does not matter what the order is as long as the proto members are unique. Setting the proto contract to skip the constructor makes this serializer work with records.

There are no real performance benefits to using records here other than to keep the C# code nice and terse. These DTOs will be instantiated as lists and will live in the heap anyway.

I declared a constant N to set the size of the dataset going into Redis. This is set to fifty thousand records. Depending on the size of your specific cache, I recommend changing this value to fit your needs. ProtoBuf, for example, is not the best strategy for small payloads; there is no one-size-fits-all solution.

One caveat for XML: the shape of element attributes will dictate the size of the payload. XML tends to be verbose, and setting more properties as attributes on the parent node will reduce its overall size. I opted to include both techniques mostly to show there are ways to change the payload without changing strategies altogether. Feel free to play around with the XML serializer to double-check how this impacts performance.

Binary serializer

To serialize data in binary, instantiate a list of records and set/get the cache entry in Redis.
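A hedged sketch of the binary round trip, reusing db and N from the skeleton (requires System.IO, System.Collections.Generic, and System.Runtime.Serialization.Formatters.Binary):

```csharp
var binaryList = new List<BinaryDto>();
for (var i = 0; i < N; i++)
    binaryList.Add(new BinaryDto(i, "PropertyA" + i));

#pragma warning disable SYSLIB0011 // BinaryFormatter is obsolete in .NET 5
var formatter = new BinaryFormatter();
using (var stream = new MemoryStream())
{
    formatter.Serialize(stream, binaryList);

    // StringSet stores the raw bytes under the key; the TimeSpan expires the entry
    db.StringSet("binary-cache-key", stream.ToArray(), TimeSpan.FromMinutes(10));
}

// Get the blob back via the same cache key and deserialize it
using (var stream = new MemoryStream(db.StringGet("binary-cache-key")))
{
    var cached = (List<BinaryDto>)formatter.Deserialize(stream);
}
#pragma warning restore SYSLIB0011
```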

StringSet sets data in Redis, and StringGet gets the data. These names are a bit of a misnomer because the cache entry stored in Redis isn't in string format but binary. It is not the same binary representation used by the serializer but one internal to Redis when it gets or sets its data. When setting a cache entry, be sure to specify a key. Retrieving the cache entry is as easy as getting the data stream via the same cache key. The TimeSpan argument allows the cache entry to expire after a certain time limit.

With your Redis server running, first run the project. Then, run the redis-cli.exe executable in a command line and type in GET "binary-cache-key".

This is what you might see:

[Screenshot: redis-cli showing the blob stored under binary-cache-key]

The CLI tool shows a string representation of the underlying blob data. An important takeaway is that I can see bits and pieces of the underlying record, like "PropertyA49999". It took 3.74 seconds to pull back this dataset, which gives you a good indication of the performance. The C# client is much faster than this, but the sheer size of the payload can impact overall performance.

If you are on .NET 5, you may see a helpful build warning when working with the binary serializer. This warning points out that the BinaryFormatter is obsolete. For security reasons, it is recommended to move away from this serializer regardless of performance. I put it here mainly to show how it stacks up against the alternatives.

XML serializer

Time to put the XML serializer to the test.
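A hedged sketch along the same lines (requires System.Xml.Serialization):

```csharp
var xmlList = new List<XmlDto>();
for (var i = 0; i < N; i++)
    xmlList.Add(new XmlDto { Id = i, PropertyA = "PropertyA" + i });

// GetType supplies the concrete List<XmlDto> type the serializer needs
var serializer = new XmlSerializer(xmlList.GetType());
using (var stream = new MemoryStream())
{
    serializer.Serialize(stream, xmlList);
    db.StringSet("xml-cache-key", stream.ToArray(), TimeSpan.FromMinutes(10));
}

using (var stream = new MemoryStream(db.StringGet("xml-cache-key")))
{
    var cached = (List<XmlDto>)serializer.Deserialize(stream);
}
```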

Note that each strategy uses a separate cache key to be found in Redis via the GET command. The XmlSerializer requires a type, and GetType works well using the DTO list.

Run the project, then go to redis-cli.exe in the command line. Enter GET "xml-cache-key". Looking at what’s in Redis reveals this:

[Screenshot: redis-cli showing the XML payload stored under xml-cache-key]

XML tends to be more verbose, but the deserialization time is roughly the same as binary. The XmlDto parent has the id attribute declared in the record via a property attribute. When working with XML cache entries, always keep in mind the size of the payload. This serialization format allows for more than one way to represent the data, which affects its size.

JSON serializer

Next, implement JSON serialization.
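A hedged sketch using System.Text.Json:

```csharp
var jsonList = new List<JsonDto>();
for (var i = 0; i < N; i++)
    jsonList.Add(new JsonDto(i, "PropertyA" + i));

// Serialize straight to a string; no MemoryStream is needed
db.StringSet("json-cache-key", JsonSerializer.Serialize(jsonList),
    TimeSpan.FromMinutes(10));

var cached = JsonSerializer.Deserialize<List<JsonDto>>(
    db.StringGet("json-cache-key"));
```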

This caching strategy comes with less code because it does not need a MemoryStream instance. I used the recommended .NET 5 serializer that comes built in, which is found in the System.Text.Json namespace.

This time, you’ll use GET "json-cache-key". This is what Redis reveals:

[Screenshot: redis-cli showing the JSON payload stored under json-cache-key]

As shown, a JSON blob is what gets stored in Redis. Because the payload is smaller, this now takes less than 3 seconds. Note that only the payload size changed; Redis still stores an arbitrary blob of binary data.

ProtoBuf serializer

Now it is time to use the ProtoBuf serializer in Redis.
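A hedged sketch using protobuf-net's Serializer:

```csharp
var protoList = new List<ProtoDto>();
for (var i = 0; i < N; i++)
    protoList.Add(new ProtoDto(i, "PropertyA" + i));

using (var stream = new MemoryStream())
{
    Serializer.Serialize(stream, protoList);

    // ToArray converts the data stream into a byte array before the set
    db.StringSet("proto-cache-key", stream.ToArray(), TimeSpan.FromMinutes(10));
}

// Strongly typed deserialization, much like the JSON implementation
using (var stream = new MemoryStream(db.StringGet("proto-cache-key")))
{
    var cached = Serializer.Deserialize<List<ProtoDto>>(stream);
}
```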

ToArray converts the data stream into a byte array before it gets set in Redis. The deserializer looks suspiciously similar to the JSON implementation because it is also strongly typed. The ProtoBuf serializer requires a MemoryStream, which matches the code found in the XML and binary serializers.

Run the command GET "proto-cache-key". This is what Redis shows:

[Screenshot: redis-cli showing the ProtoBuf payload stored under proto-cache-key]

The CLI tool is a bit faster this time than it was with JSON. Because this Redis client is only one piece of the puzzle, I will now turn to the C# code to tell the rest of the story.

Performance results

With the Stopwatch put in place, it is possible to gather benchmarks on how long each strategy takes (a rough sketch of the timing code follows the list):

  • Binary: read 498ms, write 410ms
  • XML: read 311ms, write 792ms
  • JSON: read 174ms, write 446ms
  • ProtoBuf: read 88ms, write 373ms
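Each number came from wrapping the set and get calls in the Stopwatch, roughly like this sketch (the exact harness in the sample may differ):

```csharp
// Hypothetical timing wrapper around one strategy's read
stopwatch.Restart();
var timed = JsonSerializer.Deserialize<List<JsonDto>>(
    db.StringGet("json-cache-key"));
Console.WriteLine($"JSON read: {stopwatch.ElapsedMilliseconds}ms");
```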

ProtoBuf is the clear winner with large datasets, with JSON in second place, about twice as slow at reading cache data. The binary serializer is dead last, and there are reasons to avoid it, given all the security issues. Because XML is more verbose than JSON, read performance gets dinged by almost a factor of two. XML is also almost four times slower than ProtoBuf.

These results generally correlate with the payload size for each caching strategy:

  • Binary: 4.9MB
  • XML: 9.8MB
  • JSON: 6.6MB
  • ProtoBuf: 3.5MB

The Redis CLI tool has a flag, redis-cli.exe --bigkeys, to check for cache entry sizes. Interestingly, the binary serializer is the slowest even though the payload is smaller than JSON. I suspect the implementation hasn’t been touched in .NET 5 since it’s deprecated, so this lacks any performance enhancements. This shows, however, that it is the serializer in the client code that dictates how long caching takes.

Now, it’s time for some fun. Change the N constant to something much smaller, say fifty records. The goal is to check how each strategy performs with tiny datasets.

This is what I see on my machine:

  • Binary: read 6ms, write 16ms
  • XML: read 13ms, write 65ms
  • JSON: read 8ms, write 47ms
  • ProtoBuf: read 15ms, write 174ms

As shown, just because ProtoBuf performs well for large datasets does not mean it works well for small ones. This is important to keep in mind as you develop a solution that demands high performance. For small datasets, JSON is preferred. Binary serialization is not that much faster than JSON, and there are still reasons to avoid it.

Cache strategies in Redis

Like most solutions in tech, coming up with the best cache strategy really comes down to your workload. If Redis performance starts to lag, the best place to start looking is in the serialization strategy.

In some cases, the DTO can work well with the JSON serializer, but as datasets continue to grow, it makes more sense to migrate over to ProtoBuf. Anything is possible; say the business does a great job acquiring more customers, and the initial cache strategy no longer performs well.

One technique is to direct the cache serializer based on an attribute found on the DTO type:
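The article does not prescribe an exact implementation, so here is a minimal sketch of the idea; CacheSerializerAttribute, SerializerKind, and CustomerDto are hypothetical names (requires System, System.IO, System.Text.Json, and ProtoBuf):

```csharp
// Hypothetical marker attribute that picks a strategy per DTO type
[AttributeUsage(AttributeTargets.Class)]
public class CacheSerializerAttribute : Attribute
{
    public CacheSerializerAttribute(SerializerKind kind) => Kind = kind;
    public SerializerKind Kind { get; }
}

public enum SerializerKind { Json, ProtoBuf }

// A DTO opts into ProtoBuf once its dataset outgrows JSON
[CacheSerializer(SerializerKind.ProtoBuf)]
[ProtoContract(SkipConstructor = true)]
public record CustomerDto([property: ProtoMember(1)] int Id);

public static class CacheSerialization
{
    public static byte[] Serialize<T>(T dto)
    {
        var attr = (CacheSerializerAttribute)Attribute.GetCustomAttribute(
            typeof(T), typeof(CacheSerializerAttribute));

        if (attr?.Kind == SerializerKind.ProtoBuf)
        {
            using var stream = new MemoryStream();
            ProtoBuf.Serializer.Serialize(stream, dto);
            return stream.ToArray();
        }

        // Default to JSON for anything not marked otherwise
        return JsonSerializer.SerializeToUtf8Bytes(dto);
    }
}
```

Calling code can then invoke CacheSerialization.Serialize(dto) without knowing which strategy the DTO declares.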

This allows the solution to evolve naturally as DTO datasets grow, and it mitigates the risk of bigger, riskier changes that impact the entire codebase.