Performance Case Study: Caching Cryptographically Signed-URLs In Redis In Lucee 5.2.9.40

By Ben Nadel on August 20, 2019

A month ago, I shared the results of my attempt to improve request performance with parallel data-access in Lucee 5.2.9.40. The main take-away from that experiment was that parallelization is not a silver bullet for data-access, especially since the cost of spawning asynchronous threads in Lucee is non-zero. That said, one critical insight that I did get from the experiment was that generating Signed-URLs (Amazon S3 pre-signed URLs) was surprisingly expensive: it accounted for about half of the request latency. Upon seeing this, I decided to perform a follow-up experiment in which I cache pre-signed URLs in Redis rather than re-generating them on every request. This ended up having a positive effect on performance, despite the additional network round-trip to Redis.

Generating pre-signed URLs requires cryptographic hashing. In the case of Amazon S3, a signature is generated using the HMAC-SHA1 (Hash-based Message Authentication Code) algorithm, keyed with your Secret Key. I assume that this is the part of the workflow that is creating the request latency. And, by putting the generated URLs in Redis, we can amortize the cost of this cryptographic hashing over a large number of HTTP requests.
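To make that cost concrete, here is a minimal sketch of HMAC-SHA1 URL signing. It is written in Python purely as a compact, runnable illustration, because the post's own code delegates to the Amazon S3 Java SDK. The sketch loosely follows the older S3 query-string authentication scheme (Signature Version 2). The `presign_get_url()` name, bucket, key, and credentials are hypothetical, and a real SDK handles details this sketch omits. Every call performs one HMAC computation, which is the work the cache is meant to avoid repeating.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def presign_get_url(bucket, key, access_key_id, secret_key, expires_epoch):
    """Build a Signature-V2-style pre-signed GET URL (simplified sketch)."""
    encoded_key = quote(key)  # Keeps "/" so path segments survive.
    # String-to-sign: verb, Content-MD5, Content-Type, Expires, resource.
    string_to_sign = f"GET\n\n\n{expires_epoch}\n/{bucket}/{encoded_key}"
    # The cryptographic step: HMAC-SHA1 keyed with the secret key.
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return (
        f"https://{bucket}.s3.amazonaws.com/{encoded_key}"
        f"?AWSAccessKeyId={quote(access_key_id, safe='')}"
        f"&Expires={expires_epoch}"
        f"&Signature={quote(signature, safe='')}"
    )


url = presign_get_url("my-bucket", "screens/abc.png", "AKIDEXAMPLE", "not-a-real-secret", 1566345600)
print(url)
```

The output is a pure function of its inputs, so identical inputs always yield the identical URL. This is what makes caching the result safe, at least until the URL expires.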

ASIDE: Part of generating secure signatures is ensuring that both the producer and the consumer / verifier of a signature use the same set of inputs. I recently looked at Linked / Ordered Structs in Lucee 5.3.2.77 as one possible way to make signature generation consistent.

Redis, which I love, is a super fast key-value store. For this experiment, the "key" is going to be a representation of the pre-signed URL inputs; and, the "value" is going to be the generated pre-signed URL:

Redis[ URL inputs ] => pre-signed URL
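One way to turn "URL inputs" into a stable Redis key is to serialize them in a canonical order. That way, the same inputs always map to the same key, regardless of how the struct was built. The Python sketch below sorts the input names. It is an alternative to the explicit per-type key formats used by the getCacheKey() method in the CachingCDNService.cfc component. The `cache_key()` helper is hypothetical; the `cachingCdn:` prefix mirrors the post.

```python
def cache_key(url_type, inputs):
    """Build a deterministic cache key from a URL type and its inputs."""
    # Sorting by input name makes the key independent of insertion order.
    parts = [f"{name}={inputs[name]}" for name in sorted(inputs)]
    return f"cachingCdn:{url_type}:" + "&".join(parts)


a = cache_key("screenThumbnail", {"serverFilename": "abc.png", "imageVersion": 3, "mobileDeviceID": 7})
b = cache_key("screenThumbnail", {"mobileDeviceID": 7, "imageVersion": 3, "serverFilename": "abc.png"})
print(a)
```

This naive join could produce colliding keys if an input value itself contains "&" or "=". Escaping or length-prefixing each value would avoid that.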

Redis is fast. But, getting to Redis is still a network hop; and, that round-trip is probably the most expensive part of a Redis interaction. As such, I wanted to be smart about how I read-from and wrote-to the Redis cache. Instead of going to Redis for every single pre-signed URL, I wanted to gather all necessary URLs in one multi-key read (MGET); and then, write all missing URLs to Redis with one asynchronous, transactional multi-write.

So, for every request to the Lucee ColdFusion application (in this experiment), I am making at-most two calls to Redis: One call to gather the cached URLs; and then, another asynchronous call to Redis if - and only if - there are missing URLs that need to be cached.
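The "at most two calls" rule can be sketched in Python with an in-memory stand-in for Redis that counts round-trips. `FakeRedis`, `get_urls()`, and the `sign` callback are all hypothetical. `mget()` mimics Redis MGET semantics: one value or None per key, in key order. `set_many_with_ttl()` stands in for the MULTI / SETEX / EXEC batch. To keep the example deterministic, the write runs synchronously here.

```python
class FakeRedis:
    """In-memory stand-in for Redis that counts network round-trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def mget(self, keys):
        # One round-trip; a missing key yields None at the same index.
        self.round_trips += 1
        return [self.store.get(k) for k in keys]

    def set_many_with_ttl(self, items, ttl_seconds):
        # One round-trip for the whole batch (TTL is ignored in this stand-in).
        self.round_trips += 1
        self.store.update(items)


def get_urls(redis, keys, sign, ttl_seconds=3600):
    """Return one URL per key, reading and writing the cache in batches."""
    cached = redis.mget(keys)  # Round-trip #1 (always).
    urls, missing = [], {}
    for key, value in zip(keys, cached):
        if value is None:
            value = sign(key)  # Expensive step, only on a cache miss.
            missing[key] = value
        urls.append(value)
    if missing:
        redis.set_many_with_ttl(missing, ttl_seconds)  # Round-trip #2 (misses only).
    return urls


redis = FakeRedis()
keys = ["screen:1", "screen:2", "screen:3"]
first = get_urls(redis, keys, sign=lambda k: f"https://cdn.example/{k}?sig=x")
trips_cold = redis.round_trips
second = get_urls(redis, keys, sign=lambda k: f"https://cdn.example/{k}?sig=y")
```

No matter how many URLs a request needs, a cold cache costs two round-trips and a warm cache costs one.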

Because of the "bulk read/write" model of this experiment, I couldn't just drop in a replacement for the existing pre-signed URL generation. Instead, I had to re-work my ColdFusion code to defer generating pre-signed URLs until I had all of the inputs that I needed. Then, once I gathered the pre-signed URLs, I had to merge them back into the view-model that I was returning in my API response.

This ended-up being a non-trivial process. What I came up with was a set of "configuration" objects that I would pass to my caching URL generator. Each configuration object could contain an arbitrary metadata Struct that I would use to associate the resultant pre-signed URL with the view-model that it pertained to.

So, in the case of "Screens", I had to generate a configuration object for both the full-size image URL and the thumbnail image URL (partial snippet):

// Check to see if this user is in the experimental cohort (feature flag).
if ( isUsingCachedUrlExperiment( userID ) ) {

	var configs = [];

	// For each screen, add a configuration object for both the full-size and
	// the thumbnail-size URL generation.
	// --
	// NOTE: For each configuration, I am using the "metadata" Struct to
	// associate the pre-signed URL with the view-model (screen and property)
	// that I need to zip-together once the URLs have been generated.
	for ( var screen in screens ) {

		// Configuration for full-size image pre-signed URL.
		configs.append({
			type: "screenFile",
			inputs: {
				serverFilename: screen.serverFilename,
				imageVersion: screen.imageVersion
			},
			metadata: {
				context: screen,
				property: "imageUrl"
			}
		});

		// Configuration for thumbnail-size image pre-signed URL.
		configs.append({
			type: "screenThumbnail",
			inputs: {
				serverFilename: screen.serverFilename,
				imageVersion: screen.imageVersion,
				mobileDeviceID: mobileDeviceID
			},
			metadata: {
				context: screen,
				property: "thumbnailUrl"
			}
		});

	}

	// ------
	// ------
	// In this experiment, I am only dealing with screen URLs; but, if there
	// were other assets in this experiment, their inputs would need to be gathered
	// prior to the following call.
	// ------
	// ------

	// Gather the URLs for the given configuration objects.
	// --
	// NOTE: This service will both read-from and write-to the Redis cache as
	// part of this request. That happens transparently.
	var results = cachingCdnService.getUrls( configs );

	for ( var config in results ) {

		// From the results, we can extract the metadata that we provided to
		// the URL-generation service. This makes it easy to associate the
		// "output" (the pre-signed URL) with the view-model input.
		var context = config.metadata.context;
		var property = config.metadata.property;

		context[ property ] = config.output;

	}

}

As you can see, I am gathering all of the necessary URL inputs prior to calling my CachingCDNService.cfc; then, once I make my call to the CachingCDNService.cfc, I use the metadata references to quickly merge the generated pre-signed URLs back into the view-model. It requires a re-working of my existing control-flow; but, it wasn't too bad (especially for a small proof-of-concept).

Now, let's take a look at the CachingCDNService.cfc. This ColdFusion component is a proxy for my other component, CDNService.cfc, which is where the pre-signed URLs are actually generated using the Amazon S3 Java SDK. So, you can think of the CachingCDNService.cfc like a pull-through cache for my URLs that encapsulates the Redis interaction.

This caching service only provides one public method, getUrls(). This method performs the cache-get based on the provided configuration objects; and then - if necessary - spawns an asynchronous thread to perform the cache-put with any cache-miss URLs. The getUrls() function applies an output property to each configuration object that it receives - the output property contains the pre-signed URL.

NOTE: The cache-put is performed asynchronously because the parent request doesn't depend on it; and therefore it doesn't need to contribute to the overall HTTP request latency.
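A rough Python analogue of this fire-and-forget cache-put uses a background thread in place of CFThread. The `put_to_cache_async()` name and the plain-dict cache are hypothetical. The function returns the thread only so the example can join() it; the request path itself would simply not wait.

```python
import threading


def put_to_cache_async(cache, items):
    """Write items to the cache on a background thread; do not wait for it."""
    # Snapshot the items so later mutation by the caller cannot race the writer.
    snapshot = dict(items)

    def worker():
        # Runs off the request path, so its latency is not added to the response.
        cache.update(snapshot)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread  # Returned only so callers (or tests) can join() it.


cache = {}
pending = {"cachingCdn:screenFile:/abc.png/3": "https://cdn.example/abc.png?sig=x"}
t = put_to_cache_async(cache, pending)
pending.clear()  # Does not affect the snapshot being written.
t.join()
```

One trade-off of any fire-and-forget write is that an error in the background worker never reaches the request. Failed cache writes therefore need their own logging.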

component
	accessors = true
	output = false
	hint = "I wrap the CDN Service and cache URL generation results."
	{

	// Define properties for dependency-injection (for Framework One).
	property cdnService;
	property jedisPool;

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I take the given list of URL configuration objects and inject an "output" property
	* into each item that contains the CDN URL. This is performed as an in-place
	* operation which means that the calling context can provide additional keys that may
	* help map the resultant output to the original input.
	* 
	* Each config MUST HAVE:
	* 
	* - type
	* - inputs
	* 
	* These values will map onto the underlying CdnService calls. Any additional keys
	* will be left in place for the calling context.
	* 
	* @configs I am the collection of configs.
	* @output false
	*/
	public array function getUrls( required array configs ) {

		var cachedUrls = getFromCache( configs );
		var missingConfigs = [];

		configs.each(
			( config, i ) => {

				if ( cachedUrls.isDefined( i ) ) {

					config.output = cachedUrls[ i ];

				} else {

					config.output = getSignedUrl( config );
					// This configuration was not cached in Redis - keep track of it
					// so that we can update the cache asynchronously afterwards.
					missingConfigs.append( config );

				}

			}
		);

		if ( missingConfigs.len() ) {

			putToCacheAsync( missingConfigs );

		}

		return( configs );

	}

	// ---
	// PRIVATE METHODS.
	// ---

	/**
	* I return the Redis cache-key for the given config.
	* 
	* @config I am the config being referenced in Redis.
	* @output false
	*/
	private string function getCacheKey( required struct config ) {

		var prefix = "cachingCdn:#config.type#";
		var inputs = config.inputs;

		switch( config.type ) {
			case "screenFile":
				return( "#prefix#:/#inputs.serverFilename#/#inputs.imageVersion#" );
			break;
			case "screenThumbnail":
				return( "#prefix#:/#inputs.serverFilename#/#inputs.imageVersion#/#inputs.mobileDeviceID#" );
			break;
			default:
				throw( type = "InvalidSignedUrlType" );
			break;
		}

	}


	/**
	* I return a collection of cached URLs that map to the given collection of configs.
	* If an individual config isn't cached, the resulting collection index is null.
	* 
	* @configs I am the configs for which URLs are being read from the cache.
	* @output false
	*/
	private array function getFromCache( required array configs ) {

		var cacheKeys = configs.map(
			( config ) => {

				return( getCacheKey( config ) );

			}
		);

		var signedUrls = withRedisConnection(
			( redis ) => {

				return( redis.mget( cacheKeys ) );

			}
		);

		return( signedUrls );

	}


	/**
	* I generate a signed CDN URL for the given configuration.
	* 
	* @config I am the configuration input for the Signed URL.
	* @output false
	*/
	private string function getSignedUrl( required struct config ) {

		var inputs = config.inputs;

		switch( config.type ) {
			case "screenFile":
				return( cdnService.getUrlForScreenFile( inputs.serverFilename, inputs.imageVersion ) );
			break;
			case "screenThumbnail":
				return( cdnService.getUrlForScreenThumbnail( inputs.serverFilename, inputs.imageVersion, inputs.mobileDeviceID ) );
			break;
			default:
				throw( type = "InvalidSignedUrlType" );
			break;
		}

	}


	/**
	* I store the signed-URLs contained within the "output" of the given configs.
	* 
	* @configs I am the configurations that contain signed URLs.
	* @output false
	*/
	private void function putToCache( required array configs ) {

		var TTL_IN_SECONDS = ( 60 * 60 ); // One-hour.

		withRedisConnection(
			( redis ) => {

				var multi = redis.multi();

				for ( var config in configs ) {

					multi.setex(
						javaCast( "string", getCacheKey( config ) ),
						javaCast( "int", TTL_IN_SECONDS ),
						javaCast( "string", config.output )
					);

				}

				multi.exec();

			}
		);

	}


	/**
	* I store the signed-URLs contained within the "output" of the given configs using
	* an asynchronous thread.
	* 
	* @configs I am the configurations that contain signed URLs.
	* @output false
	*/
	private void function putToCacheAsync( required array configs ) {

		// NOTE: I am using CFThread here instead of runAsync() because my
		// version of Lucee doesn't have runAsync() yet.
		thread configs = configs {

			// Attributes passed into a thread are exposed via its "attributes" scope.
			putToCache( attributes.configs );

		}

	}


	/**
	* I obtain a connection from the Redis resource pool and pass it to the given
	* callback. Any value returned from the callback is passed-through.
	* 
	* @callback I am the Function to invoke with the Redis resource.
	* @output false
	*/
	private any function withRedisConnection( required any callback ) {

		try {

			var resource = jedisPool.getResource();

			return( callback( resource ) );

		} catch ( any error ) {

			if (
				local.keyExists( "resource" ) &&
				( error.type == "redis.clients.jedis.exceptions.JedisConnectionException" )
				) {

				jedisPool.returnBrokenResource( resource );

				// Delete the reference so it won't get used in the Finally block.
				local.delete( "resource" );

			}

			// Now that we've cleaned-up the connection pool, propagate the error.
			rethrow; 

		} finally {

			if ( local.keyExists( "resource" ) ) {

				jedisPool.returnResource( resource );

			}

		}

	}

}

Notice that the getFromCache() method uses the underlying mget() method on the Jedis instance. This gathers all cached URLs at once (for the given set of inputs). Then, if we have any cache-misses, the putToCache() method uses the underlying .multi() / .exec() methods on the Jedis instance. This writes all missing URLs to Redis in a single, transactional write.

And now, the moment of truth - rolling out the experiment using my LaunchDarkly feature flag service. The following graph represents the amount of time (in milliseconds) that it takes to gather the pre-signed URLs:

Latency graph of URL-generation with Redis caching enabled in Lucee 5.2.9.40.

As you can see, when the feature-flag is enabled, the 95th percentile of the time it takes to gather the pre-signed URLs drops to one-third of what it was previously. Furthermore, the graph becomes much more stable (i.e., less prone to spikes).
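For readers less familiar with latency percentiles: the 95th percentile is the value that 95% of samples fall at or below, so it describes the slow tail rather than the average. Below is a nearest-rank sketch in Python. The `percentile()` helper and the sample timings are made up for illustration and are not data from this experiment.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in [0, 100]."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # Integer ceil(pct * n / 100), 1-based.
    return ordered[max(rank, 1) - 1]


# Hypothetical timings in milliseconds, for illustration only.
timings = [12, 11, 13, 12, 14, 11, 12, 13, 15, 12,
           11, 12, 13, 12, 11, 14, 12, 13, 12, 90]
p95 = percentile(timings, 95)
```

With 20 samples, the 95th percentile is the 19th-smallest value (15 ms here). The single 90 ms outlier shows up only in the maximum.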

What we can see here is that the cost of making the network call to Redis is significantly smaller than the cost of generating the pre-signed URLs. This makes using the Redis cache a net-positive in terms of latency, especially when amortized over a large number of application requests.
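The break-even reasoning can be written as a back-of-the-envelope model. Suppose a request needs n URLs. Signing each URL inline costs about n × sign_ms. With the cache, the request pays one read round-trip plus the cost of signing only the misses; the write is asynchronous, so it stays off the request path. In the Python sketch below, all function names, parameter names, and numbers are hypothetical, not measurements from this experiment.

```python
def uncached_ms(n_urls, sign_ms):
    """Latency when every URL is signed inline."""
    return n_urls * sign_ms


def cached_ms(n_urls, sign_ms, read_rtt_ms, hit_rate):
    """Latency with one batched cache read; only misses are signed inline."""
    misses = n_urls * (1 - hit_rate)
    return read_rtt_ms + misses * sign_ms


# Hypothetical inputs: 20 URLs, 0.5 ms per signature, 1 ms Redis round-trip.
print(uncached_ms(20, 0.5))
print(cached_ms(20, 0.5, 1.0, 0.75))
```

The cache wins whenever read_rtt_ms < hit_rate × n_urls × sign_ms. On a fully cold cache it is slower by one round-trip, which is why the benefit appears when amortized over many requests.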

Of course, Redis, itself, isn't "free". For this experiment, caching the pre-signed URLs takes about 2GB of memory. And, that's just for this experiment and just for one type of asset (screens). If I were to retrofit the rest of the application with this type of caching, the memory load would be significantly larger (and, of course, the effort of retrofitting would be non-trivial).

Everything is a trade-off.

Performance is a marquee feature of any application. And, just like any other application feature, it has to be improved over time. In my previous exploration, I was hoping that asynchronous data-access would be a silver-bullet in Lucee CFML. It was not. By adding StatsD metrics and measuring actions, though, I was able to see that generating pre-signed URLs was a bottleneck. And, as it turns out, caching those pre-signed URLs in Redis can have a significant impact on performance in Lucee 5.2.9.40.

Reader Comments

Ben Nadel Aug 21, 2019 at 8:50 AM

@All,

As a follow-up to this post, I wanted to quickly look at how I generate cache-friendly, CDN-friendly signed-URLs in Lucee:

www.bennadel.com/blog/3686-calculating-a-consistent-cache-friendly-expiration-date-for-signed-urls-in-lucee-5-3-2-77.htm

The basic approach is to bucket expiration dates based on a rolling window. This way, all URLs generated for the same resource will result in the same signed-URL for some period of time (which allows them to be cached).
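Here is a minimal Python sketch of that bucketing idea, assuming epoch-second timestamps. The `bucketed_expiration()` name, the one-hour window, and the minimum-validity parameter are hypothetical, and the linked post's actual implementation may differ. Every call within the same window gets the same expiration, and therefore, with the other inputs unchanged, the same signature. Each URL still remains valid for at least min_validity_seconds.

```python
def bucketed_expiration(now_epoch, window_seconds, min_validity_seconds):
    """Round 'now' down to a window boundary, then push the expiry past it."""
    window_start = now_epoch - (now_epoch % window_seconds)
    # The end of the current window plus a guaranteed validity margin.
    return window_start + window_seconds + min_validity_seconds


WINDOW = 3600     # Hypothetical one-hour bucket.
MIN_VALID = 3600  # Hypothetical minimum remaining validity.

a = bucketed_expiration(1566345600, WINDOW, MIN_VALID)         # Start of a window.
b = bucketed_expiration(1566345600 + 3599, WINDOW, MIN_VALID)  # Same window.
c = bucketed_expiration(1566345600 + 3600, WINDOW, MIN_VALID)  # Next window.
```

A cached URL should never outlive its signature. Keeping the cache TTL no longer than min_validity_seconds guarantees that every cached URL is still valid when it is served.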
