CloudFront function to remove part of the request path

How to use arbitrary paths in origin requests

Tamás Sallai
Photo by Lukas: https://www.pexels.com/photo/photo-of-person-slicing-lemon-952368/

Origin paths

CloudFront routes requests by path pattern. Each ordered cache behavior defines a pattern, and CloudFront matches the incoming request's path against these patterns in the order they are defined. The default cache behavior has no pattern and catches everything else. Since a cache behavior defines which origin the request is forwarded to, this setup allows different backends to live under the same domain.
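The first-match rule can be sketched in plain JavaScript. This is a simplified model and not CloudFront's implementation: the origin names and the patternToRegExp and selectBehavior helpers are made up for illustration, and only the * wildcard is handled.

```javascript
// Simplified, illustrative model of cache behavior selection.
// Converts a path pattern such as "/api/*" into an anchored RegExp.
function patternToRegExp(pattern) {
	// Escape RegExp metacharacters except "*", then expand "*" to ".*"
	var escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
	return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Returns the first ordered behavior whose pattern matches the path,
// or the default behavior when none of them do.
function selectBehavior(path, orderedBehaviors, defaultBehavior) {
	for (var i = 0; i < orderedBehaviors.length; i++) {
		if (patternToRegExp(orderedBehaviors[i].pathPattern).test(path)) {
			return orderedBehaviors[i];
		}
	}
	return defaultBehavior;
}

// Hypothetical configuration: an API behavior and an S3 default.
var orderedBehaviors = [{ pathPattern: "/api/*", origin: "api-backend" }];
var defaultBehavior = { origin: "s3-bucket" };

console.log(selectBehavior("/api/users", orderedBehaviors, defaultBehavior).origin); // api-backend
console.log(selectBehavior("/index.html", orderedBehaviors, defaultBehavior).origin); // s3-bucket
```

The order of the list matters in this model: a broader pattern placed first would shadow a more specific one after it.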

A typical example is hosting an API along with a web application under the same domain. This works by specifying a path pattern for the API, such as /api/*, while the webapp's files are handled by the default cache behavior, usually with an S3 bucket origin. This setup returns a static file for the /index.html path, but serves a dynamic response for /api/users.

The problem with this setup is that the path pattern only selects the cache behavior; it does not change the request. CloudFront forwards the whole request path to the origin, so every request reaching the API starts with /api/, as those are the only paths that cache behavior matches.

In contrast, a two-domain setup where the API has its own dedicated domain does not suffer from this problem. Requests to the API can have any path, as the domain names already distinguish which origin will handle the request.

If the API does not expect requests to start with /api/, as is usually the case when it started out in a two-domain setup, then putting CloudFront in front of it to provide a common domain will break it.

CloudFront function

Fortunately, CloudFront can run custom code for each viewer request, and this code can modify the path sent to the origin. The execution environment is heavily restricted, but it is perfectly capable of changing a string.

The code that removes the first part of the path:

function handler(event) {
	var request = event.request;
	request.uri = request.uri.replace(/^\/[^/]*\//, "/");
	return request;
}

When it runs, it reads event.request.uri and removes the first path segment, turning /api/users into just /users.
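To see what the regular expression does with different paths, the handler can be run locally, for example in Node.js. This is only a sketch: the rewrite helper is hypothetical, and the event object is reduced to the single uri field the handler reads, while a real CloudFront event carries more.

```javascript
// Same handler as above, repeated so the snippet runs on its own.
function handler(event) {
	var request = event.request;
	request.uri = request.uri.replace(/^\/[^/]*\//, "/");
	return request;
}

// Hypothetical helper: wraps a path in a minimal event object
// and returns the path the handler would send to the origin.
function rewrite(uri) {
	return handler({ request: { uri: uri } }).uri;
}

console.log(rewrite("/api/users"));    // /users
console.log(rewrite("/api/users/42")); // /users/42
console.log(rewrite("/api/"));         // /
console.log(rewrite("/api"));          // /api (no second slash, so nothing is removed)
```

The last case shows that a path without a second slash is left unchanged. With the /api/* pattern this should not come up, since such a path is not expected to match that behavior in the first place.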

The resource in Terraform:

resource "aws_cloudfront_function" "remove_part" {
	name    = "remove_part-${random_id.id.hex}"
	runtime = "cloudfront-js-1.0"
	code    = <<EOF
function handler(event) {
	var request = event.request;
	request.uri = request.uri.replace(/^\/[^/]*\//, "/");
	return request;
}
EOF
}

Then configure it for a cache behavior:

ordered_cache_behavior {
	path_pattern     = "/api/*"
	# ...
	function_association {
		event_type   = "viewer-request"
		function_arn = aws_cloudfront_function.remove_part.arn
	}

	viewer_protocol_policy = "https-only"
}

This works because CloudFront has already selected the cache behavior by the time it runs the code. It is therefore not a problem if the resulting path would match a different behavior.
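The order of these two steps can be modeled with a few lines of JavaScript. This is an illustrative sketch and not how CloudFront is implemented: the route function, the behavior objects, and the origin names are all assumptions, and the pattern match is reduced to a simple prefix check.

```javascript
// Illustrative model of the request flow; not CloudFront's actual code.
function removeFirstPart(uri) {
	return uri.replace(/^\/[^/]*\//, "/");
}

// Hypothetical behaviors: only the API one has the function attached.
var apiBehavior = { origin: "api-backend", rewrite: removeFirstPart };
var defaultBehavior = { origin: "s3-bucket", rewrite: null };

function route(uri) {
	// Step 1: the behavior is chosen from the path the viewer sent.
	var behavior = uri.indexOf("/api/") === 0 ? apiBehavior : defaultBehavior;
	// Step 2: the function attached to that behavior may change the path.
	var originUri = behavior.rewrite ? behavior.rewrite(uri) : uri;
	// The rewritten path is not matched against the behaviors again.
	return { origin: behavior.origin, uri: originUri };
}

var result = route("/api/index.html");
console.log(result.origin); // api-backend
console.log(result.uri);    // /index.html
```

In this model, /api/index.html is forwarded to the API origin as /index.html, even though a viewer requesting /index.html directly would get the file from the S3 bucket.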

September 5, 2023