Custom Ink's Kubernetes Journey

There's been an elephant in the room during the past few Custom Ink Tech Blog updates. Perhaps it's been alluded to, but we've been consistently putting off addressing it here, simply because of its scale, as well as how many subsequent blog posts could be written wrestling with its implications. (This, of course, means we've also been putting off said subsequent blog posts, because we never got around to writing this one.)

So, it's time to rip off the proverbial band-aid: the majority of Custom Ink's compute workload is now running on Kubernetes, by way of Amazon EKS. (The rest is, as you might have guessed by reading this blog, on AWS Lambda.) We no longer use Chef, and we no longer use Capistrano to deploy our services and applications directly to EC2 instances. We run everything behind Customink.com without keeping track of individual servers and persistent file systems, and all app-specific infrastructure configuration - OS, libraries, packages, resource allocations, ingress - now lives in the same Git repository as a service's source code, where developers can change it as they see fit.

The Journey Begins

Even before Custom Ink moved to the cloud, the pattern of persistent, stateful servers managed by Chef, with code deployed by Capistrano, served us well. It took some manual action to get a new server up, point a load balancer at it, and tell Chef to tell Capistrano it was ready to deploy to. Servers tended to stick around for a while (so, not ephemeral), and when it was time for an upgrade, we'd upgrade them in place (so, not immutable either). That said, that pattern stayed with us in the cloud, followed us to a resilient multi-AZ layout, and absolutely beats doing everything by hand.

In this engineer's opinion, a cultural shift within development is what motivated us to begin the journey to ephemeral and immutable infrastructure. As you might have gleaned from some past blog posts, Tech Inkers know a thing or two about Heroku. The venerable PaaS taught a generation of developers that, if they could generate their runtime environment and compute resources from code instead of asking a sysadmin for them, they could deliver more, and do it faster. Over time, everyone arrived at the same conclusion: we could do this faster, and with less work.

It's important to note the moment in which this sea change took place. These conversations were happening at Custom Ink in late 2019. (Yes, this blog post has been marinating for quite some time.) That meant the immutable, ephemeral platform we were already dipping our toes into - AWS Lambda - was still a little rough around the edges. We were still getting the hang of using Lambda layers to bring the binaries some of our services needed (say, MySQL and Oracle connectors) into the execution environment. Secure secret injection out of the box was not really a thing yet. Finally, keep in mind that Lambda would not support OCI images for another year. It was (and still is) fertile ground for new development, but we had a backlog of existing services that needed a new home.

OCI images (Docker, for the layperson) and a container orchestration platform seemed to be our quickest way to lift and shift our services to an ephemeral, immutable infrastructure. Thankfully, our applications were already stateless; environment-specific configuration was handled by dotenv, which was happy to ingest values from environment variables, and everything else lived in either databases, memory caches, or S3 buckets. There were few to no persistent files to worry about. Thanks, Rails!

Even in 2019, there were some options for running containers in AWS. Kubernetes won because it had mindshare on the team, and because it was the most flexible. (This last part, as we will soon learn, can be a blessing or a curse.)

If you're curious, our first customer-facing production service hosted on Kubernetes was the international checkout component of our website, which launched in March of 2020. The web frontend of customink.com was on Kubernetes in June of 2022, and the last to make the leap was the service that handles clipart in the design lab, which moved in January of 2023.

EKS Essentials

A familiar feeling to anyone who has spun up an EKS cluster: looking at the console, maybe creating a "hello world" pod and seeing it schedule somewhere, and thinking "OK, what now?".

Although Fargate, EKS managed node groups, and Karpenter have since come along and made things a bit easier, there's a hard line between resources managed by the AWS API and resources behind the API of your Kubernetes cluster. If you want to spin up a pod, you need to connect to the control plane of your EKS cluster and ask for it. That means you have to authenticate to it first. If you want something on the internet to be able to connect to that pod, you need to figure out how to get Kubernetes to tell AWS to point an IP address (or two, or three) at a port where your pod is reachable. If you need to get secrets (say, API tokens or passwords) into the environment of your pod, it's on you to decide how to get them out of your secrets manager of choice. EKS is very hands-off in that regard, unlike other, more opinionated Kubernetes distributions. What follows is how we chose to instrument our Kubernetes clusters; it's not the only way to do it, and there are certainly more CRDs and DaemonSets running than the ones listed here, but mastering these was critical to our adoption of Kubernetes.

aws-auth

One nice thing EKS comes with out-of-the-box is the AWS IAM Authenticator for Kubernetes. It allows us to map IAM principals to users within Kubernetes. This is handy; it means we don't have to worry about giving everyone a Kubernetes identity, and instead just scope their privileges based on their IAM identity.

By default, the only user allowed to talk to the Kubernetes control plane is the IAM principal that created the cluster. Even if that principal was a role assumed by a group of people, that's less than ideal; after all, from Kubernetes' perspective, there's only one user, and no way to reduce the scope of that user's access (which is, of course, full admin). Thankfully, the IAM authenticator gives us one more thing: the ability to edit a ConfigMap called aws-auth to associate IAM principals with groups, and groups with Roles or ClusterRoles. This means we can have an IAM role called "ReadOnly" that we map to a Kubernetes role that can only do "get" actions, or an IAM role called "PowerUser" that's allowed to restart deployments - you get the picture. This is how we map external entities, such as engineers, developers, or external software that operates on the Kubernetes API, to Kubernetes RBAC. All the calling entity needs to do is run or implement the AWS IAM Authenticator process, which in a nutshell uses IAM credentials to call the AWS EKS API. The caller gets a temporary session token in return, which is then good for talking to the Kubernetes control plane.
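To make that concrete, here's a minimal sketch of a read-only mapping. The ARN, names, and the exact set of resources are placeholders, not our actual configuration:

```yaml
# Map a hypothetical "ReadOnly" IAM role to a Kubernetes group via aws-auth.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/ReadOnly
      username: readonly:{{SessionName}}
      groups:
        - readonly
---
# Then bind that group to a ClusterRole limited to read verbs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readonly
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: readonly
subjects:
  - kind: Group
    name: readonly
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: readonly
  apiGroup: rbac.authorization.k8s.io
```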

Ingress

As we've written before, we make use of path-based routing for some (but not all) of our customer-facing applications. Thankfully, making this work on the Kubernetes side was not a problem. The Kubernetes Ingress API allows for rule-based routing, and the AWS Load Balancer Controller lets us implement the Ingress spec by creating and controlling Application Load Balancers. We use the AWS Load Balancer Controller to deploy a new IngressClass - let's call it the "shared ingress" - which represents a load balancer with no rules, just various ACM certificates and security settings. Each application is then responsible for declaring an Ingress, specifying which paths and hostnames it listens on, using the aforementioned shared ingress as its IngressClass. The AWS Load Balancer Controller sees that an Ingress is using an IngressClass it manages, and adds the corresponding rules to the "shared" load balancer. This lets us control load balancers however we want, and even share them, from within the Kubernetes API.
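Here's a minimal sketch of what an application's Ingress looks like under this scheme; the class name, hostname, and service are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkout
  annotations:
    alb.ingress.kubernetes.io/target-type: ip  # route straight to pod IPs
spec:
  ingressClassName: shared-ingress  # the "shared ingress" described above
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /checkout
            pathType: Prefix
            backend:
              service:
                name: checkout
                port:
                  number: 80
```

The AWS Load Balancer Controller reconciles this rule onto the shared ALB, so the application team never has to touch the load balancer directly.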

External Secrets

There are a lot of ways and places to manage secrets, but our biggest requirement is for secrets to be encrypted at rest, restricted by IAM policies, and managed somewhere other than the Kubernetes control plane. At this time, we are using SecureString parameters in AWS Systems Manager Parameter Store. The parameters are encrypted with KMS, and IAM limits the principals that can decrypt them.

We elected to use the External Secrets Operator (ESO) as the mechanism that turns SSM parameters into Kubernetes Secret resources. The tool provides a number of Kubernetes Custom Resource Definitions (that is, new objects for the API to use) that give us access to external secret stores. The key objects here are the SecretStore and ExternalSecret resources. A SecretStore object instructs the ESO to connect to a secrets backend of choice. An ExternalSecret object can then be created, referencing that SecretStore and asking for certain values to be pulled from the backend. (In our case, the SecretStore points at AWS SSM.) When an ExternalSecret is created or modified, the ESO connects to the secrets backend, retrieves the secrets, and creates a Kubernetes Secret object populated with the values. We can then use this Secret as we would any other Secret or ConfigMap, mounting it on a pod's filesystem or exposing it as environment variables so our application can use the secrets.
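Here's a rough sketch of the two objects, with hypothetical names and an assumed SSM parameter path (authentication on the SecretStore is omitted here; it can ride on the operator's own IAM role):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: ssm-parameter-store
spec:
  provider:
    aws:
      service: ParameterStore
      region: us-east-1
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: ssm-parameter-store
    kind: SecretStore
  target:
    name: my-app-secrets  # the Kubernetes Secret the ESO will create
  data:
    - secretKey: DATABASE_PASSWORD  # key inside the resulting Secret
      remoteRef:
        key: /my-app/production/database_password  # hypothetical SSM path
```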

Service Accounts and IAM

As you might have inferred by now, there are lots of pods in our Kubernetes clusters that need IAM permissions, either to perform infrastructure actions that support our services (creating ALBs, fetching secrets, and so on) or in the normal course of a service doing its job (say, uploading a file to S3 or sending a message down an SQS queue). While the permissions associated with the IAM role of a Kubernetes worker node can be inherited by all the pods on it, we really don't want that; after all, it would mean everything on the cluster shares the same role, with very broad access. We would prefer that applications running in Kubernetes be scoped to their own IAM roles. Thanks to the OIDC capabilities of EKS and IAM, this is pretty easy to do.

EKS clusters come with their own unique OIDC URLs. These URLs, along with a unique thumbprint associated with them, allow IAM OIDC providers to verify that a caller from outside of AWS (say, the EKS control plane) is who it claims to be. We can then add the OIDC provider for our new Kubernetes cluster to the trust relationship of an IAM role. (We can scope the trust relationship further, to namespace and service account name, if we really don't want the app that uploads clipart to be able to assume the role that lets it play with load balancers.)
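For concreteness, here's one way to express such a scoped trust relationship, sketched as a CloudFormation snippet; the account ID, OIDC provider ID, namespace, and service account name are all placeholders:

```yaml
ClipartAppRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action: sts:AssumeRoleWithWebIdentity
          Principal:
            # the cluster's OIDC provider, registered with IAM
            Federated: arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890
          Condition:
            StringEquals:
              # only this namespace and service account may assume the role
              "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890:sub": "system:serviceaccount:clipart:clipart-app"
```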

With this in place, all we need to do is create a ServiceAccount object, add the IAM role we want to assume as an annotation, and associate it with our pods. Some more EKS-specific magic then takes place in the background: the Amazon EKS Pod Identity Webhook sees that a pod is associated with a ServiceAccount that wants to use an IAM role, mounts a projected service account token - a web identity token signed by the cluster's OIDC issuer - into the pod, and injects environment variables pointing at that token, the role to assume, and the region it's running in. If OIDC was set up right, your code and the AWS SDK (assuming you're on a version new enough to recognize web identity tokens) will exchange that token with AWS STS for temporary credentials, and run as if on an EC2 instance with an IAM role.

This way, we can associate pods with different IAM roles, and prevent services from encroaching on each other's permissions.
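In practice, that ServiceAccount is only a few lines of YAML; the name, namespace, and role ARN below are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: clipart-app
  namespace: clipart
  annotations:
    # the IAM role that pods using this ServiceAccount should assume
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/clipart-app
```

Any Deployment that sets serviceAccountName: clipart-app in its pod template picks up the role automatically.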

The Deployment Pipeline

Early on, we elected to come up with a solution that was CI tooling-agnostic; whatever we did would not be bound to Jenkins, CircleCI, TravisCI, GitHub Actions, or the crontab running on the gaming PC in the closet. We didn't want to anchor ourselves to any of those things. Instead, we packaged all of the various scripts needed for our deployment process into a Ruby package we refer to internally as "KTool". It is a very opinionated shim layer between our code repositories and the tools we use to build the Dockerfile, push the image, generate the Kubernetes YAML, and get that YAML applied to the Kubernetes cluster. KTool can be pulled into the pipeline, as a gem or a Docker image, and called as an executable by whatever CI tool elects to bring it in. This gives us the added ability to change and add features to KTool without having to chase around various GitHub Actions, CircleCI orbs, or other tooling-specific components.

The above diagram is a bit of a simplification; depending on the service and the workflow, there are any number of test/validate/wait steps that can fit in at any point. There's also a separate process for deploying feature branches to atomic dev environments, but that warrants its own blog post.

Dockerfile

The Dockerfile lives in the Git repository, so developers can add dependencies as needed. We build each image exactly once, and reuse it as we promote it from dev to staging to prod. That way, we're certain that the artifacts going to prod are the same artifacts we tested in staging. We store Docker images in an Amazon Elastic Container Registry accessible to the dev, staging, and production Kubernetes clusters. As a best practice, we avoid the :latest tag, and instead tag our images with the commit hash of the repository, so we know exactly which image corresponds to which PR. Our repositories are set to immutable tags, so we can't accidentally overwrite a production image with something else.
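Tag immutability is a single property on the ECR repository itself; here's an illustrative CloudFormation sketch (the repository name is a placeholder):

```yaml
AppRepository:
  Type: AWS::ECR::Repository
  Properties:
    RepositoryName: my-app
    ImageTagMutability: IMMUTABLE  # commit-hash tags can never be re-pushed
```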

From the CI pipeline's point of view, all it's doing is running "KTool build".

Kubernetes Manifest

We don't expect developers to write their Kubernetes manifests from scratch; even if we did, there would have to be some automation to put the Docker image tag (which, as mentioned above, is a commit hash) into the pod specs. Instead, we ask developers to populate a simple YAML file that answers platform-agnostic questions like "does my webapp have an ingress?", "do I have a background worker too?", "how much RAM does it get?", "what command should it run?", and "what secrets should I bring in?". KTool then picks and chooses among some .yaml.erb templates we maintain and populates them with the right values. It then feeds them into Kustomize to collate them all together, insert the image tag and application name, and produce a nice, compliant YAML file ready to go into a Kubernetes cluster. It also creates an ArgoCD application definition; more on that later.
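KTool is internal, so its actual schema isn't public; purely for illustration, a developer-facing config in that spirit might look something like this:

```yaml
# Hypothetical KTool config - every key here is invented for illustration.
name: checkout
web:
  command: bundle exec puma -C config/puma.rb
  memory: 512Mi
  ingress:
    paths:
      - /checkout
worker:
  command: bundle exec sidekiq
  memory: 1Gi
secrets:
  - /checkout/production/database_password
```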

This gives us the added advantage of being able to update KTool with new "sane defaults"; say, one day we decide to turn on read-only filesystems, or mandate that everything have a reverse proxy sidecar to do service mesh-y things. We update KTool, and everyone just gets that in their manifest next time they deploy.

Deployment to Kubernetes

Once we have the manifest, KTool checks it into yet another Git repository, which holds all of our Kubernetes manifests.

It is important to us that we version-control every manifest that goes into Kubernetes, so we know when an update happened, and how to roll back. It also helps identify config drift if someone did something by hand.

Speaking of config drift: we use ArgoCD as our means of actually getting our manifests applied to the cluster. It is pointed at the aforementioned repository full of YAML manifests, and as soon as something changes in the repository, it makes it so in Kubernetes. Not only are changes applied automatically, but ArgoCD also reverts config drift, heals resources that were deleted by mistake, and provides a friendly GUI for developers looking to see how their services are behaving.
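For illustration, an ArgoCD Application in this setup might look roughly like the following (the repo URL, path, and names are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: checkout
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/k8s-manifests.git
    targetRevision: main
    path: checkout/production
  destination:
    server: https://kubernetes.default.svc
    namespace: checkout
  syncPolicy:
    automated:
      prune: true     # remove resources deleted from Git
      selfHeal: true  # revert manual config drift
```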

This way, even if we somehow accidentally lose an entire EKS cluster, we can be confident that everything will come back if we install ArgoCD and point it at the repository.

Impact

The unlocks from our move to Kubernetes are hard to count. Some of them deserve their own blog posts, but here are a few quick benefits in a nutshell.

Again, this blog post only scratches the surface of our Kubernetes iceberg; there's a tremendous number of little implementation details, process improvements, and discoveries that came as part of this transition. Many of them deserve their own blog posts, and now that we've set some context, those entries can come. Stay tuned!

by Martin Bonica