Building a Kubernetes Ingress Controller with Caddy

Update: On April 16th 2021, Slash GraphQL was officially renamed Dgraph Cloud. All other information below still applies.

Slash GraphQL hosts a lot of Dgraph backends. Like, seriously, a lot! Though we launched only about three months ago, our biggest region (AWS us-west-2) already has well in excess of 4,000 backends.

An ingress controller is a service that sits in front of your entire kube cluster, and routes requests to the appropriate backend. But what happens when your ingress controller has such a huge number of backends to take care of?

That’s 4,000 GraphQL endpoints and 4,000 gRPC endpoints, adding up to a total of 8,000 ingress hostnames! That’s also about 12,000 containers that would be running if we just let everything run wild! I can’t even imagine what the AWS bill would be! Let’s try to tackle these problems and find a way to get rid of all those extra running containers.

Freezing inactive backends

Luckily, we found a quick way to fix the problem of all those running containers. For backends on our free tier, we simply put them to sleep when they are inactive. On the next request for that backend, we hold the request until Dgraph finishes spinning up, then forward the request to Dgraph. This drastically reduces the number of live pods to the ones which have been active in the last few hours, at the expense of latency on the first request for infrequently used backends.

This freezing logic was implemented using a reverse proxy (built on top of Golang’s ReverseProxy) that sits between the Nginx Ingress controller and Dgraph, waking up Dgraph backends just in time. This worked for a while, and drastically cut down the number of pods we were running, but left us with a few new problems.
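As a rough sketch of that idea (not the actual Slash GraphQL proxy), the freezing logic looks something like the following, assuming a hypothetical wakeBackend helper that un-freezes the Dgraph backend and blocks until it is ready to serve:

func wakingProxy(backend *url.URL, wakeBackend func(r *http.Request) error) http.Handler {
	// Sketch only: wakeBackend is a hypothetical helper, not part of ingressutil.
	proxy := httputil.NewSingleHostReverseProxy(backend) // net/http/httputil
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Hold the request while the backend spins up, then forward it.
		if err := wakeBackend(r); err != nil {
			http.Error(w, "backend unavailable", http.StatusBadGateway)
			return
		}
		proxy.ServeHTTP(w, r)
	})
}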

  1. Although this solution let us scale down the Dgraph containers, it still left the proxy containers running, and those couldn’t be taken down. This meant that we had to keep the Kubernetes namespace and ingress around. As a result, the kube control plane would slow to a crawl every time we tried to fetch or update things in bulk.
  2. Nginx would slow down a lot whenever an ingress was added, and would often run out of memory. Granted, we were running Nginx with very little memory (1.5GB), but we had to raise that limit every few hundred backends or so.
  3. A request went through a lot of hops before it finally reached Dgraph: an application load balancer, an ingress (Nginx), a Golang proxy, and then finally Dgraph itself. We were hoping to reduce this to a single hop.

Eventually, we decided to bite the bullet, and try to write our own ingress controller.

Writing an ingress controller

Writing an ingress controller always seemed like such a daunting task, something that could only be accomplished by the Kubernetes gods that walk amongst us. How could we mere mortals deal with so many ingresses? ingress-es? Or is it ingressi?

How do I deal with this task if I can’t even figure out what the plural of ingress is?

But as it turns out, it’s actually pretty simple. An ingress controller only really needs to do two things:

  1. Figure out the list of ingresses and what service they map to
  2. Forward the request to the correct service

Writing the Kubernetes bits

Please see dgraph-io/ingressutil (utils for building an ingress controller) for a working implementation of the concepts described here.

The Kubernetes bits turned out to be much, much simpler than I had imagined. Kubernetes has an amazing Go client, which comes with a feature called SharedInformers. An informer listens to kube and calls a callback that you provide every time a resource is created, deleted, or updated (with periodic resyncs in case it misses an update). All you have to do is give the controller the correct permissions (more on that below) and process the events correctly.

Let’s construct the kubeClient and listen for ingress updates (code adapted from ingressutil/ingress_router.go):

import (
	"context"
	"time"

	"github.com/golang/glog"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func getKubeClient() *kubernetes.Clientset {
	// In-cluster config uses the service account mounted into the pod.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		glog.Fatalln(err)
	}

	kubeClient, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		glog.Fatalln(err)
	}
	return kubeClient
}

func (ir *ingressRouter) StartAutoUpdate(ctx context.Context, kubeClient *kubernetes.Clientset) func() {
	factory := informers.NewSharedInformerFactory(kubeClient, time.Minute)

	informer := factory.Extensions().V1beta1().Ingresses().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    ir.addIngress,
		UpdateFunc: ir.updateIngress,
		DeleteFunc: ir.removeIngress,
	})

	go informer.Run(ctx.Done())

	// The returned function blocks until the informer's cache has synced, so
	// callers can wait for it before serving traffic.
	return func() {
		cache.WaitForCacheSync(ctx.Done(), informer.HasSynced)
	}
}

I won’t go into the details of how addIngress, updateIngress, and removeIngress work; we simply keep a map of ingresses, indexed by namespace+name. Every time we get an update, we wait a short while (25ms), then regenerate our routing table.
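To make that concrete, here’s a minimal sketch of what the add handler could look like. The struct fields and the rebuildRouteMap method are illustrative stand-ins, not the exact ingressutil code.

// Illustrative sketch only; field names and rebuildRouteMap are stand-ins.
type ingressRouter struct {
	mu        sync.Mutex
	ingresses map[string]*v1beta1.Ingress // key: "namespace/name" (k8s.io/api/extensions/v1beta1)
	routes    atomic.Value                // holds the current *routeMap
}

func (ir *ingressRouter) addIngress(obj interface{}) {
	ing, ok := obj.(*v1beta1.Ingress)
	if !ok {
		return
	}

	ir.mu.Lock()
	ir.ingresses[ing.Namespace+"/"+ing.Name] = ing
	ir.mu.Unlock()

	// Wait a short while (25ms), then regenerate the routing table, so the
	// rebuild picks up a whole burst of related updates at once.
	time.AfterFunc(25*time.Millisecond, ir.rebuildRouteMap)
}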

Here is a simplified view of what one of our ingresses looks like:

{
  "namespace": "myapp",
  "name": "myingress",
  "service": "service1",
  "port": 80,
  "http": {
    "host": "foo.com",
    "paths": [
      {"path": "/admin"}
    ]
  }
}

And here is what our routing table looks like. It’s just a map keyed by hostname, and the first path that matches determines the destination service:

{
  "foo.com": [
    {"path": "/admin", "service": "service1.myapp.svc:80"},
    {"path": "/", "service": "service2.otherapp.svc:80"}
  ]
}
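In Go, an entry in that table plausibly looks something like this (illustrative types only; the real definitions live in ingressutil/route_map.go):

// Illustrative types only; see ingressutil/route_map.go for the real thing.
type routeEntry struct {
	Namespace   string
	Name        string
	ServiceName string
	ServicePort intstr.IntOrString      // k8s.io/apimachinery/pkg/util/intstr
	path        v1beta1.HTTPIngressPath // the ingress path this entry was built from
}

type routeMap struct {
	// hostMap["foo.com"] holds entries in order; the first matching path prefix wins.
	hostMap map[string][]routeEntry
}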

Converting this set of ingresses into the per-hostname routing table is a fairly simple computation, and you can see exactly how it works in ingressutil/route_map.go. Matching a request to a service is then just a matter of matching the hostname and path, as seen below:

// match returns the first entry for this host whose path is a prefix of the
// request path, along with the service endpoint to dial.
func (rm *routeMap) match(host, path string) (namespace string, name string, serviceEndpoint string, ok bool) {
	for _, entry := range rm.hostMap[host] {
		if strings.HasPrefix(path, entry.path.Path) {
			return entry.Namespace, entry.Name, entry.ServiceName + "." + entry.Namespace + ".svc:" + entry.ServicePort.String(), true
		}
	}

	return "", "", "", false
}

Finally, to avoid nasty race conditions, we store the whole routing table in an atomic.Value and swap in a freshly built table instead of mutating the existing one in memory.
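In practice that swap looks something like this (a sketch, assuming the routes atomic.Value from earlier and a hypothetical buildRouteMap helper; MatchRequest is what the Caddy plugin calls later):

// Sketch only: buildRouteMap is a hypothetical pure function that turns the
// ingress map into a *routeMap.
func (ir *ingressRouter) rebuildRouteMap() {
	ir.mu.Lock()
	rm := buildRouteMap(ir.ingresses)
	ir.mu.Unlock()

	ir.routes.Store(rm) // readers never observe a half-built table
}

func (ir *ingressRouter) MatchRequest(r *http.Request) (namespace, name, serviceEndpoint string, ok bool) {
	rm, _ := ir.routes.Load().(*routeMap)
	if rm == nil {
		return "", "", "", false
	}
	return rm.match(r.Host, r.URL.Path)
}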

Actually forwarding to the service - Caddy all the way!

Please see dgraph-io/ingressutil/caddy for a working implementation of the concepts described here

OK, so those are the parts that listen to kube and figure out where requests are supposed to go. But what about actually forwarding requests to their final destination?

When we first started playing around with writing an ingress controller, we initially forwarded requests to our original Nginx. However, this was really problematic, as there was no way to figure out whether Nginx had picked up a new ingress or not. It often took 2-5 seconds after our code picked up a new ingress for Nginx to be ready to serve it, meaning that we had to add a lot of time.Sleep() type code, which no one ever wants to read. We briefly tried writing an Nginx module, but C++ makes me want to tear my hair out.

We were looking for a production-ready proxy written in a language that we already work with: Go. One day, Manish pointed me at Caddy, and it was love at first read.

Caddy is a reverse proxy written in Golang. You can compile Caddy with plugins written in Go, and Caddy 2 has been written with this extensibility in mind. Other proxies like Traefik and Ambassador also have some support for plugins, but I found Caddy the easiest to work with. As an added bonus, Caddy’s HTTP middleware interface is very close to Go’s standard ServeHTTP interface, and as a result, is very easy to build around and test.

Caddy even has a tool, xcaddy, which can be used to compile a custom build of Caddy along with whatever plugins you want (e.g. xcaddy build --with <your-plugin-module>).

We structured our code as two plugins for Caddy, and our caddy.json looked something like this.

{
   "apps": {
      "http": {
         "servers": {
            "slash-ingress": {
               "routes": [
                  {
                     "match": [{ "path": ["/*"] }],
                     "handle": [
                        { "handler": "slash_graphql_internal" },
                        { "handler": "ingress_router" }
                     ]
                  }
               ]
            }
         }
      }
   }
}
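For handler names like ingress_router to resolve, each plugin registers itself as a Caddy HTTP handler module. Roughly, that registration looks like this (a sketch of Caddy’s standard module registration, not the exact code from ingressutil/caddy):

// Sketch of standard Caddy 2 module registration for the ingress_router plugin;
// the slash_graphql_internal handler registers itself the same way.
func init() {
	caddy.RegisterModule(IngressRouter{})
}

func (IngressRouter) CaddyModule() caddy.ModuleInfo {
	return caddy.ModuleInfo{
		ID:  "http.handlers.ingress_router",
		New: func() caddy.Module { return new(IngressRouter) },
	}
}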

Caddy’s HTTP handlers work like a middleware chain: they call your first plugin, with a next() handler bound to the next plugin in the chain. In our case, the first plugin wakes up the relevant DB and creates the necessary ingress if needed. It also does a number of miscellaneous tasks such as checking auth, rate limits, and other custom logic. It then finally hands the request over to the ingress_router plugin, which forwards it to the correct server. Let’s take a closer look at ingress_router.

Since Caddy was built as a reverse proxy, I had a hunch that the code to forward requests was exposed as a module somewhere. In fact, all of Caddy’s internal modules implement the same interfaces as their external counterparts, which meant that Caddy’s internal reverse proxy (reverseproxy.Handler) implements the same interface as my ingress-aware reverse proxy! This sounds like a job for… the decorator pattern!

// Partially from https://github.com/dgraph-io/ingressutil/blob/main/caddy/ingress_router.go
type IngressRouter struct {
	// Router is the ingress router described above, and caddyContext is saved
	// during provisioning; both are elided here, and the field types are approximate.
	Router       ingressutil.IngressRouter
	caddyContext caddy.Context

	proxyMap sync.Map // service endpoint -> provisioned *reverseproxy.Handler
}

func (ir *IngressRouter) ServeHTTP(w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {
	proxy, ok := ir.getUpstreamProxy(r)
	if !ok {
		// No ingress matches this request; fall through to the next handler.
		return next.ServeHTTP(w, r)
	}

	return proxy.ServeHTTP(w, r, next)
}

func (ir *IngressRouter) getUpstreamProxy(r *http.Request) (caddyhttp.MiddlewareHandler, bool) {
	_, _, upstream, ok := ir.Router.MatchRequest(r)
	if !ok {
		return nil, false
	}

	if proxy, ok := ir.proxyMap.Load(upstream); ok {
		return proxy.(caddyhttp.MiddlewareHandler), true
	}

	// Build and provision a Caddy reverse proxy for this upstream.
	proxy := &reverseproxy.Handler{Upstreams: reverseproxy.UpstreamPool{&reverseproxy.Upstream{Dial: upstream}}}
	proxy.Provision(ir.caddyContext)

	// If another goroutine stored a proxy first, use theirs and clean ours up.
	proxyInMap, loaded := ir.proxyMap.LoadOrStore(upstream, proxy)
	if loaded {
		proxy.Cleanup()
	}
	return proxyInMap.(caddyhttp.MiddlewareHandler), true
}

So all we need to do is keep a big map from service endpoint to the actual reverse proxy. That’s exactly what our middleware does: it keeps a sync.Map where the key is the service endpoint and the value is a reverse proxy to forward the request to. You can see the full code for this here: ingressutil/caddy/ingress_router.go.

On every request, we match the request to a service, hand it off to the appropriate reverse proxy, and Caddy takes over once again!

Setting up Permissions in Kubernetes

In order for your ingress controller to listen in on ingress changes, you’ll need to grant the correct permissions in the RBAC config for the deployment.

Here are the ClusterRole rules you’ll need to bind to your deployment’s service account. These are the minimum permissions I needed to test this out.

- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]

Putting it all together

Using Caddy, we were able to build an ingress controller for Slash GraphQL in about a week. While Caddy does have a WIP ingress controller, we have a lot of custom features such as creating ingresses on the fly, and we wanted something that we could build on and extend. We have also open sourced the tools we used to build our ingress controller, and you can find them here: github.com/dgraph-io/ingressutil

Our new ingress has been live for some time now, and early results are looking good. Prometheus metrics show that all the routing described in this post adds less than a millisecond of extra time to critical requests.

I believe Caddy and its extensibility are a great fit if you are trying to build a smart proxy (AKA an API gateway). Perhaps you want to do authentication, rate limiting, or some other custom logic that you need to implement with your own code.

Caddy’s author Matt Holt wrote that Caddy is not just a proxy, it’s a powerful, extensible platform for HTTP apps. I couldn’t agree more.