Running akka-cluster on Kubernetes

Published in

SoftwareMill Tech Blog

6 min readJun 5, 2018

In this article I won’t go into akka-cluster details and why it’s great. You can find a lot of excellent articles about it. Of course a good starting point is the very good akka documentation. Since kubernetes is becoming a standard way of running contenerized applications on multiple servers I want to describe how to run akka-clusters on kubernetes and how to develop such application.

Clusters — the challeges

Running clustered application can be challenging. Fortunately most of the complicated stuff like: leader election, convergence, failure detection, etc, is already built into akka itself and you don’t need to worry about it (I recommend you to read akka documentation which describes how it works).

You — as an operator — are responsible for forming the cluster, adding new nodes, removing nodes from cluster and handling failures like split-brain or node crashes.

Generally the akka-cluster is formed like this:

The first node starts and joins itself. It’s called seed node since other nodes will use it’s address as the contact point to get cluster design. It becomes the cluster leader.
Next node starts, connects to seed node(s) and tries to join the cluster.
When all cluster nodes “see” the new member the cluster leader sets it’s state to up.
If the node leaves a cluster in the graceful way it changes it’s state to leavingand the leader removes it from the cluster.
The leaving node’s state changes to removed.

Looks pretty easy, but there are some problems with this flow. First (and probably most important) — it works only on “stable” clusters. When one or more nodes are unreachable (for example after hardware failure) the leader can’t perform it’s duties. So you have to start the node again or mark it as Down (manually or using some scripts). The second problem: when the seed node goes down there is no way to add new nodes to cluster.

On the “static” system (when all cluster nodes run on predefined IP numbers and ports) it’s possible to avoid those problems with for example running some scripts to down unreachable nodes or configuring multiple seed nodes (which is the best practice). But things are becoming really complicated when cluster runs on orchestration system like AWS ECS or kubernetes. As you probably know pods on k8s are ephemeral which means they can go down and up again at any time. Also the IPs and ports are assigned dynamically.

Here shines the new way of forming akka-cluster: cluster bootstrap and service discovery.

Cluster bootstrap

Cluster bootstrap is the akka extention which allows akka-cluster nodes to discover other nodes and join the cluster or form a new cluster. There are several discovery method based on DNS or some external APIs like Amazon ECS, marathon, etc. We’ll focus on discovery method which uses kubernetes API.

Gracefully downing cluster nodes

CoordinatdShutdown is the akka extension introduced in akka 2.5. It can stop actors and services in specific order when the application is shut down. This extension is fully configurable and you can define your own shutdown procedure configuring shutdown phases.

When clustering is enabled the default configuration is exactly what is expected — when SIGTERM is sent to jvm process the application will sent its own cluster state as leaving and eventually will down itself gracefully. When application runs in container and the container is stopped (for example using docker stop command) SIGTERM is sent to PID 1 end things run exactly as we want.

Something bad happened…

It can happen. Well, actually it will happen. Sooner or later. The container will be killed by OS because of OOM. The host will brake because of hardware failure. The network partition will occur because of faulty network switch…

Of course coordinated shutdown will not help in such cases — for example when the SIGINT is sent to application it will stop working immediately and other akka cluster nodes will see such node as unreachable.

We need a process which will heal our cluster — the Split Brain Resolver (SBR).

There is a great commertial SBR implementation sold as part of commercial subscription by Lightbend. There are also several open source solutions, you have to decide yourself. In the example application described below I use very interesting solution: akka cluster custom downing.

Ok, let’s write some code and see all those concepts in action. You can download example application from github and try it yourself on your own kubernetes cluster.

Simple akka cluster

First things first — we have to start cluster discovery. Since it relies on akka management extension we have to start it as well:

Most important part is configuration.

To run cluster bootstrap and service discovery we must not set seed nodes:

As you can see we can set akka.cluster.seed-nodes using environment variable. It becomes handy when running application locally outside kubernetes during development. You have to remember seed-nodes is the array so if you want to set it you have to use VARIABLE.0, VARIABLE.1 etc. notation. For example:

docker run -ti -p 8558:8558 \
-e SEED_NODES.0=”akka.tcp://akka-simple-cluster@172.17.0.2:2551" \
softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT

There are several downing strategies we can use in SBR, in this case I will use MajorityLeaderAutoDowning. This strategy will keep the partition with more reachable nodes. If the nodes number is the same in both partition (for example when running even number of nodes) the partition with lowest address is kept. Generally — it’s always a good idea to run odd number of nodes in distributed systems :).

Let’s configure cluster management:

And the discovery itself:

The whole application.conf file is on GitHub.

Let’s deploy our brand new application on minikube.

We have to start minikube:

minikube start

and set some environment variables pointing to minikube instead of our local docker daemon:

eval $(minikube docker-env) # on bash, or
eval (minikube docker-env) # on fish, my favourite shell

Let’s build docker container:

gkocur@gk ~/D/w/s/b/a/akka-simple-cluster-k8s (master)> sbt docker:publishLocal
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from /Users/gkocur/.sbt/1.0/plugins
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/gkocur/Desktop/work/sml/blog/akka-k8s/akka-simple-cluster-k8s/project
[info] Loading settings from build.sbt ...
...
[info] Successfully built c98c54dc9e52
[info] Successfully tagged softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT
[info] Built image softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT
[success] Total time: 8 s, completed May 31, 2018 8:33:05 PM

We can now apply our deployment:

kubectl apply -f k8s/simple-cluster-deployment.yml

and the service:

kubectl apply -f k8s/simple-cluster-service.yml

The cluster needs few seconds to bootstrap. Let’s check the status:

KUBE_IP=$(minikube ip)
MANAGEMENT_PORT=$(kubectl get svc akka-simple-cluster -ojsonpath="{.spec.ports[?(@.name==\"management\")].nodePort}")curl http://$KUBE_IP:$MANAGEMENT_PORT/cluster/members | jq
{
  "selfNode": "akka.tcp://akka-simple-cluster@172.17.0.11:2551",
  "leader": "akka.tcp://akka-simple-cluster@172.17.0.10:2551",
  "oldest": "akka.tcp://akka-simple-cluster@172.17.0.10:2551",
  "unreachable": [],
  "members": [
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.10:2551",
      "nodeUid": "-1472917702",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.8:2551",
      "nodeUid": "-375470270",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.11:2551",
      "nodeUid": "1000843329",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.9:2551",
      "nodeUid": "1793471128",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.7:2551",
      "nodeUid": "83031023",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    }
  ]
}

Summary

Kubernetes is the abstraction layer between infrastructure and applications. It can dramatically speed up the process of development and deployment, but on the other hand it introduces new challenges. Fortunately akka follows new trends and cluster bootstraping integrates with moderm orchestration systems like kubernetes or marathon.

Like this post and interested in learning more? Follow us on Medium! Need help with your Cassandra, Kafka or Scala projects? Just contact us here.