Running Chaos Experiments on Kubernetes

Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

Ever since Netflix popularized Chaos Engineering, tools of many different forms and shapes have appeared for running chaos experiments on different platforms.

In this blog post, we will learn how to run chaos engineering experiments with Kube-Monkey on a Kubernetes cluster.

Kube-Monkey is an implementation of Netflix’s Chaos Monkey built specifically for Kubernetes clusters. It periodically schedules a list of pod termination events and then carries them out, which makes it a useful way to test the fault tolerance of a highly available system.

Prerequisites

A Kubernetes cluster
kubectl, configured to access the cluster
Helm 3
git

Create a new Helm chart

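We will now create a new Helm chart (a collection of templated Kubernetes manifests) named nginx. The starter chart generated by helm create already deploys the nginx image, so it can serve as our target application as-is: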

helm create nginx

After that, we create a namespace for our target application:

kubectl create ns nginx

Finally, we use Helm to deploy 10 replicas of our nginx application into the nginx namespace:

helm upgrade --install nginx ./nginx \
  -n nginx \
  --set replicaCount=10

To check whether the deployment succeeded, use the following Helm and kubectl commands:

helm ls -n nginx
kubectl get pod -n nginx

You should see that the release has the status deployed and that 10 pods are running in the nginx namespace.
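If you prefer a scripted check to reading the table by eye, the small sketch below counts the pods whose STATUS column is Running. It assumes the default kubectl get pod column layout (NAME, READY, STATUS, RESTARTS, AGE) including the header line; the function name count_running is our own and not part of kubectl.

```shell
#!/usr/bin/env bash
# count_running: read `kubectl get pod` output on stdin and print how many
# rows have STATUS (the third column) equal to "Running".
# Assumes the default column layout with a header line:
#   NAME READY STATUS RESTARTS AGE
count_running() {
  # NR > 1 skips the header; n + 0 prints 0 when nothing matched
  awk 'NR > 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Example (needs access to the cluster):
#   running=$(kubectl get pod -n nginx | count_running)
#   [ "$running" -eq 10 ] && echo "all 10 replicas are running"
```

Because it only parses text, the same helper works on saved output, which makes it easy to try without a cluster.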

Make the application a target

For Kube-Monkey to consider our pods, we need to add specific labels to the Deployment manifest.

Open ./nginx/templates/deployment.yaml and add the kube-monkey labels to both the metadata.labels and the spec.template.metadata.labels sections. The Deployment-level labels opt the app in and configure how it is attacked, while the pod template labels are copied onto every pod, and Kube-Monkey uses the identifier label to find the pods that belong to this app.

Your deployment.yaml file should look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "nginx.fullname" . }}
  labels:
    {{- include "nginx.labels" . | nindent 4 }}
    kube-monkey/enabled: "enabled"              # Opt this Deployment in to Kube-Monkey
    kube-monkey/identifier: "nginx-victim"      # Unique name used to find this app's pods
    kube-monkey/mtbf: "1"                       # Mean time between failures, in days
    kube-monkey/kill-mode: "random-max-percent" # How many pods to kill per attack
    kube-monkey/kill-value: "100"               # Meaning depends on kill-mode; here at most 100% of pods
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "nginx.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "nginx.selectorLabels" . | nindent 8 }}
        kube-monkey/enabled: "enabled"          # Same label on the pod template
        kube-monkey/identifier: "nginx-victim"  # Must match the Deployment identifier
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "nginx.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /
              port: http
          readinessProbe:
            httpGet:
              path: /
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
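To reason about what a single attack can do with these labels, the sketch below models the upper bound on pods killed for each kill mode, based on the kill-mode descriptions in the Kube-Monkey README (kill-all, fixed, fixed-percent, random-max-percent). It is a simplified model, not Kube-Monkey’s own code: the function name max_pods_to_kill is hypothetical, and rounding percentages down is our assumption.

```shell
#!/usr/bin/env bash
# max_pods_to_kill MODE VALUE PODS
# Simplified model (NOT Kube-Monkey's implementation) of the maximum number of
# pods one attack can kill. Rounding percentages down is an assumption.
max_pods_to_kill() {
  local mode=$1 value=$2 pods=$3
  case "$mode" in
    kill-all)
      # kill-value is ignored; every pod of the app is targeted
      echo "$pods" ;;
    fixed)
      # kill VALUE pods, capped at the number of pods available
      if (( value > pods )); then echo "$pods"; else echo "$value"; fi ;;
    fixed-percent|random-max-percent)
      # fixed-percent kills exactly this share; random-max-percent picks a
      # random share between 0 and VALUE percent, so this is its maximum
      echo $(( pods * value / 100 )) ;;
    *)
      echo "unknown kill-mode: $mode" >&2
      return 1 ;;
  esac
}

# With the labels above and 10 replicas:
#   max_pods_to_kill random-max-percent 100 10   # prints 10
```

In other words, with random-max-percent and a kill-value of 100, anywhere from none to all 10 replicas may be killed in one attack; lowering kill-value is the lever for limiting the blast radius.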

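Let’s upgrade our release in the cluster and check its status. Pass replicaCount=10 again: helm upgrade does not reuse previously set values by default, so without it the Deployment would fall back to the chart default of 1 replica.

helm upgrade --install nginx ./nginx \
  -n nginx \
  --set replicaCount=10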
helm ls -n nginx
kubectl get pod -n nginx
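To confirm that both label blocks made it into the rendered Deployment, you can feed Helm’s rendered output to a quick text check. The helper check_km_labels below is hypothetical (our own name); it succeeds only if the enabled and identifier labels each appear at least twice, once for the Deployment and once for the pod template. It is a plain text match, not a YAML parser.

```shell
#!/usr/bin/env bash
# check_km_labels: read rendered Kubernetes YAML on stdin and succeed only if
# the kube-monkey enabled and identifier labels each appear at least twice
# (Deployment metadata + pod template). Text match only, not a YAML parse.
check_km_labels() {
  local manifest enabled identifier
  manifest=$(cat)
  # matches kube-monkey/enabled: enabled, with or without quotes
  enabled=$(grep -c 'kube-monkey/enabled: *"\{0,1\}enabled"\{0,1\}' <<< "$manifest")
  identifier=$(grep -c 'kube-monkey/identifier:' <<< "$manifest")
  if (( enabled >= 2 && identifier >= 2 )); then
    echo "ok: kube-monkey labels found on Deployment and pod template"
  else
    echo "missing labels: enabled=$enabled identifier=$identifier" >&2
    return 1
  fi
}

# Example (requires Helm 3 and the chart from this post):
#   helm template nginx ./nginx --show-only templates/deployment.yaml | check_km_labels
# The running pods should also carry the identifier label:
#   kubectl get pod -n nginx -l kube-monkey/identifier=nginx-victim
```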

Install Kube-Monkey

Now let’s introduce Kube-Monkey into our cluster and start creating some chaos.

Clone the Kube-Monkey repository, which contains its Helm chart:

git clone https://github.com/asobti/kube-monkey

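Next, create a namespace for Kube-Monkey and deploy it with Helm, just as we did for the nginx application. The flags disable dry-run mode so that pods are really deleted, enable debug mode with immediate kill scheduling, and restrict Kube-Monkey to the nginx namespace: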

kubectl create ns kube-monkey  # Create the namespace
# Deploy Kube-Monkey
helm upgrade --install kube-monkey ./kube-monkey/helm/kubemonkey \
  -n kube-monkey \
  --set config.debug.enabled=true \
  --set config.debug.schedule_immediate_kill=true \
  --set config.dryRun=false \
  --set config.whitelistedNamespaces="{nginx}"
# Check the deployment status
helm ls -n kube-monkey
kubectl get pod -n kube-monkey

Because we set schedule_immediate_kill to true, Kube-Monkey starts applying the configured kill instructions right away instead of waiting for its normal schedule. We can watch this happen in the Kube-Monkey logs:

kubectl logs -n kube-monkey -l release=kube-monkey -f

Now let’s check that our pods are actually being killed:

kubectl get pod -n nginx -w

There we have it: in our run, 5 of the 10 pods were killed within the last 12 seconds. Because the kill mode is random-max-percent, the number of pods killed will differ from run to run.
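Counting kills by eye in a watch stream is error-prone, because the same pod can print several Terminating lines. The sketch below counts distinct pods seen with STATUS Terminating, assuming the default kubectl column layout and that deleted pods show up as Terminating, as they normally do. The function name count_terminated is hypothetical.

```shell
#!/usr/bin/env bash
# count_terminated: read `kubectl get pod -w` output on stdin and print how
# many DISTINCT pods were seen with STATUS (third column) "Terminating".
# Assumes the default layout: NAME READY STATUS RESTARTS AGE.
count_terminated() {
  # seen[$1]++ is 0 only the first time a pod name appears as Terminating
  awk '$3 == "Terminating" && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Example: watch the nginx namespace for 60 seconds, then print the count.
#   timeout 60 kubectl get pod -n nginx -w | count_terminated
```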

Conclusion

In conclusion, this experiment showed the Kubernetes ReplicaSet doing its job: whenever an existing pod is terminated, Kubernetes spins up a new one to restore the desired replica count. That is the kind of experiment you can run with Kube-Monkey. I hope this blog post was informative and that you learned what Kube-Monkey is, how to install it, and how to enable it on your deployments with labels.