Accelerated single-node Kubernetes with microk8s, Kubeflow, and RAPIDS.ai

Some notes from setting up a local Kubernetes environment for accelerated data science.

kubernetes, microk8s, kubeflow, rapids

Published April 6, 2021

The fundamental value proposition of Kubernetes is that it can provide an abstraction layer for distributed applications on any infrastructure. This depends on the intersection of two partial myths:

  1. Kubernetes consistently provides every essential service that distributed applications need, and
  2. Kubernetes can run equally well on different cloud footprints, in a datacenter, or on a single node.

There’s some truth in both of these myths. Vanilla upstream Kubernetes provides many important primitives for distributed applications — but not everything — and individual Kubernetes distributions typically bundle additional services to address the gaps. While it is technically possible to run Kubernetes on a single node by installing a specialized distribution, most of these solutions are rough around the edges, and if you want true portability, you’ll need to run the same Kubernetes distribution on your workstation (or laptop) and in your datacenter.

Since I’m more interested in developing tools that could be ported to a variety of Kubernetes distributions than I am in developing an application that is absolutely reproducible across multiple footprints in a single organization without additional effort, I have the flexibility to choose any single-node distribution of Kubernetes for local use.

I’ve been impressed with the setup and user experience of microk8s for a long time and used to run it (in a VM) on my old MacBook. In this post, I’ll explain how I used microk8s to set up a data science development environment on my workstation, complete with GPU acceleration, Kubeflow, and a notebook image with RAPIDS.ai preloaded.

System setup

I started with a relatively fresh installation of Ubuntu 20.04,1 and installed CUDA 11.0 from the NVIDIA repository, following these instructions:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda-11-0
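
Before going any further, it’s worth confirming that the driver itself is healthy; once the modules are loaded (a reboot may be necessary first), nvidia-smi should list every installed GPU:

# sanity check: the driver should report each installed GPU
nvidia-smi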

microk8s

I then installed microk8s with snap. There are a few options for versions and channels, but as of late March 2021 the most stable for me was Kubernetes 1.20 (more on this in a bit).

sudo snap install microk8s --classic --channel=1.20/stable
sudo microk8s status --wait-ready

microk8s ships with Calico enabled, and there is a longstanding bug in Calico that prevents it from finding network interfaces that contain lo anywhere in their names. Since the wireless interface in my workstation is called wlo2, I needed to change Calico’s environment to get it to work:

microk8s kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=wlo.\*
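
After setting that variable, the calico-node pods restart with the new autodetection method; watching them come back to a Running state is a quick way to confirm the change took effect. The k8s-app=calico-node label below comes from the stock Calico manifests, so adjust it if your deployment labels things differently:

# wait for the calico-node pods to restart and report Running
microk8s kubectl get pods -n kube-system -l k8s-app=calico-node -w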

GPU support

With CUDA 11.0 installed, I was able to enable GPU support in microk8s:

microk8s enable dns
microk8s enable gpu

I could then verify that the pods started successfully:

microk8s kubectl describe pods -l name=nvidia-device-plugin-ds -n kube-system

and, once they had, that my node had been labeled properly:

microk8s kubectl get node -o jsonpath="{range .items[*]}{..allocatable}{'\n'}{end}"

You’ll want to see an allocatable resource of type nvidia.com/gpu in that output, like this (the “2” here reflects the number of GPUs installed in my workstation):

{
  "cpu":"...",
  "ephemeral-storage":"...",
  "hugepages-1Gi":"...",
  "hugepages-2Mi":"...",
  "memory":"...",
  "nvidia.com/gpu":"2",
  "pods":"..."
}

I could then launch a simple job to verify that I was able to schedule pods to run on the GPU:

cat << EOF | microk8s kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
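
Once that pod runs to completion, its logs should confirm that the vector-addition kernel actually executed on a GPU; the image prints a short pass/fail summary:

# a successful run ends with output along the lines of "Test PASSED"
microk8s kubectl logs pod/cuda-vector-add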

Kubeflow and notebooks

Now I was able to install Kubeflow itself (feel free to specify your favorite password in these instructions):

microk8s enable ingress istio
microk8s enable kubeflow -- --password my-ultra-secure-password --bundle lite
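
The Kubeflow bundle takes a while to deploy; I find it helpful to watch the pods in the kubeflow namespace until everything settles into a Running state:

# watch the Kubeflow components come up; this can take several minutes
microk8s kubectl get pods -n kubeflow -w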

Once Kubeflow was up, I created a persistent volume to enable shared storage between my notebook servers and the host system:

mkdir $HOME/k8s-share
cat << EOF | microk8s kubectl create -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $USER-share
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "$HOME/k8s-share"
EOF

In my case, this created a persistent volume called willb-share so that I could mount k8s-share from my home directory as a data volume on a Kubeflow notebook server.
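
Depending on how you attach the volume from the notebook-creation UI, you may also need a PersistentVolumeClaim with the same manual storage class before the volume will bind. Here’s a minimal sketch of such a claim, assuming your Kubeflow user namespace is called admin; substitute your own namespace if it differs:

# hypothetical claim: replace "admin" with your Kubeflow profile namespace
cat << EOF | microk8s kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $USER-share
  namespace: admin
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
EOF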

The next step was to get RAPIDS.ai set up in a Kubeflow notebook image, since Kubeflow no longer ships a RAPIDS image. I could have installed individual libraries in a notebook container, but there was an easier option: since the RAPIDS project publishes a variety of Docker images, I could pick one of those as a starting point and ensure that the resulting image would be usable by Kubeflow, as in this Dockerfile:

FROM rapidsai/rapidsai-core:0.18-cuda11.0-runtime-centos7-py3.7

ENV NB_PREFIX /

RUN ldconfig

CMD ["sh","-c", "jupyter notebook --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]

The only interesting part of that recipe is RUN ldconfig, which I found necessary so that the Python notebook kernels could find CUDA. I built that image locally and pushed it to an accessible repository so that I could use it while launching a new notebook server from the Kubeflow dashboard.
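
For completeness, the build-and-push step looks roughly like this; registry.example.com/willb/rapids-kubeflow is a hypothetical repository name, so substitute whatever registry your cluster can pull from:

# build the RAPIDS notebook image and push it to a registry the cluster can reach
# (the repository name here is a placeholder)
docker build -t registry.example.com/willb/rapids-kubeflow:0.18 .
docker push registry.example.com/willb/rapids-kubeflow:0.18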

Remote access

I often prefer to access my workstation remotely even if I’m at my desk. For running a regular Jupyter notebook server from the command line, this is just a matter of binding to 0.0.0.0 or setting up an SSH tunnel.2 Accessing services running on a single-node Kubernetes deployment is slightly more complicated, though.

First up, all requests will need to go through a load balancer (in this case, Istio). We can access Istio through an external IP, and we can find out which IP that is by inspecting the ingress gateway service:

microk8s kubectl get svc istio-ingressgateway -n kubeflow -o jsonpath="{..loadBalancer..ip}{'\n'}"

However, simply connecting to my workstation and forwarding traffic to port 80 of Istio’s IP didn’t do me a lot of good; I found that I also needed to be able to connect to other cluster IPs in order to access the services Istio was exposing.3 To access all of these IPs remotely, we have a couple of options:

  1. Set up a dynamic proxy over SSH. By connecting with ssh -D 9999 workstation and then configuring a local proxy to point to localhost on port 9999, we can access anything that’s accessible from workstation. This is an easy way to smoke-test a deployment, but it isn’t an ideal long-term solution because it requires you to maintain an SSH connection to your workstation and it proxies everything through the workstation unless you explicitly configure the proxy.4 Relying on a dynamic proxy like this can also lead to confusing errors when the SSH connection is down and may be difficult or impossible to configure when using cellular data on a mobile device.
  2. Use a VPN-like service to relay traffic to given subnets. I use Tailscale for a personal VPN and configured my workstation as a relay node for traffic to cluster IP addresses. This means that if I can access cluster IPs from my workstation, I can also access them from any computer connected to my Tailscale account (whether or not I’ve connected over SSH first). This was very easy, and it’s also possible to do with upstream WireGuard; a sketch of the Tailscale setup follows this list.
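
Here’s roughly what the Tailscale relay configuration looks like on the workstation; the 10.152.183.0/24 range is an assumption based on the microk8s default service CIDR, so verify it against your own deployment before advertising it:

# advertise the cluster service subnet to the tailnet
# (10.152.183.0/24 is assumed here; check your microk8s service CIDR first)
sudo tailscale up --advertise-routes=10.152.183.0/24

You’ll also need to approve the advertised route from the Tailscale admin console (and make sure IP forwarding is enabled on the workstation) before other devices will route traffic through it.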

Once we’re correctly forwarding traffic, connecting to the Istio load balancer will show us the Kubeflow dashboard; clicking the different links on that page (e.g., to create a new notebook server) will send requests to the appropriate internal services.

Challenges and false starts

Sometimes knowing what didn’t work is more useful than knowing what did. In this section, I’ll briefly cover some problems I encountered along the way so you’ll know what to look out for.

Device plugin errors

With microk8s 1.19, I was unable to get the device plugin pod to run successfully and always got some variant of this error in my logs:


Loading NVML
Failed to initialize NVML: could not load NVML library.
If this is a GPU node, did you set the docker default runtime to nvidia?

This is a confusing error because microk8s uses containerd and not Docker. While many people seem to have run into this error online, none of the recommended solutions worked for me. (I also tried specifying a newer version of the device plugin container image, without success.)

GPU operator errors

The beta release of microk8s 1.21 uses the NVIDIA GPU operator to manage GPU drivers. As of mid-March 2021, the GPU operator is not intended to work on nodes that already have GPU drivers installed, which makes it more suitable for provisioning new nodes or VMs and adding them to a cluster (its intended use case, to be fair) than for enabling GPU support for a single-node Kubernetes on a workstation.5

I was able to enable GPU support in microk8s 1.21 by first removing CUDA and the GPU drivers from my system, but this was an unpalatable hack since I’d prefer to be able to manage system dependencies with a native package manager (and also to use the GPU and CUDA outside of Kubernetes). I also noticed that the GPU operator failed to start after I had rebooted my system, presumably because the drivers it had installed earlier were already loaded at boot.

Image pull failures

After installing microk8s 1.20, the Calico pod failed due to an image pull timeout. I was able to explicitly pull it using the bundled ctr tool before restarting the pods:

microk8s ctr images pull docker.io/calico/cni:v3.13.2

Conclusions

While I wouldn’t recommend single-node Kubernetes to most machine learning practitioners (it still requires a lot of interaction with Kubernetes proper to get to a productive state or troubleshoot problems), Kubernetes provides some useful primitives for managing resources, isolating jobs, and making work reproducible. Furthermore, developing ML tools on Kubernetes ensures that they’ll be consumable in multiple contexts. The combination of microk8s and Kubeflow provides a relatively painless way to get to a productive discovery environment with RAPIDS and GPUs. In future posts, I’d like to look at using my single-node Kubernetes deployment to orchestrate other machine-learning and data processing workloads.

Footnotes

  1. Be warned that I’m almost certainly doing some basic administration tasks suboptimally – while I used Debian at a consulting gig in the late 1990s and briefly used Ubuntu in the public cloud in graduate school, my main Linux distributions have been RPM-based for over 25 years. I chose Ubuntu for this application because it offered frictionless installation of GPU drivers – but the long support cycle vis-à-vis other community Linux distributions is also a plus.↩︎

  2. I’m often connecting remotely from a tablet, and tunneling in is especially convenient from my favorite iOS Jupyter client.↩︎

  3. This didn’t make a lot of sense to me, but individual Kubeflow dashboard components were exposed with wildcard DNS hostnames pointing to cluster IPs – and, if I couldn’t connect to the cluster IPs, it manifested as unusual “Page not found” errors from the Kubeflow dashboard.↩︎

  4. If you wanted to use this solution longer-term, it’d make sense to define a Proxy Auto-Configuration File that deferred to the dynamic proxy only for wildcard DNS hostnames like xip.io.↩︎

  5. This is a totally sensible design decision since a single-node Kubernetes deployment is not anywhere near the primary audience for a tool like the GPU operator. However, the upcoming release of the GPU operator will support this workstation use case by allowing users to skip driver installation; microk8s will incorporate this fix as well.↩︎