Migrating MongoDB to Kubernetes

This blog post is the last in the series of articles on migrating databases to Kubernetes with Percona Operators. The two previous posts can be found here:

As you might have guessed already, this time we are going to cover the migration of MongoDB to Kubernetes. In the 1.10.0 release of Percona Distribution for MongoDB Operator, we introduced a new feature (in tech preview) that enables users to perform such migrations through regular MongoDB replication capabilities. We have already shown how it can be used to provide cross-regional disaster recovery for MongoDB, and we encourage you to read that post as well.

The Goal

There are two ways to migrate the database:

  1. Take a backup and restore it.
    – This option is the simplest one, but unfortunately it comes with downtime. The bigger the database, the longer the recovery takes.
  2. Replicate the data to the new site and switch the application once the replicas are in sync.
    – This allows the user to perform the migration and switch the application with little to no downtime.

This blog post is a walkthrough on how to migrate a MongoDB replica set to Kubernetes using replication capabilities.

MongoDB replica set to Kubernetes

  1. We have a MongoDB cluster somewhere (the Source). It can be on-prem or on some virtual machine. For demo purposes, I’m going to use a standalone replica set node. The migration procedure for a replica set with multiple nodes or a sharded cluster is almost identical.
  2. We have a Kubernetes cluster with Percona Operator (the Target). The operator deploys 3 standalone MongoDB nodes in unmanaged mode (we will talk about it below).
  3. Each node is exposed so that the nodes on the Source can reach them.
  4. We are going to replicate the data to Target nodes by adding them into the replica set.

As always, all scripts and configuration files for this blog post are publicly available in this GitHub repository.

Prerequisites

  • MongoDB cluster – either on-prem or on a VM. It is important to be able to configure mongod to some extent and to add external nodes to the replica set.
  • Kubernetes cluster for the Target.
  • kubectl to deploy and manage the Operator and database on Kubernetes.

Prepare the Source

This section explains what preparations must be made on the Source to set up the replication.

Expose

All nodes in the replica set must form a mesh and be able to reach each other. The communication between the nodes can go through the public internet or some VPN. For demonstration purposes, we are going to expose the Source to the public internet by editing mongod.conf:
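A minimal sketch of the change (the port is the default one; binding to 0.0.0.0 is for the demo only, restrict the bind address or use a VPN in production):

```yaml
# mongod.conf (fragment): accept connections from outside the host
net:
  port: 27017
  bindIp: 0.0.0.0   # demo only; restrict to known addresses in production
```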

If you have multiple replica sets, you need to expose all nodes of each of them, including the config servers.

TLS

We take security seriously at Percona, and this is why by default our Operator deploys MongoDB clusters with encryption enabled. I have prepared a script that generates self-signed certificates and keys with the openssl tool. If you already have a Certificate Authority (CA) in use in your organization, generate the certificates and sign them with your CA.

The list of alternative names can be found either in this document or in this openssl configuration file. Note the DNS.20 entry:
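The relevant line from that openssl configuration (spron.in is the domain used throughout this walkthrough; substitute your own):

```
DNS.20 = *.mongo.spron.in
```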

I’m going to use this wildcard entry to set up the replication between the nodes. The script also generates an ssl-secret.yaml file, which we are going to use on the Target side.

You need to upload the CA certificate and the server certificate with its private key to every Source replica set node and then reference them in mongod.conf:
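A sketch of the relevant mongod.conf fragment (the file paths are placeholders, and the TLS mode you choose depends on whether all your clients already connect over TLS):

```yaml
# mongod.conf (fragment): enable TLS and x509 cluster authentication
net:
  tls:
    mode: preferTLS                                    # or requireTLS once every client uses TLS
    certificateKeyFile: /etc/mongodb/ssl/mongod.pem    # certificate plus private key
    CAFile: /etc/mongodb/ssl/ca.crt
security:
  clusterAuthMode: x509
```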

Note that I also set clusterAuthMode to x509. It enforces the use of x509 authentication between the replica set members. Test it carefully in a non-production environment first, as it might break your existing replication.

Create System Users

Our Operator needs system users to manage the cluster and perform health checks. Usernames and passwords for the system users must be the same on the Source and the Target. This script generates the user-secret.yaml to use on the Target and the mongo shell commands to add the users on the Source (it is an example, do not use it as-is in production).

Connect to the primary node on the Source and execute the mongo shell commands generated by the script.
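For illustration, the generated commands look roughly like this (the usernames, passwords, and roles here are placeholders; use exactly what the script produces):

```javascript
// run in the mongo shell on the Source primary (illustrative only)
use admin
db.createUser({
  user: "clusterAdmin",
  pwd: "clusterAdminPassword123",
  roles: [ { role: "clusterAdmin", db: "admin" } ]
})
db.createUser({
  user: "clusterMonitor",
  pwd: "clusterMonitorPassword123",
  roles: [ { role: "clusterMonitor", db: "admin" } ]
})
```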

Prepare the Target

Apply Users and TLS secrets

System users’ credentials and TLS certificates must match on both sides. The scripts we used above generate Secret object manifests for the Target. Apply them:
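Assuming the manifests are named as the scripts generate them:

```bash
kubectl apply -f ssl-secret.yaml
kubectl apply -f user-secret.yaml
```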

Deploy the Operator and MongoDB Nodes

Please follow one of the installation guides to deploy the Operator. Usually, it is a one-step operation with kubectl:
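For example, from a checked-out copy of the percona-server-mongodb-operator repository (check the installation guide for the exact manifest and version to use):

```bash
kubectl apply -f deploy/bundle.yaml
```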

MongoDB nodes are deployed with a custom resource manifest, cr.yaml. There are a few important configuration items in it, each described below along with the relevant snippet.

The following flag instructs the Operator to deploy the nodes in unmanaged mode, meaning they are not configured to form the cluster. The Operator also does not generate TLS certificates and system users for an unmanaged cluster.
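```yaml
spec:
  unmanaged: true
```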

Disable the Smart Update feature, as the cluster is unmanaged.
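In cr.yaml this is controlled by the update strategy; any value other than SmartUpdate turns it off (the exact value below is an assumption, check the documentation for your Operator version):

```yaml
spec:
  updateStrategy: RollingUpdate   # anything but SmartUpdate while the cluster is unmanaged
```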

The section below points to the Secret objects that we created in the previous step.
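A sketch of that section; the names must match the Secret objects you applied earlier (the names below are placeholders):

```yaml
spec:
  secrets:
    users: my-cluster-name-secrets   # from user-secret.yaml
    ssl: my-cluster-name-ssl         # from ssl-secret.yaml
```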

Remember that the nodes need to be exposed and reachable from the Source. To achieve this, we create a Service per Pod. In our case, it is a LoadBalancer object, but it can be any other Service type.
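A sketch of the per-Pod expose configuration in cr.yaml:

```yaml
spec:
  replsets:
  - name: rs0
    size: 3
    expose:
      enabled: true
      exposeType: LoadBalancer   # or ClusterIP/NodePort, depending on your network
```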

If the cluster and nodes are unmanaged, the Operator should not be taking backups either.
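```yaml
spec:
  backup:
    enabled: false
```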

Deploy unmanaged nodes with the following command:
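```bash
kubectl apply -f cr.yaml
```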

Once the nodes are up and running, also check the services. We will need the IP addresses of the new replicas to add them to the replica set on the Source later:
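```bash
kubectl get pods
kubectl get services
```

The EXTERNAL-IP column of the LoadBalancer services is what we are after.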

Configure Domains

X509 authentication is strict and requires that the certificate’s common name or subject alternative name match the domain name of the node. As you remember, we have the wildcard *.mongo.spron.in included in our certificate. It can be any domain that you use, but make sure the certificate is issued for that domain.

I’m going to create A-records that point to the public IP addresses of the MongoDB nodes:
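For example (the hostnames are illustrative; anything covered by the wildcard from the certificate works):

```
k8s-rs0-0.mongo.spron.in  ->  <EXTERNAL-IP of the first replica's Service>
k8s-rs0-1.mongo.spron.in  ->  <EXTERNAL-IP of the second replica's Service>
k8s-rs0-2.mongo.spron.in  ->  <EXTERNAL-IP of the third replica's Service>
```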

Replicate the Data to the Target

It is time to add our nodes in the Kubernetes cluster to the replica set. Log into the mongo shell on the Source and execute the following:
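A sketch of the commands, using the DNS names created above (the new members are added with priority 0 and no votes so that they cannot affect elections until they are in sync):

```javascript
rs.add({ host: "k8s-rs0-0.mongo.spron.in:27017", priority: 0, votes: 0 })
rs.add({ host: "k8s-rs0-1.mongo.spron.in:27017", priority: 0, votes: 0 })
rs.add({ host: "k8s-rs0-2.mongo.spron.in:27017", priority: 0, votes: 0 })
```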

If everything is done correctly, these nodes are going to be added as secondaries. You can check the status with the rs.status() command.

Cutover

Check that the newly added nodes are synchronized. The more data you have, the longer the synchronization process is going to take. To understand whether the nodes are synchronized, compare the values of optime and optimeDate of the Primary node with the values for the Secondary nodes in the rs.status() output:
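A convenience one-liner to pull only those fields out of rs.status() (not part of the Operator, just a mongo shell snippet):

```javascript
rs.status().members.forEach(function (m) {
  print(m.name + "  " + m.stateStr + "  " + m.optimeDate);
})
```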

When nodes are synchronized, we are ready to perform the cutover. Please ensure that your application is configured properly to minimize downtime during the cutover.

The cutover is going to have two steps:

  1. One of the nodes on the Target becomes the primary.
  2. The Operator starts managing the cluster, and the nodes on the Source are no longer part of the replica set.

Switch the Primary

Connect with the mongo shell to the primary on the Source side and make one of the nodes on the Target the primary. This can be done by changing the replica set configuration:
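A sketch of the reconfiguration (the member index 3 is an assumption; pick the id of the Target node you want to promote from the rs.conf() output):

```javascript
cfg = rs.conf()
cfg.members[3].priority = 2
cfg.members[3].votes = 1
rs.reconfig(cfg)
```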

We enable voting and set the priority to two on one of the nodes in the Kubernetes cluster. The member id can be different for you, so please look carefully at the output of the rs.conf() command.

Start Managing the Cluster

Once the primary is running in Kubernetes, we are going to tell the Operator to start managing the cluster. Change spec.unmanaged to false in the Custom Resource with the patch command:
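For example (psmdb is the short name of the PerconaServerMongoDB resource; my-cluster-name is a placeholder for the name from your cr.yaml):

```bash
kubectl patch psmdb my-cluster-name --type=merge -p '{"spec":{"unmanaged":false}}'
```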

You can also do this by changing cr.yaml and applying it. That is it: you now have a cluster in Kubernetes which is managed by the Operator.

Conclusion

You truly start to appreciate Operators once you get used to them. When I was writing this blog post, I found it extremely annoying to deploy and configure a single MongoDB node on a Linux box, and I don’t even want to think about doing it for a whole cluster. Operators abstract away Kubernetes primitives and database configuration and provide you with a fully operational database service instead of a bunch of nodes. Migrating MongoDB to Kubernetes is a challenging task, but it is much simpler with the Operator. And once you are on Kubernetes, the Operator takes care of all day-2 operations as well.

We encourage you to try out our operator. See our GitHub repository and check out the documentation.

Found a bug or have a feature idea? Feel free to submit it in JIRA.

For general questions, please raise the topic in the community forum.

Are you a developer looking to contribute? Please read our CONTRIBUTING.md and send us a pull request.

Percona Distribution for MongoDB Operator

The Percona Distribution for MongoDB Operator simplifies running Percona Server for MongoDB on Kubernetes and provides automation for day-1 and day-2 operations. It’s based on the Kubernetes API and enables highly available environments. Regardless of where it is used, the Operator creates a member that is identical to other members created with the same Operator. This provides an assured level of stability to easily build test environments or deploy a repeatable, consistent database environment that meets Percona expert-recommended best practices.
