Migrating MLFlow Server To Cloud: Part 2

Reading Time: 4 minutes

In my previous blog, I discussed the first two phases of migrating an MLflow server to the cloud. In this blog, I'll be discussing the deployment of the MLflow tracking server on Google Cloud Platform and the migration of the existing data to the cloud. I'll also be talking about optimizing the overall environment in the process.

Deployment

Step 1: Copy Contents from Disk

  • Go to the Cloud Storage section of the GCP console and click on Create Bucket.
  • Give your storage bucket a name, fill in the other details and finally click on Create.
  • Now go to the newly created bucket and create a directory named mlruns.
  • Go to the directory and click on the Upload Folder option.
  • Import the mlruns folder from the local disk (a command-line alternative is sketched after this list).
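If you prefer the command line, the same upload can be done with gsutil. This is a minimal sketch, assuming the Google Cloud SDK is installed and authenticated on the local machine; <bucket-name> is a placeholder for the bucket created above.
# Copy the local mlruns folder into the bucket (parallel upload)
gsutil -m cp -r ./mlruns gs://<bucket-name>/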

Step 2: Migrate the Postgres Data

  • Create an SQL dump using pg_dump on the local server. The following command does this:
sudo -u postgres pg_dump --no-acl --no-owner --format=plain -d postgres_mlflow > database_dump.sql
  • Now, go to the bucket that we created in the previous step and upload the database_dump.sql file to it.
  • Next, go to the Cloud SQL section of the GCP console and click on Create Instance. A screen will open asking for the type of database.
  • Click on PostgreSQL and select the version of the database.
  • After the database has been created, go to the instance and click on Import Data. A screen will open asking for the import parameters.
  • Click on Browse and find database_dump.sql in the bucket. Click on Import at the bottom and it will repopulate the database in the cloud (a command-line equivalent is shown after this list).
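The upload and import can also be scripted with the Cloud SDK. This is a sketch of the same steps; <instance-name> and <database-name> are placeholders, and the import assumes the Cloud SQL instance's service account has read access to the bucket.
# Upload the dump to the bucket, then import it into Cloud SQL
gsutil cp database_dump.sql gs://<bucket-name>/
gcloud sql import sql <instance-name> gs://<bucket-name>/database_dump.sql --database=<database-name>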

Step 3: Set Up & Configure the MLflow Server

  • Go to the Compute Engine section of the GCP console and create a VM instance. SSH into the VM and run the following commands.
sudo apt update
sudo apt upgrade -y
sudo apt install -y python3.6 python3-pip libpq-dev postgresql-client
sudo pip3 install mlflow psycopg2
  • Check the installation of MLflow using the mlflow --version command.
  • Go to the Cloud Shell, create a network tag and assign it to the instance that you created.
gcloud compute instances add-tags <instance-name> --tags=<tag-name> --zone=us-central1-a
  • Create a new firewall rule to allow outside access to the UI on port 5000.
gcloud compute firewall-rules create <rule-name> --direction=INGRESS --priority=999 --network=default --action=ALLOW --rules=tcp:5000 --source-ranges=0.0.0.0/0 --target-tags=<tag-name>
  • Go to the SQL instance and click on Edit Instance.
  • Click on Add Network and enter the public IP of your VM instance to allow the VM to access this database.
  • Now, all that is left is to run the MLflow server. SSH into the VM and run the following command.
mlflow server --backend-store-uri postgresql://<user>:<password>@<database-ip>:5432/<database-name> --default-artifact-root gs://<bucket-name>/mlruns --host <internal-ip>
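Note that the server stops as soon as the SSH session ends. For a test environment, one simple way to keep it running in the background is nohup; mlflow.log below is just a hypothetical log file name.
# Keep the tracking server alive after the SSH session closes
nohup mlflow server --backend-store-uri postgresql://<user>:<password>@<database-ip>:5432/<database-name> --default-artifact-root gs://<bucket-name>/mlruns --host <internal-ip> > mlflow.log 2>&1 &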

We have successfully configured MLflow in the cloud. But how do we know that it works? More importantly, has all the data been successfully migrated to the cloud? To find out, we will run a sample experiment on the new machine in the cloud. The screenshot below shows the UI that opened on localhost before the migration.

[Screenshot: MLflow UI on localhost before migration]

Now, we run a sample experiment on the cloud and check whether all the data from previous experiments is available. Here's the UI of the MLflow server in the cloud. It clearly shows that all the old experiments are there and that it is also tracking the new experiment that has just been executed, which means our migration was successful.

[Screenshot: MLflow UI on the cloud server after migration]
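The migration can also be checked from the terminal. As a sketch, assuming an MLflow 1.x server, its REST API lists the experiments the tracking server knows about; replace <external-ip> with your VM's external IP.
# List all experiments known to the tracking server (MLflow 1.x REST API)
curl http://<external-ip>:5000/api/2.0/mlflow/experiments/list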

Optimization

In the optimization phase, you refine your environment to make it more efficient than the initial deployment. This phase is useful if you're planning to optimize an existing environment after migrating to the cloud, or if you're evaluating the opportunity to optimize and want to explore what that might look like. Optimization is an ongoing task: you keep optimizing your environment as it evolves. Before starting any optimization, you need to evaluate your current environment. Here, we have narrowed it down to two existing problems that we'll focus on below.

Problem 1

The first problem we have identified is that the whole setup and configuration is a manual process. That would be fine if this were a one-time job, but in the larger scenario this was just a testing environment; in the future we are likely to need a few more environments, and doing everything manually is a tedious and time-consuming job. Thus, we would like to automate this process in the future, as sketched below.
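For illustration only, a hypothetical first step toward such automation could be scripting the VM provisioning with the Cloud SDK; the instance name and machine type below are made-up placeholders, not values from this setup.
# Provision the tracking VM non-interactively
gcloud compute instances create mlflow-tracking --zone=us-central1-a --machine-type=n1-standard-1 --tags=<tag-name>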

Problem 2

The second problem is the lack of monitoring: we do not have any monitoring mechanism in this environment. A comprehensive monitoring system is a necessary component, since it tracks all the essential metrics you need to evaluate against your optimization goals. With proper monitoring in place, we can even optimize cloud costs, and we will become aware of problems in our system much faster. Therefore, we would like to add a monitoring system to the environment later on.
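As a pointer, GCP offers a monitoring agent for VM instances. The snippet below sketches the install flow that was documented for the legacy Stackdriver monitoring agent on a Debian-based VM at the time of writing; verify the current procedure against the GCP documentation before using it.
# Install the legacy Stackdriver monitoring agent (Debian/Ubuntu)
curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
sudo bash add-monitoring-agent-repo.sh
sudo apt-get update
sudo apt-get install -y stackdriver-agent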

Conclusion

Choosing a migration strategy is easy. Designing and implementing it is the challenge. The larger or more complex your current infrastructure, the harder these challenges are to overcome. Thus, a cloud migration requires careful analysis, planning and execution to ensure the cloud solution’s compatibility with organizational requirements.


Written by

Sudeep James Tirkey is a software consultant with more than 2 years of experience. He likes to explore new technologies and trends in the IT world. His hobbies include playing football and badminton, reading, and travelling. Sudeep is familiar with programming languages such as Java, Scala, C and C++, and he is currently working on DevOps and reactive technologies like Jenkins, DC/OS, Ansible, Scala, Java 8, Lagom and Kafka.
