Machine Learning Operations at Scale (Part 1)

Introduction to MLOps

Hello once again and welcome to my blog, where I write about technologies I have found impressive over time.

I am currently following the wave of Machine Learning Operations (MLOps) with great interest. For those new to the space, here is a quick definition of what MLOps is.

In non-technical terms, think of MLOps as the process of deploying and managing tens to thousands of machine learning models for an organization or for a problem. If your organization's machine learning footprint grows beyond a cap of, say, 10 models running in production, there is a good chance you will need an MLOps practice to keep track of and manage those models.

For a more formal definition of MLOps, the one given by Wikipedia works well:

MLOps or ML Ops is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of “machine learning” and the continuous development practice of DevOps in the software field

The two main points here are deployment and maintenance. Although I did not cover maintenance in my non-technical definition, software that is deployed is rarely left unmaintained, so the culture of maintenance is logically infused into MLOps as part of being a deployment practice.

Traditional machine learning vs DataOps-infused MLOps

Wondering what new term we are adding to MLOps? Well, you got it: DataOps. It is quite similar to MLOps, and I usually prefer to think of them together; it is the process of managing data deployments for machine learning.

We all know machine learning models are rarely useful without data. However, for easy deployment and maintenance of models, organisations need a way to manage the data as well as the features the models are trained on, which gives rise to the need for DataOps. I will largely not delve into DataOps as a thing on its own; I will take it up as part of the MLOps pipeline. As a side note, it can be tricky, and potentially unnerving, to try to separate feature engineering management from model deployment. You can refer to this article for an intuition on doing that properly.

Traditional machine learning has data scientists (possibly) downloading data from CSV files, or a data engineer dumping cleaned data into a database table; the data scientist then opens a notebook and runs several iterations of data exploration, feature engineering and model building. At the end of that cycle, the data scientist often needs to trace back to see which of the steps actually gave a good model. Thanks to the concept of a Pipeline, as present in sklearn and spark-ml, we can chain that process and effectively track and manage changes (see the sketch after the list of questions below).

Nevertheless, beyond model building is using the model in production, and that is where the challenging part of the work comes into play. We probably need to build a Flask application around our saved model, ensure that the feature engineering code is right for the incoming data, and possibly create a Docker environment and host our service as a microservice, or deploy to a local environment - refer to my article on model deployment with C++ for an example. But here is where things get tricky: because a data scientist is not usually proficient in the skills of managing applications in production, the likelihood of not properly guarding against the following questions is high:

  • How will we know when the model starts degrading?
  • How do we know whether we are making good predictions on incoming data or not?
  • How often should we retrain the model?
  • How do we move from one deployed model to another project while still tracking the old project's performance?
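
As referenced above, here is a minimal, hypothetical sketch of chaining feature engineering and model training with an sklearn Pipeline and then exposing the saved artifact behind a small Flask endpoint. The column names, model choice and file paths are placeholders for illustration, not a prescription.

```python
# train_pipeline.py - a minimal sketch (column names, model and paths are placeholders)
import joblib
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("training_data.csv")            # hypothetical cleaned data dump
X, y = df.drop(columns=["target"]), df["target"]

# Chain feature engineering and the model so the whole thing is tracked as one object
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X, y)
joblib.dump(pipeline, "model.joblib")
```

```python
# serve.py - wrap the saved pipeline in a tiny Flask microservice
import joblib
import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)
pipeline = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # The same feature engineering baked into the pipeline is applied to incoming data
    payload = pd.DataFrame(request.get_json())
    preds = pipeline.predict(payload)
    return jsonify(predictions=preds.tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

This service could then be packaged into a Docker image and deployed as a microservice; the questions above are exactly the ones such a setup does not answer on its own.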

This leads us to the space of MLOps, where data scientists are basically left to experiment with their models and produce a good machine learning script, which a machine learning engineer or operations person can then scale with adequate monitoring and retraining processes. That leaves data scientists to do what they do best: exploring data and building models that work for several machine learning tasks.
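
To make "adequate monitoring" slightly more concrete, here is one hedged sketch of a drift check: comparing the distribution of a feature at training time against what the service is seeing in production, using a two-sample Kolmogorov-Smirnov test. The feature name, file paths and threshold are assumptions made purely for illustration; this is one possible approach, not the only one.

```python
# drift_check.py - a minimal sketch of noticing model degradation (feature name and threshold are assumed)
import pandas as pd
from scipy.stats import ks_2samp

train = pd.read_csv("training_data.csv")        # data the model was trained on
recent = pd.read_csv("recent_requests.csv")     # features logged from production traffic

# Compare the distribution of a single feature between training and production
result = ks_2samp(train["feature_a"], recent["feature_a"])

if result.pvalue < 0.01:                        # hypothetical significance threshold
    print("feature_a has drifted; consider retraining the model")
else:
    print("no significant drift detected for feature_a")
```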

The place of CI/CD in MLOps

Beyond being able to take a single piece of code and deploy it, there is also the process of testing, continuous integration and automatic deployment. While these processes are still evolving for machine learning operations, there are a few tricks that tend to work out of the box for now.

Continuous integration for machine learning code is actually quite tricky, because the heartbeat of continuous integration is tests that ensure nothing breaks when new code is added. Machine learning tests are hard, however, because it is usually unclear what to test.

Nevertheless, over the course of this series of articles, we will see some ideas for testing code in machine learning operations.
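
As a small taste of what such a test can look like, here is a hedged sketch of a pytest check on a training pipeline like the one sketched earlier: it asserts that the fitted pipeline produces one prediction per input row and that predictions stay within the known label set. The toy data and labels are invented for illustration.

```python
# test_pipeline.py - a minimal sketch of a CI test for a training pipeline (invented toy data)
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def test_pipeline_outputs_one_prediction_per_row():
    X = pd.DataFrame({"feature_a": [0.1, 0.4, 0.9, 1.2], "feature_b": [3, 1, 2, 5]})
    y = pd.Series([0, 1, 0, 1])

    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("model", LogisticRegression()),
    ])
    pipeline.fit(X, y)

    preds = pipeline.predict(X)
    assert len(preds) == len(X)                 # one prediction per input row
    assert set(preds).issubset({0, 1})          # predictions stay within the known labels
```

A test like this runs in an ordinary CI job and catches the most common breakage: a change to the feature engineering or model code that stops the pipeline from fitting or predicting at all.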

Tooling

I think the most important tool for MLOps is usually the orchestration engine that manages the pipeline, which is why tools such as Airflow, Kubeflow and MLflow dominate the tooling landscape. Each aspect of the machine learning pipeline also has its own specific tools, and that list is still growing. There is a GitHub awesome repo for MLOps that is worth looking at.
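
To illustrate what an orchestration engine managing the pipeline looks like in practice, here is a hedged sketch of an Airflow DAG that chains a feature extraction step and a training step. The DAG id, schedule, task names and callables are placeholders, and it assumes Airflow 2.x; the series itself will use Kubeflow, so treat this purely as an illustration of the idea.

```python
# ml_pipeline_dag.py - a minimal sketch of orchestrating an ML pipeline with Airflow 2.x (placeholder tasks)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_features():
    # Placeholder: pull raw data and write engineered features somewhere durable
    print("extracting features")

def train_model():
    # Placeholder: fit the pipeline on the latest features and persist the artifact
    print("training model")

with DAG(
    dag_id="ml_training_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)

    extract >> train  # run training only after feature extraction succeeds
```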

In this series of articles, I will be building a sample production application that brings together the various aspects of MLOps, using some of the tools employed at a high-scale production level.

PS: By all means, you can decide to scale the processes involved here down or up depending on the business case or your organization's capacity.

Conclusion

This is a basic introduction to a series of articles that covers MLOps and how to do it.

Our first article will look at setting up a Kubeflow environment for our machine learning operations project.

If your organization intends to scale its machine learning pipeline, or is having difficulty taking machine learning models into production properly, you can email me at adekunleba@gmail.com for some guidance. I will be more than willing to listen and provide information that will most likely be of help.
