Beyond Building Models: Exploring MLOps for Machine Learning Engineers


Machine learning (ML) has become a transformative force across industries such as healthcare, banking, and e-commerce in a quickly evolving, data-driven world. As a machine learning engineer, your ability to build robust models is crucial. However, to keep up with the ever-increasing demands of ML deployment, you also need to embrace Machine Learning Operations, popularly known as MLOps, which applies DevOps principles to machine learning. MLOps enables you to streamline and automate the whole ML lifecycle, from experiment tracking and orchestrating complex model workflows to smooth deployment and ongoing monitoring and maintenance.


Before you start reading this article, it is preferable to have a basic understanding of the following topics:

  • Python programming language

  • Machine learning

This article introduces MLOps, and I will discuss concepts like experiment tracking, model orchestration, deployment, monitoring, maintenance, and best practices in the MLOps ecosystem. By embracing MLOps, you can unlock a world of possibilities where your ML models seamlessly integrate into production environments and drive tangible business impact.

Experiment Tracking

Experiment tracking is an essential practice in MLOps that ensures reproducibility and improves collaboration across ML teams. Documenting and monitoring experiments correctly allows machine learning practitioners to examine and reproduce results, which leads to better decisions and faster model development. Tools like MLflow, Neptune, Weights & Biases (WandB), and Comet have gained popularity due to their comprehensive experiment-tracking features. An experiment tracker systematically captures and stores key information about each machine learning run, typically the hyperparameters used, the dataset, the performance metrics, and the resulting artifacts. With experiments recorded this way, ML teams can gain insight into their work, identify the most promising models, and easily replicate past results. Let's explore how MLflow can be leveraged to track experiments in Python:

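import mlflow

# Start the MLflow run
with mlflow.start_run():
    # Your ML code here; it is assumed to define `accuracy` and
    # `learning_rate` and to save the trained model to "model.pkl"

    # Log metrics, parameters, and artifacts
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_param("learning_rate", learning_rate)
    mlflow.log_artifact("model.pkl")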

In the code snippet above, we begin by invoking mlflow.start_run(), which initiates the MLflow run and sets the stage for tracking the experiment. You can place your ML code within this block, conducting training, evaluation, and any other relevant operations. To capture essential information about the experiment, MLflow provides several logging functions. In the example, mlflow.log_metric() is used to record the accuracy metric, while mlflow.log_param() captures the value of the learning rate hyperparameter. Additionally, mlflow.log_artifact() enables the logging of artifacts, such as the trained model stored in a "model.pkl" file.
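To make concrete what an experiment tracker records, here is a minimal sketch that uses only the Python standard library. It is not how MLflow works internally; the `log_run` and `best_run` helpers, the JSON Lines file layout, and the sample values are all assumptions made for illustration.

```python
import json
import tempfile
import time
from pathlib import Path


def log_run(log_file, params, metrics, dataset):
    """Append one experiment record (hypothetical format) to a JSON Lines file."""
    record = {
        "timestamp": time.time(),
        "dataset": dataset,
        "params": params,
        "metrics": metrics,
    }
    with open(log_file, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(log_file, metric):
    """Return the logged run with the highest value of the given metric."""
    with open(log_file, encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])


# Illustrative values only; real runs would log measured results
log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_file, {"learning_rate": 0.1}, {"accuracy": 0.81}, "train_v1.csv")
log_run(log_file, {"learning_rate": 0.01}, {"accuracy": 0.88}, "train_v1.csv")

best = best_run(log_file, "accuracy")
print(best["params"])
```

Because every run stores its hyperparameters, dataset name, and metrics together, picking the most promising run or repeating it later becomes a simple lookup. Dedicated tools add a UI, artifact storage, and team sharing on top of this basic idea.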

Model Orchestration

Model orchestration is a critical aspect of MLOps that involves managing the workflows and dependencies associated with ML models. By leveraging popular orchestration tools such as Apache Airflow, Kubeflow, and Prefect, machine learning engineers can schedule and execute complex ML workflows so that models are deployed, monitored, and executed consistently and reliably in production environments. Let's explore an example using Prefect, a workflow orchestration tool that can automate the deployment and execution of machine learning models. It provides workflow management capabilities such as task retries, parameterization, and error handling, and it integrates with a wide range of tools and platforms. This allows ML engineers to focus on developing and improving models rather than on the plumbing needed to run them in production:

from prefect import task, Flow  # Prefect 1.x API


@task
def preprocess_data():
    # Preprocessing logic
    pass


@task
def train_model(data):
    # Model training logic
    pass


@task
def evaluate_model(model):
    # Model evaluation logic
    pass


# Define the flow
with Flow("ML Workflow") as flow:
    data_preprocessing = preprocess_data()
    model_training = train_model(data_preprocessing)
    model_evaluation = evaluate_model(model_training)

# Run the flow
flow.run()

In the code snippet above, we use Prefect to define and execute an ML workflow. The workflow is represented as a Prefect Flow, which encapsulates a series of tasks. Each task represents a specific step in the ML pipeline. We define three tasks: preprocess_data(), train_model(), and evaluate_model(). These tasks encapsulate the respective logic for data preprocessing, model training, and model evaluation. Each task can be a function or a class method, decorated with @task to indicate that it is a Prefect task. To define the dependencies between tasks, we pass the output of one task as an argument to the next. For example, train_model(data_preprocessing) indicates that the train_model() task depends on the output of the preprocess_data() task. Finally, we create a Prefect Flow with the name "ML Workflow" and add the tasks to the flow. We then execute the flow using flow.run(), which triggers the execution of the defined tasks in the order determined by their dependencies. Note that Flow and flow.run() belong to the Prefect 1.x API; Prefect 2 and later define flows with the @flow decorator instead.
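To illustrate the two ideas an orchestrator relies on here, dependency ordering and task retries, the following sketch uses only the Python standard library (`graphlib` requires Python 3.9 or later). It is a simplified illustration, not Prefect's implementation; `execution_order`, `run_with_retries`, and `flaky_step` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on
dependencies = {
    "preprocess_data": set(),
    "train_model": {"preprocess_data"},
    "evaluate_model": {"train_model"},
}


def execution_order(graph):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def run_with_retries(func, max_retries=2):
    """Call func, retrying on any exception up to max_retries extra times."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise


calls = {"count": 0}


def flaky_step():
    """Hypothetical task that fails on its first call and then succeeds."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return "ok"


order = execution_order(dependencies)
print(order)  # ['preprocess_data', 'train_model', 'evaluate_model']
retry_result = run_with_retries(flaky_step)
```

A real orchestrator adds scheduling, logging, state persistence, and parallel execution of independent tasks, but the ordering and retry logic follow the same principle.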

Model Deployment

Model deployment is a crucial phase in the MLOps journey, encompassing the process of making an ML model accessible and usable by end-users in production environments. This involves packaging the model, deploying it to a production system, and ensuring its availability and accessibility. Model deployment is of major importance in MLOps as it enables users to leverage the power of ML models to make informed decisions. Let's explore different approaches to model deployment.

  • Web Service Deployment: Deploying an ML model as a web service is a popular approach that allows users to interact with the model through a web API. This enables real-time predictions, making it suitable for use cases like recommendation systems, fraud detection, and sentiment analysis. Frameworks like Flask, Django, and FastAPI simplify the process of building and deploying web services. Here's an example of deploying an ML model as a web service using Flask:
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)

# Load the trained model
model = joblib.load('model.pkl')


@app.route('/predict', methods=['POST'])
def predict():
    # Get the request data (assumed here to be a JSON list of feature rows)
    data = request.get_json()

    # Perform prediction using the loaded model
    prediction = model.predict(data)

    # Convert the NumPy array to a list so it can be serialized as JSON
    return jsonify({'prediction': prediction.tolist()})


if __name__ == '__main__':
    app.run()

In the code snippet above, we use Flask, a popular web framework for Python, to create a web service. The /predict endpoint accepts POST requests with JSON data containing the input for prediction. The model, loaded from the 'model.pkl' file, is used to make predictions, and the result is returned as a JSON response.

  • Batch Job Deployment: Batch job deployment is suitable when predictions can be made offline, and there is no requirement for real-time response. This approach involves running the model on a scheduled basis or triggered by events, processing large volumes of data. Batch jobs are often used in scenarios like data preprocessing, batch recommendation systems, and periodic report generation. Tools like Apache Airflow and Cron can be utilized to schedule and automate batch job deployments.

  • Mobile App Deployment: Deploying ML models in mobile apps allows users to access ML-powered features directly on their mobile devices. This approach is useful for applications like image recognition, natural language processing, and augmented reality. Platforms such as TensorFlow Lite and Core ML provide frameworks and tools to optimize and deploy ML models on mobile devices.
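The batch job approach described above can be sketched as a small scoring script that a scheduler such as cron or Airflow would invoke. This is a minimal illustration using only the standard library; `threshold_model` is a stand-in for a trained model, and `score_batch`, the column names, and the chunk size are assumptions.

```python
import csv
import io
from itertools import islice


def threshold_model(rows):
    """Stand-in for a trained model: flags amounts above 100 (illustrative rule)."""
    return [1 if float(row["amount"]) > 100 else 0 for row in rows]


def score_batch(reader, writer, predict, chunk_size=2):
    """Read records in fixed-size chunks, predict, and write id/prediction pairs."""
    writer.writerow(["id", "prediction"])
    total = 0
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        for row, pred in zip(chunk, predict(chunk)):
            writer.writerow([row["id"], pred])
        total += len(chunk)
    return total


# An in-memory CSV stands in for a large input file
source = io.StringIO("id,amount\n1,50\n2,150\n3,20\n")
output = io.StringIO()
n_scored = score_batch(csv.DictReader(source), csv.writer(output), threshold_model)
print(n_scored)
```

Processing the input in chunks keeps memory use bounded regardless of file size, which is the main practical concern when a batch job handles large volumes of data.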


Model Monitoring

Monitoring is a crucial aspect of MLOps that involves systematically collecting and analysing data related to the performance and behaviour of ML models in production. By monitoring ML models, teams can gain valuable insights into their accuracy, latency, throughput, and overall health. Monitoring plays a vital role in MLOps by enabling teams to identify and address issues proactively, ensuring optimal model performance and user satisfaction.

The primary goals of monitoring ML models are:

  1. Performance Tracking: Monitoring allows teams to track the performance of ML models in production continuously. Metrics such as accuracy, precision, recall, and F1-score can be monitored to ensure that the models deliver the expected results. By closely monitoring these metrics, teams can identify any degradation in performance and take appropriate measures to rectify the issues.

  2. Anomaly Detection: Monitoring helps identify anomalies or unexpected behaviour in ML models. Sudden changes in input data distribution, model outputs, or resource utilization can indicate potential issues. By setting up alerts and thresholds, teams can be notified in real time when anomalies occur, allowing them to investigate and address the underlying causes promptly.

  3. Data Drift Detection: Monitoring can also help detect data drift, which refers to changes in the statistical properties of input data over time. ML models are trained on specific data distributions, and when the distribution shifts, model performance can be affected. By monitoring input data and comparing it to the training data distribution, teams can identify data drift and take actions such as retraining the model on updated data.
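As a rough illustration of data drift detection, the sketch below computes the Population Stability Index (PSI), one common way to compare a production sample against the training distribution for a single numeric feature. The `psi` helper, the bin count, and the sample values are assumptions for illustration; a PSI above roughly 0.2 is a commonly cited rule of thumb for a significant shift, not a universal threshold.

```python
import math


def psi(expected, actual, bins=4):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the training range into the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative feature values, not real data
training_sample = [1, 2, 3, 4, 5, 6, 7, 8]
production_sample = [5, 6, 7, 8, 9, 10, 11, 12]

drift_score = psi(training_sample, production_sample)
drift_detected = drift_score > 0.2
print(drift_detected)
```

In practice a check like this would run per feature on a schedule, with an alert or a retraining job triggered when the score crosses the chosen threshold.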

There are various tools available for monitoring ML models, each offering unique features and capabilities. Some commonly used tools include:

  • Evidently AI: An open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor the performance of ML models from validation to production. It works with tabular and text data and embeddings.

  • Gantry: A machine learning operations (MLOps) platform that helps businesses build, deploy, and monitor machine learning models. It provides a single source of truth for AI system performance, allowing users to find out how the system is performing and how to improve it.


Model Maintenance

Maintenance is a crucial aspect of MLOps that keeps ML models up-to-date, accurate, and performing optimally over time. It involves a range of activities, such as retraining models, updating hyperparameters, fixing bugs, and adapting models to changes in data or business requirements. Maintenance is vital to MLOps as it ensures the long-term reliability and effectiveness of ML models in production.

The key tasks involved in ML model maintenance include:

  1. Retraining: ML models may need periodic retraining to adapt to changes in the underlying data distribution or to incorporate new data. Retraining ensures that models remain relevant and accurate in evolving environments. By regularly retraining models on fresh data, machine learning engineers can mitigate the impact of data and concept drift and maintain high levels of performance.

  2. Hyperparameter Updates: Hyperparameters play a significant role in determining the behaviour and performance of ML models. As new data becomes available or as business requirements change, it may be necessary to update hyperparameters to optimize model performance. Conducting hyperparameter tuning experiments and leveraging techniques like grid search or Bayesian optimization can help identify optimal values.

  3. Bug Fixing: Like any software system, ML models can have bugs that must be addressed. Bugs may arise due to errors in the code, data preprocessing, or feature engineering. It is important to have a robust testing framework in place to identify and fix bugs promptly, minimizing their impact on model performance and user experience.

  4. Feedback Loop: Maintaining an effective feedback loop with stakeholders, end-users, and domain experts is crucial. Gathering insights, feedback, and suggestions from these parties helps identify areas for improvement and refine the model. Feedback can be collected through user surveys, user behaviour analysis, or collaboration with domain experts.

  5. Performance Monitoring: Continuously monitoring the performance of ML models in production is vital. By collecting and analyzing metrics such as accuracy, precision, recall, and F1-score, teams can identify deviations from expected performance and take corrective actions promptly. Monitoring can be done using tools like Prometheus, Grafana, or custom monitoring solutions.
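The retraining and performance monitoring tasks above can be tied together with a simple check: compute the metrics on recently labelled production data and flag the model for retraining when F1 falls too far below a baseline. This is a minimal sketch for binary classification; `classification_metrics`, `needs_retraining`, the tolerance, and the labels are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def needs_retraining(metrics, baseline_f1, tolerance=0.05):
    """Flag the model when F1 drops more than `tolerance` below the baseline."""
    return metrics["f1"] < baseline_f1 - tolerance


# Illustrative labels, not real production data
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
retrain = needs_retraining(metrics, baseline_f1=0.85)
print(metrics, retrain)
```

The baseline would typically be the F1 score measured at deployment time, and the tolerance is a judgment call that depends on how costly a performance drop is for the use case.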

Best Practices in the MLOps World

To ensure the successful implementation of MLOps, it is important to follow best practices that promote efficiency, scalability, reproducibility, and reliability. Embracing a well-defined MLOps approach helps drive collaboration and continuous improvement, ultimately leading to the successful deployment and operation of ML models in production environments. Here are some key best practices in the MLOps world:

  • Version control: Version control helps teams track changes to code, data, and models, reproduce experiments, and collaborate with others. Data Version Control (DVC) is a version control system specifically designed for data and models in machine learning projects. It works alongside Git, bringing Git-style versioning to the large datasets and model files that Git alone handles poorly. With DVC, teams can roll back to previous versions, experiment with different approaches, and efficiently manage the complexities of ML development.

  • Infrastructure as Code: Adopt the concept of "Infrastructure as Code" to automate the provisioning and management of infrastructure resources required for ML workflows. Tools like Terraform, Azure Resource Manager templates, and AWS CloudFormation enable the creation of reproducible and scalable infrastructure setups, reducing manual setup and configuration efforts.

  • Continuous Integration and Continuous Delivery (CI/CD): CI/CD is essential for ML projects. It automates the building, testing, and deployment of models, ensuring that new models or updates are seamlessly deployed to production environments. This reduces the risk of errors and enables faster iteration cycles. Continuous Machine Learning (CML) is an example of a CI/CD tool for ML.

  • Experiment Tracking and Documentation: Maintain thorough documentation and tracking of ML experiments, including hyperparameters, data versions, and performance metrics. Tools like MLflow, DVC, Neptune, or Weights & Biases provide capabilities for experiment tracking and reproducibility. Proper documentation and tracking facilitate collaboration, knowledge sharing, and the ability to reproduce experiments.

  • Model Testing: Implement rigorous testing to validate ML models before and during rollout. Unit and integration tests check the code and pipeline before deployment, while A/B testing compares model versions on live traffic. Test for edge cases, robustness to different inputs, and performance under different conditions to mitigate risks associated with model inaccuracies or failures.

  • Scalable and Reliable Data Pipelines: Design efficient and scalable data pipelines for data preprocessing, feature engineering, and model training. Tools like Apache Airflow, Apache Beam, or Apache Spark simplify the development and orchestration of data pipelines. Scalable pipelines ensure the efficient processing and transformation of data, while reliability guarantees consistent data quality.

  • Monitoring and Alerting: Implement comprehensive monitoring and alerting systems to track the performance and behaviour of ML models in production. Tools like Prometheus, Grafana, or ELK Stack enable real-time monitoring, anomaly detection, and visualization of key performance indicators. Monitoring helps identify issues, detect anomalies, and trigger appropriate actions for maintaining model performance.

  • Collaboration and Communication: Foster strong collaboration between data scientists, ML engineers, and DevOps teams. Regular communication, feedback loops, and knowledge-sharing sessions promote shared understanding and ensure alignment between stakeholders. Clear communication channels help address challenges effectively and align ML initiatives with business objectives.

  • Security and Privacy: Ensure proper security measures for ML systems, including data privacy, secure storage, and access controls. Implement encryption, data anonymization techniques, and secure authentication mechanisms to protect sensitive data and prevent unauthorized access. Compliance with regulations, such as GDPR or HIPAA, is crucial when handling personal or sensitive information.

  • Continuous Learning and Improvement: Embrace a culture of continuous learning and improvement. Stay updated with the latest advancements in ML technologies, frameworks, and best practices. Regularly evaluate and refine your MLOps processes to incorporate new techniques, tools, and industry standards.
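To make the Model Testing practice from the list above more concrete, here is a small sketch of behavioural checks that could run before a model is deployed. `predict_proba` is a hypothetical stand-in for a real model, and the three checks (valid output range, expected monotonic behaviour, rejection of malformed input) are examples rather than a complete test suite.

```python
import math


def predict_proba(features):
    """Stand-in model (hypothetical): logistic score over two numeric features."""
    z = 0.8 * features[0] - 0.5 * features[1]
    return 1.0 / (1.0 + math.exp(-z))


def check_output_range(model, inputs):
    """Every prediction must be a valid probability."""
    return all(0.0 <= model(x) <= 1.0 for x in inputs)


def check_monotonic_in_first_feature(model, base, step=1.0):
    """Increasing the first feature should not lower the score."""
    bumped = [base[0] + step, base[1]]
    return model(bumped) >= model(base)


def check_rejects_bad_input(model):
    """Malformed input should raise rather than return a silent prediction."""
    try:
        model([])
    except (IndexError, TypeError):
        return True
    return False


# Edge cases include the origin and extreme values in both directions
edge_cases = [[0.0, 0.0], [-50.0, 50.0], [50.0, -50.0]]
results = {
    "output_range": check_output_range(predict_proba, edge_cases),
    "monotonic": check_monotonic_in_first_feature(predict_proba, [1.0, 2.0]),
    "rejects_bad_input": check_rejects_bad_input(predict_proba),
}
print(results)
```

Checks like these fit naturally into a CI/CD pipeline, where a failing check blocks the deployment of a new model version.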


Conclusion

MLOps has emerged as a transformative practice for machine learning engineers to optimize and automate the entire ML lifecycle. Integrating ML and DevOps principles enables efficient experiment tracking, seamless model orchestration, reliable deployment, comprehensive monitoring, proactive maintenance, and adherence to best practices. Embracing MLOps empowers ML teams to streamline workflows, enhance collaboration, ensure reproducibility, and deliver robust and accurate ML solutions in production environments. By following these principles and leveraging the right tools, machine learning engineers can drive innovation, maximize efficiency, and unlock the true potential of machine learning. Embrace MLOps and embark on a journey of continuous improvement and impactful ML deployments.