
MLOps Lifecycle

Research work on the notion of AI and MLOps Lifecycle.

The MLOps Lifecycle

MLOps Lifecycle Basics

The emergence of Artificial Intelligence (AI) and Machine Learning (ML) has revolutionised multiple industries, ranging from healthcare and finance to manufacturing and transportation. With the rapid growth in data availability and the need for real-time decision-making, Cloud-Edge AI/ML Operations (AI MLOps) has emerged as a powerful approach that combines cloud computing, edge devices, and advanced ML algorithms.

The lifecycle is composed of three main phases: Design, Model Development and Operations. Each phase is in turn composed of three sub-phases, giving a total of nine steps in the complete cycle.
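
The three-by-three structure can be written down as a small data structure. The following is only an illustrative sketch: the phase and sub-phase names are the ones used on this page, while the dictionary layout and the helper name `lifecycle_steps` are assumptions made for the example.

```python
# Phase and sub-phase names come from this page; the layout is illustrative.
MLOPS_LIFECYCLE = {
    "Design": [
        "Requirements Engineering",
        "ML Use Cases Prioritisation",
        "Data Availability Check",
    ],
    "Model Development": [
        "Data Engineering",
        "ML Model Engineering",
        "Model Testing and Validation",
    ],
    "Operations": [
        "ML Model Deployment",
        "CI/CD Pipelines",
        "Monitoring and Triggering",
    ],
}


def lifecycle_steps(lifecycle=MLOPS_LIFECYCLE):
    """Return (phase, step) pairs in the order the cycle is walked."""
    return [(phase, step) for phase, steps in lifecycle.items() for step in steps]


if __name__ == "__main__":
    # Print the nine steps, numbered in lifecycle order.
    for number, (phase, step) in enumerate(lifecycle_steps(), start=1):
        print(f"{number}. {phase}: {step}")
```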

The lifecycle of MLOps encompasses the end-to-end management and optimisation of ML models and workflows, integrating DevOps and data science practices. The lifecycle always begins with the problem definition and data collection, where the business goals and relevant data sources are identified. This is followed by data preprocessing, including cleaning, transformation, and feature engineering, to ensure the data is ready for modelling. The next phase involves model development, exploring and evaluating various algorithms and techniques. Once a suitable model is selected, it undergoes training and validation using historical data. After model training, the focus shifts to deployment and monitoring. The model is deployed to production environments where it interacts with real-time data, and monitoring tools are put in place to track its performance, detect anomalies, and ensure reliability.

Tools and Solutions

The following section lists indicative tools that are suitable for each phase of the AI/MLOps lifecycle. The list is non-exhaustive.

Phase I: Design

The Design Phase encompasses 3 sub-phases:

  • Requirements Engineering: To identify, analyse, document, and manage the needs and expectations of stakeholders for a software development project
  • ML Use Cases Prioritisation: The process of identifying and prioritising potential use cases for applying machine learning algorithms in a particular domain or industry
  • Data Availability Check: To ensure that the necessary data is available and accessible for use in a machine learning project
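
As a rough illustration of what a Data Availability Check might involve, the sketch below verifies that the columns a use case needs exist in a CSV extract and that not too many values are missing. It uses only the Python standard library. The function name `check_data_availability`, the `max_missing_ratio` parameter, and the sample sensor data are all hypothetical; they are not part of any tool listed on this page.

```python
import csv
import io


def check_data_availability(csv_text, required_columns, max_missing_ratio=0.1):
    """Report, per required column, whether it exists and how much is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    report = {}
    for column in required_columns:
        if column not in fieldnames:
            # The column is absent from the extract altogether.
            report[column] = {"present": False, "missing_ratio": 1.0, "ok": False}
            continue
        if rows:
            # Empty or whitespace-only cells count as missing.
            missing = sum(1 for row in rows if not (row[column] or "").strip())
            ratio = missing / len(rows)
        else:
            ratio = 1.0  # header only, no data rows
        report[column] = {
            "present": True,
            "missing_ratio": ratio,
            "ok": ratio <= max_missing_ratio,
        }
    return report


# Hypothetical shopfloor extract, used only for illustration.
SAMPLE = (
    "timestamp,temperature,pressure\n"
    "2024-01-01T00:00,21.5,1.01\n"
    "2024-01-01T00:01,,1.02\n"
    "2024-01-01T00:02,21.7,1.00\n"
    "2024-01-01T00:03,21.6,\n"
)

if __name__ == "__main__":
    result = check_data_availability(
        SAMPLE, ["temperature", "vibration"], max_missing_ratio=0.3
    )
    for column, status in result.items():
        print(column, status)
```

A check of this kind only confirms that data exists and is reasonably complete; whether it is accessible under the project's access and licensing constraints still has to be established separately.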


| MLOps Step | Tool Name | Tool Description | Link |
| --- | --- | --- | --- |
| Requirements Engineering | JIRA | Widely used project management tool that also includes features for requirements management | https://www.atlassian.com/software/jira |
| Requirements Engineering | Confluence | Wiki-based collaboration tool that can be used for requirements management | https://www.atlassian.com/software/confluence |
| Requirements Engineering | Visual Paradigm | Modelling tool that includes features for requirements engineering, such as the ability to create use cases, user stories, and requirements diagrams | https://www.visual-paradigm.com/ |
| Requirements Engineering | Diagrams.net | General-purpose technical diagramming tool that helps partners standardise the format of architectural diagrams, process diagrams, and layouts | https://app.diagrams.net/ |
| ML Use Cases Prioritisation | DataRobot | Automated machine learning platform that includes features for use case prioritisation. It uses a combination of artificial intelligence and human expertise to identify and prioritise potential use cases | https://www.datarobot.com/ |
| ML Use Cases Prioritisation | Microsoft Azure ML Studio | Includes tools for identifying and prioritising use cases | https://azure.microsoft.com/en-us/products/machine-learning/ |
| ML Use Cases Prioritisation | RapidMiner | Includes tools for evaluating and selecting the most promising use cases for implementation | https://rapidminer.com/ |
| Data Availability Check | SageMaker Data Wrangler | Tool that provides a user-friendly interface for data cleaning and preparation | https://aws.amazon.com/sagemaker/data-wrangler |
| Data Availability Check | Trifacta | Data preparation platform that provides features for data profiling, cleaning, and enrichment | https://www.trifacta.com |
| Data Availability Check | Azure Data Factory | Data integration platform that provides features for data preparation, transformation, and validation | https://azure.microsoft.com/en-us/products/data-factory |
| Data Availability Check | Industreweb | Large library of shopfloor edge connectivity protocols. It can be used to verify whether a suitable protocol connector is available and, if not, to add a new connector type to support the pilot requirements | https://www.industreweb.co.uk/ |

Phase II: Model Development

The Model Development Phase has the following 3 sub-phases:

  • Data Engineering: To design, build, and maintain the infrastructure required to support the collection, storage, processing, and analysis of data
  • ML Model Engineering: To design and develop machine learning models that can perform complex tasks with high accuracy and reliability
  • Model Testing and Validation: To evaluate the performance and accuracy of a machine learning model and to ensure that it can generalise well to new data
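
To make the idea of checking that a model "generalises well to new data" more concrete, the sketch below shows k-fold cross-validation: the data is split into k folds, and each fold is held out once while the model is fitted on the rest. It is a minimal standard-library illustration, not a replacement for the validation utilities offered by libraries such as Scikit-learn. The toy one-variable linear model, the function names, and the synthetic data are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean absolute error measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training folds, then score on unseen samples.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(predict(model, xs[i]) - ys[i]) for i in held_out]
        scores.append(mean(errors))
    return scores


def fit_line(xs, ys):
    """Toy model: ordinary least-squares line through one input variable."""
    x_bar, y_bar = mean(xs), mean(ys)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / variance
    return slope, y_bar - slope * x_bar


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Synthetic, noise-free data used only for illustration.
    xs = list(range(20))
    ys = [2 * x + 1 for x in xs]
    print(cross_validate(xs, ys, fit_line, predict_line, k=5))
```

Large differences between the per-fold scores, or scores much worse than the training error, are a common sign that the model does not generalise well.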


| MLOps Step | Tool Name | Tool Description | Link |
| --- | --- | --- | --- |
| Data Engineering | Apache Spark | Open-source data processing engine that provides features for batch processing, streaming, and machine learning | https://spark.apache.org/ |
| Data Engineering | Apache Kafka | Distributed streaming platform that provides features for data processing and messaging | https://kafka.apache.org/ |
| Data Engineering | Apache Hadoop | Open-source framework that provides features for distributed storage and processing of large data sets | https://hadoop.apache.org/ |
| Data Engineering | Industreweb | Edge router with connectivity to the vast majority of shopfloor protocols and near real-time data processing, used to supply data to ML components | https://www.industreweb.co.uk/ |
| ML Model Engineering | Google Cloud AutoML | Tools for automated ML (AutoML), including AutoML Vision, AutoML Video Intelligence, AutoML Natural Language, and AutoML Translation | https://cloud.google.com/automl |
| ML Model Engineering | H2O.ai | Open-source platform that provides several tools for AutoML | https://h2o.ai/ |
| ML Model Engineering | TPOT | Open-source AutoML tool that automates the process of building and optimising machine learning pipelines | http://automl.info/ |
| Model Testing and Validation | Scikit-learn | Python library for machine learning that provides a wide range of tools for model selection, evaluation, and validation | https://scikit-learn.org/ |
| Model Testing and Validation | Keras | Deep learning API that provides a range of tools for model training, evaluation, and validation | https://keras.io/ |
| Model Testing and Validation | PyTorch | Open-source machine learning framework that provides a range of tools for model training, evaluation, and validation | https://pytorch.org/ |

Phase III: Operations

The Operations Phase is the final phase of the MLOps Lifecycle and comprises 3 sub-phases:

  • ML Model Deployment: To integrate a trained machine learning model into a production environment so that it can be used to make predictions or decisions based on new data
  • CI/CD Pipelines: To automate the machine learning model development and deployment process through continuous integration and continuous deployment (CI/CD)
  • Monitoring and Triggering: To continuously monitor the performance of machine learning models deployed in production environments and trigger actions when necessary
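
The Monitoring and Triggering sub-phase can be pictured as a rolling check on live predictions that fires an action, such as raising an alert or starting retraining, when performance drops. The sketch below is a minimal standard-library illustration under stated assumptions: the class name `AccuracyMonitor`, the window size, the threshold, and the `on_trigger` callback are hypothetical and do not correspond to the API of any tool listed on this page.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over a sliding window and fire a callback on degradation."""

    def __init__(self, window_size=50, threshold=0.8, on_trigger=None):
        self.outcomes = deque(maxlen=window_size)
        self.threshold = threshold
        # The callback stands in for an alert or a retraining job (assumption).
        self.on_trigger = on_trigger or (lambda accuracy: None)

    def record(self, prediction, actual):
        """Store one labelled prediction; return True if the trigger fired."""
        self.outcomes.append(prediction == actual)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough observations yet to judge
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.threshold:
            self.on_trigger(accuracy)
            self.outcomes.clear()  # start a fresh window to avoid re-firing
            return True
        return False


if __name__ == "__main__":
    monitor = AccuracyMonitor(
        window_size=4,
        threshold=0.75,
        on_trigger=lambda acc: print(f"accuracy {acc:.2f} below threshold"),
    )
    # Hypothetical stream of (prediction, ground truth) pairs.
    for prediction, actual in [(1, 1), (1, 1), (1, 0), (1, 0)]:
        monitor.record(prediction, actual)
```

In a production setting the ground-truth labels often arrive with a delay, so a monitor of this kind is usually complemented by checks on the input data itself.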


| MLOps Step | Tool Name | Tool Description | Link |
| --- | --- | --- | --- |
| ML Model Deployment | Kubeflow | Open-source platform for deploying and managing machine learning workflows on Kubernetes | https://www.kubeflow.org |
| ML Model Deployment | Docker | Platform for building, packaging, and deploying applications in containers | https://www.docker.com/ |
| ML Model Deployment | Microsoft Azure Machine Learning | Cloud-based platform for building, training, and deploying machine learning models | https://azure.microsoft.com/en-us/products/machine-learning/ |
| CI/CD Pipelines | Jenkins | Open-source automation server for building, testing, and deploying software | https://www.jenkins.io/ |
| CI/CD Pipelines | GitLab CI/CD | Platform for continuous integration and continuous deployment that provides features for building, testing, and deploying machine learning models | https://docs.gitlab.com/ee/ci/ |
| CI/CD Pipelines | CircleCI | Cloud-based continuous integration and continuous deployment platform | https://circleci.com |
| Monitoring and Triggering | Prometheus | Open-source monitoring system that provides features for monitoring and alerting on various aspects of the machine learning pipeline | https://prometheus.io/ |
| Monitoring and Triggering | Grafana | Open-source platform for data visualisation and monitoring that can be used to create dashboards and alerts for monitoring machine learning models | https://grafana.com/ |
| Monitoring and Triggering | Kibana | Open-source platform for data visualisation and analysis that can be used to monitor and analyse the performance of machine learning models | https://www.elastic.co/es/kibana/ |
| Monitoring and Triggering | Industreweb | The Industreweb Collect engine detects events and triggers mitigating actions. Industreweb Display dashboards allow screens to be created that reflect ML status | https://www.industreweb.co.uk/ |
