MLOps Lifecycle
Research notes on the AI and MLOps lifecycle.

MLOps Lifecycle Basics
The emergence of Artificial Intelligence (AI) and Machine Learning (ML) has revolutionised multiple industries, ranging from healthcare and finance to manufacturing and transportation. With the rapid growth of data availability and the need for real-time decision-making, Cloud-Edge AI ML Operations (AI MLOps) has become a powerful approach, combining cloud computing, edge devices, and advanced ML algorithms.
The lifecycle is composed of three main phases: Design, Model Development, and Operations. Each phase is in turn composed of three sub-phases, making nine steps in total.
The MLOps lifecycle encompasses the end-to-end management and optimisation of ML models and workflows, integrating DevOps and data science practices. It begins with problem definition and data collection, where the business goals and relevant data sources are identified. This is followed by data preprocessing (cleaning, transformation, and feature engineering) to ensure the data is ready for modelling. The next phase involves model development, exploring and evaluating various algorithms and techniques. Once a suitable model is selected, it is trained and validated on historical data. After training, the focus shifts to deployment and monitoring: the model is deployed to production environments where it interacts with real-time data, and monitoring tools are put in place to track its performance, detect anomalies, and ensure reliability.
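The stages above can be sketched end to end in plain Python. This is a deliberately minimal illustration, not any particular tool's API: the one-feature least-squares "model" and the MAE alert threshold are assumptions chosen only to make each lifecycle stage concrete.

```python
# Minimal sketch of the core MLOps lifecycle stages in plain Python.
# The least-squares "model" and the MAE threshold are illustrative
# assumptions, not part of any specific MLOps tool.

def preprocess(records):
    """Data preprocessing: drop rows with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def train(data):
    """Model development: fit y = a*x + b by ordinary least squares."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def validate(model, holdout):
    """Training/validation: mean absolute error on held-out data."""
    return sum(abs(model(x) - y) for x, y in holdout) / len(holdout)

def monitor(model, live, mae_threshold=1.0):
    """Deployment monitoring: flag the model if live error drifts."""
    return "retrain" if validate(model, live) > mae_threshold else "ok"

raw = [(1, 2.1), (2, 3.9), (None, 5.0), (3, 6.1), (4, 8.0)]
data = preprocess(raw)
model = train(data[:-1])                # train on historical data
mae = validate(model, data[-1:])        # validate on a holdout point
status = monitor(model, [(10, 50.0)])   # drifted live data triggers action
```

In a real deployment each of these functions is replaced by the dedicated tooling listed in the sections below; the point here is only the shape of the loop from raw data to a monitored model.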
Tools and Solutions
The following section lists indicative tools suitable for each phase of the AI/MLOps lifecycle. The list is non-exhaustive.
Phase I: Design
The Design Phase encompasses three sub-phases:
- Requirements Engineering: To identify, analyse, document, and manage the needs and expectations of stakeholders for a software development project
- ML Use Cases Prioritisation: The process of identifying and prioritising potential use cases for applying machine learning algorithms in a particular domain or industry
- Data Availability Check: To ensure that the necessary data is available and accessible for use in a machine learning project
| MLOps Step | Tool Name | Tool Description | Link |
|---|---|---|---|
| Requirements Engineering | JIRA | Widely used project management tool that also includes features for requirements management | https://www.atlassian.com/software/jira |
| Requirements Engineering | Confluence | This is a wiki-based collaboration tool that can be used for requirements management | https://www.atlassian.com/software/confluence |
| Requirements Engineering | Visual Paradigm | This is a modelling tool that includes features for requirements engineering, such as the ability to create use cases, user stories, and requirements diagrams | https://www.visual-paradigm.com/ |
| Requirements Engineering | Diagrams.net | Good general purpose Technical Diagram tool to help standardise, between partners, the format of architectural, process diagrams and layouts | https://app.diagrams.net/ |
| ML Use Cases Prioritisation | DataRobot | Automated machine learning platform that includes features for use case prioritisation. It uses a combination of artificial intelligence and human expertise to identify and prioritise potential use cases | https://www.datarobot.com/ |
| ML Use Cases Prioritisation | Microsoft Azure ML Studio | Cloud-based machine learning platform that includes tools for identifying and prioritising use cases | https://azure.microsoft.com/en-us/products/machine-learning/ |
| ML Use Cases Prioritisation | RapidMiner | Data science platform that includes tools for evaluating and selecting the most promising use cases for implementation | https://rapidminer.com/ |
| Data Availability Check | SageMaker Data Wrangler | A tool that provides a user-friendly interface for data cleaning and preparation | https://aws.amazon.com/sagemaker/data-wrangler |
| Data Availability Check | Trifacta | Data preparation platform that provides features for data profiling, cleaning, and enrichment | https://www.trifacta.com |
| Data Availability Check | Azure Data Factory | Data integration platform that provides features for data preparation, transformation, and validation | https://azure.microsoft.com/en-us/products/data-factory |
| Data Availability Check | Industreweb | Large library of shopfloor protocol connectors for edge connectivity; can be used to verify whether a suitable protocol connector is available and, if not, to add a new type to support the pilot requirements | https://www.industreweb.co.uk/ |
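A data availability check can be as simple as profiling whether the fields a use case needs are present and sufficiently populated. The sketch below is a hypothetical illustration in plain Python: the field names and the 80% completeness threshold are assumptions, and tools such as Trifacta or SageMaker Data Wrangler automate this kind of profiling at scale.

```python
# Hypothetical data availability check: verify that required fields
# exist and meet a minimum completeness threshold (assumed 80% here).

def availability_report(rows, required_fields, min_completeness=0.8):
    report = {}
    for field in required_fields:
        # Count rows where the field is present and non-null.
        present = sum(1 for r in rows if r.get(field) is not None)
        completeness = present / len(rows) if rows else 0.0
        report[field] = {
            "completeness": completeness,
            "available": completeness >= min_completeness,
        }
    return report

# Example shopfloor-style records (field names are illustrative).
rows = [
    {"machine_id": 1, "temperature": 71.2, "vibration": 0.4},
    {"machine_id": 2, "temperature": None, "vibration": 0.5},
    {"machine_id": 3, "temperature": 69.8, "vibration": None},
    {"machine_id": 4, "temperature": 70.5, "vibration": 0.6},
]
report = availability_report(rows, ["machine_id", "temperature", "vibration"])
# machine_id is fully populated; temperature and vibration are 75% complete
```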
Phase II: Model Development
The Model Development Phase has the following three sub-phases:
- Data Engineering: To design, build, and maintain the infrastructure required to support the collection, storage, processing, and analysis of data
- ML Model Engineering: To design and develop machine learning models that can perform complex tasks with high accuracy and reliability
- Model Testing and Validation: To evaluate the performance and accuracy of a machine learning model and to ensure that it can generalise well to new data
| MLOps Step | Tool Name | Tool Description | Link |
|---|---|---|---|
| Data Engineering | Apache Spark | Open-source data processing engine that provides features for batch processing, streaming, and machine learning | https://spark.apache.org/ |
| Data Engineering | Apache Kafka | Distributed streaming platform that provides features for data processing and messaging | https://kafka.apache.org/ |
| Data Engineering | Apache Hadoop | Open-source framework that provides features for distributed storage and processing of large data sets | https://hadoop.apache.org/ |
| Data Engineering | Industreweb | Edge router with connectivity to the vast majority of shopfloor protocols and near real-time processing of data for provision to ML components | https://www.industreweb.co.uk/ |
| ML Model Engineering | Google Cloud AutoML | Tools for automated ML (AutoML), including AutoML Vision, AutoML Video Intelligence, AutoML Natural Language, and AutoML Translation | https://cloud.google.com/automl |
| ML Model Engineering | H2O.ai | Open-source platform that provides several tools for AutoML | https://h2o.ai/ |
| ML Model Engineering | TPOT | Open-source AutoML tool that automates the process of building and optimising machine learning pipelines | http://automl.info/ |
| Model Testing and Validation | Scikit-learn | Python library for machine learning that provides a wide range of tools for model selection, evaluation, and validation | https://scikit-learn.org/ |
| Model Testing and Validation | Keras | Deep learning API that provides a range of tools for model training, evaluation, and validation | https://keras.io/ |
| Model Testing and Validation | PyTorch | Open-source machine learning framework that provides a range of tools for model training, evaluation, and validation | https://pytorch.org/ |
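The core technique behind the model testing and validation tools above is k-fold cross-validation: the data is split into k folds, and the model is repeatedly trained on k-1 folds and scored on the remaining one. The sketch below implements this in plain Python with a deliberately trivial mean-predictor "model"; libraries such as scikit-learn provide the same mechanism (and real estimators) out of the box.

```python
# Sketch of k-fold cross-validation with a trivial mean-predictor model.
# The data and the choice k=3 are illustrative assumptions.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        size = fold_size + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(values, k=3):
    """Average squared error of a mean predictor across k folds."""
    scores = []
    for fold in k_fold_indices(len(values), k):
        test = [values[i] for i in fold]
        train = [v for i, v in enumerate(values) if i not in fold]
        mean = sum(train) / len(train)                    # "fit" the model
        mse = sum((v - mean) ** 2 for v in test) / len(test)
        scores.append(mse)
    return sum(scores) / len(scores)

score = cross_validate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=3)  # → 6.25
```

Averaging the score across folds gives a more reliable estimate of how the model generalises to new data than a single train/test split, which is the stated goal of the Model Testing and Validation sub-phase.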
Phase III: Operations
The Operations Phase is the final phase of the MLOps Lifecycle and comprises three sub-phases:
- ML Model Deployment: To integrate a trained machine learning model into a production environment so that it can be used to make predictions or decisions based on new data
- CI/CD Pipelines: The purpose of continuous integration and continuous deployment (CI/CD) pipelines is to automate the machine learning model development and deployment process
- Monitoring and Triggering: To continuously monitor the performance of machine learning models deployed in production environments and trigger actions when necessary
| MLOps Step | Tool Name | Tool Description | Link |
|---|---|---|---|
| ML Model Deployment | Kubeflow | Open-source platform for deploying and managing machine learning workflows on Kubernetes | https://www.kubeflow.org |
| ML Model Deployment | Docker | Platform for building, packaging, and deploying applications in containers | https://www.docker.com/ |
| ML Model Deployment | Microsoft Azure Machine Learning | Cloud-based platform for building, training, and deploying machine learning models | https://azure.microsoft.com/en-us/products/machine-learning/ |
| CI/CD Pipelines | Jenkins | Open-source automation server with a wide range of plugins for building, testing, and deploying software, including ML pipelines | https://www.jenkins.io/ |
| CI/CD Pipelines | GitLab CI/CD | Platform for continuous integration and continuous deployment that provides features for building, testing, and deploying machine learning models | https://docs.gitlab.com/ee/ci/ |
| CI/CD Pipelines | CircleCI | Cloud-based continuous integration and continuous deployment platform | https://circleci.com |
| Monitoring and Triggering | Prometheus | Open-source monitoring system that provides features for monitoring and alerting on various aspects of the machine learning pipeline | https://prometheus.io/ |
| Monitoring and Triggering | Grafana | Open-source platform for data visualisation and monitoring that can be used to create dashboards and alerts for monitoring machine learning models | https://grafana.com/ |
| Monitoring and Triggering | Kibana | Open-source platform for data visualisation and analysis that can be used to monitor and analyse the performance of machine learning models | https://www.elastic.co/es/kibana/ |
| Monitoring and Triggering | Industreweb | The Industreweb Collect engine detects events and triggers mitigating actions; Industreweb Display dashboards allow screens to be created that reflect ML status | https://www.industreweb.co.uk/ |
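The monitoring-and-triggering loop these tools implement can be illustrated in a few lines: compare a live metric against a baseline over a rolling window and raise a trigger when it drifts past a threshold. The sketch below is plain Python; the window size, threshold, and accuracy values are assumptions, and in production this role is played by tools such as Prometheus alert rules.

```python
# Illustrative monitoring-and-triggering loop: watch a live accuracy
# metric and trigger retraining on sustained drift from the baseline.
# Window size and threshold are assumed values for the example.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline_mean, threshold=0.2, window=3):
        self.baseline = baseline_mean
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def observe(self, accuracy):
        """Record a live reading; return a triggered action, if any."""
        self.recent.append(accuracy)
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough data for a rolling average yet
        drift = self.baseline - sum(self.recent) / len(self.recent)
        return "trigger_retraining" if drift > self.threshold else None

monitor = DriftMonitor(baseline_mean=0.90)
actions = [monitor.observe(a) for a in (0.88, 0.85, 0.82, 0.60, 0.55)]
# early readings stay within tolerance; sustained degradation triggers
```

Using a rolling window rather than a single reading avoids firing the trigger on transient noise, which is why monitoring systems typically alert on sustained conditions rather than point values.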
Other Info
- Research by Software Competence Centre Hagenberg - SCCH