MLOps Lifecycle
Research notes on the AI and MLOps lifecycle.

MLOps Lifecycle Basics
The emergence of Artificial Intelligence (AI) and Machine Learning (ML) has revolutionised multiple industries, ranging from healthcare and finance to manufacturing and transportation. With the rapid growth of data availability and the need for real-time decision-making, Cloud-Edge AI ML Operations (AI MLOps) has become a powerful approach, combining cloud computing, edge devices, and advanced ML algorithms.
The lifecycle is composed of three main phases: Design, Model Development, and Operations. Each phase is in turn composed of three sub-phases, making nine steps in total.
The MLOps lifecycle encompasses the end-to-end management and optimisation of ML models and workflows, integrating DevOps and data science practices. It begins with problem definition and data collection, where the business goals and relevant data sources are identified. This is followed by data preprocessing (cleaning, transformation, and feature engineering) to ensure the data is ready for modelling. The next phase involves model development, exploring and evaluating various algorithms and techniques. Once a suitable model is selected, it is trained and validated on historical data. After training, the focus shifts to deployment and monitoring: the model is deployed to production environments where it interacts with real-time data, and monitoring tools are put in place to track its performance, detect anomalies, and ensure reliability.
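The stages above can be sketched end to end in plain Python. This is a deliberately minimal illustration, not any particular tool's API: the one-feature least-squares "model" and the MAE alert threshold are assumptions chosen only to make each lifecycle stage concrete.

```python
# Minimal sketch of the core MLOps lifecycle stages in plain Python.
# The least-squares "model" and the MAE threshold are illustrative
# assumptions, not part of any specific MLOps tool.

def preprocess(records):
    """Data preprocessing: drop rows with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def train(data):
    """Model development: fit y = a*x + b by ordinary least squares."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def validate(model, holdout):
    """Training/validation: mean absolute error on held-out data."""
    return sum(abs(model(x) - y) for x, y in holdout) / len(holdout)

def monitor(model, live, mae_threshold=1.0):
    """Deployment monitoring: flag the model if live error drifts."""
    return "retrain" if validate(model, live) > mae_threshold else "ok"

raw = [(1, 2.1), (2, 3.9), (None, 5.0), (3, 6.1), (4, 8.0)]
data = preprocess(raw)
model = train(data[:-1])                # train on historical data
mae = validate(model, data[-1:])        # validate on a holdout point
status = monitor(model, [(10, 50.0)])   # drifted live data triggers action
```

In a real deployment each of these functions is replaced by the dedicated tooling listed in the sections below; the point here is only the shape of the loop from raw data to a monitored model.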
Tools and Solutions
The following section lists indicative tools suitable for each phase of the AI/MLOps lifecycle. The list is non-exhaustive.
Phase I: Design
The Design Phase encompasses three sub-phases:
- Requirements Engineering: To identify, analyse, document, and manage the needs and expectations of stakeholders for a software development project
- ML Use Cases Prioritisation: The process of identifying and prioritising potential use cases for applying machine learning algorithms in a particular domain or industry
- Data Availability Check: To ensure that the necessary data is available and accessible for use in a machine learning project
| MLOps Step | Tool Name | Tool Description | Link |
|---|---|---|---|
| Requirements Engineering | JIRA | Widely used project management tool that also includes features for requirements management | https://www.atlassian.com/software/jira |
| Requirements Engineering | Confluence | This is a wiki-based collaboration tool that can be used for requirements management | https://www.atlassian.com/software/confluence |
| Requirements Engineering | Visual Paradigm | This is a modelling tool that includes features for requirements engineering, such as the ability to create use cases, user stories, and requirements diagrams | https://www.visual-paradigm.com/ |
| Requirements Engineering | Diagrams.net | Good general purpose Technical Diagram tool to help standardise, between partners, the format of architectural, process diagrams and layouts | https://app.diagrams.net/ |
| ML Use Cases Prioritisation | DataRobot | Automated machine learning platform that includes features for use case prioritisation. It uses a combination of artificial intelligence and human expertise to identify and prioritise potential use cases | https://www.datarobot.com/ |
| ML Use Cases Prioritisation | Microsoft Azure ML Studio | Cloud-based machine learning platform that includes tools for identifying and prioritising use cases | https://azure.microsoft.com/en-us/products/machine-learning/ |
| ML Use Cases Prioritisation | RapidMiner | Data science platform that includes tools for evaluating and selecting the most promising use cases for implementation | https://rapidminer.com/ |
| Data Availability Check | SageMaker Data Wrangler | A tool that provides a user-friendly interface for data cleaning and preparation | https://aws.amazon.com/sagemaker/data-wrangler |
| Data Availability Check | Trifacta | Data preparation platform that provides features for data profiling, cleaning, and enrichment | https://www.trifacta.com |
| Data Availability Check | Azure Data Factory | Data integration platform that provides features for data preparation, transformation, and validation | https://azure.microsoft.com/en-us/products/data-factory |
| Data Availability Check | Industreweb | Large library of shopfloor protocol connectors for edge connectivity; can be used to verify whether a suitable protocol connector is available and, if not, to add a new type to support the pilot requirements | https://www.industreweb.co.uk/ |
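A data availability check can be as simple as profiling whether the fields a use case needs are present and sufficiently populated. The sketch below is a hypothetical illustration in plain Python: the field names and the 80% completeness threshold are assumptions, and tools such as Trifacta or SageMaker Data Wrangler automate this kind of profiling at scale.

```python
# Hypothetical data availability check: verify that required fields
# exist and meet a minimum completeness threshold (assumed 80% here).

def availability_report(rows, required_fields, min_completeness=0.8):
    report = {}
    for field in required_fields:
        # Count rows where the field is present and non-null.
        present = sum(1 for r in rows if r.get(field) is not None)
        completeness = present / len(rows) if rows else 0.0
        report[field] = {
            "completeness": completeness,
            "available": completeness >= min_completeness,
        }
    return report

# Example shopfloor-style records (field names are illustrative).
rows = [
    {"machine_id": 1, "temperature": 71.2, "vibration": 0.4},
    {"machine_id": 2, "temperature": None, "vibration": 0.5},
    {"machine_id": 3, "temperature": 69.8, "vibration": None},
    {"machine_id": 4, "temperature": 70.5, "vibration": 0.6},
]
report = availability_report(rows, ["machine_id", "temperature", "vibration"])
# machine_id is fully populated; temperature and vibration are 75% complete
```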
Phase II: Model Development
The Model Development Phase has the following three sub-phases:
- Data Engineering: To design, build, and maintain the infrastructure required to support the collection, storage, processing, and analysis of data
- ML Model Engineering: To design and develop machine learning models that can perform complex tasks with high accuracy and reliability
- Model Testing and Validation: To evaluate the performance and accuracy of a machine learning model and to ensure that it can generalise well to new data
| MLOps Step | Tool Name | Tool Description | Link |
|---|---|---|---|
| Data Engineering | Apache Spark | Open-source data processing engine that provides features for batch processing, streaming, and machine learning | https://spark.apache.org/ |
| Data Engineering | Apache Kafka | Distributed streaming platform that provides features for data processing and messaging | https://kafka.apache.org/ |
| Data Engineering | Apache Hadoop | Open-source framework that provides features for distributed storage and processing of large data sets | https://hadoop.apache.org/ |
| Data Engineering | Industreweb | Edge router with connectivity to the vast majority of shopfloor protocols and near real-time processing of data for provision to ML components | https://www.industreweb.co.uk/ |
| ML Model Engineering | Google Cloud AutoML | Tools for automated ML (AutoML), including AutoML Vision, AutoML Video Intelligence, AutoML Natural Language, and AutoML Translation | https://cloud.google.com/automl |
| ML Model Engineering | H2O.ai | Open-source platform that provides several tools for AutoML | https://h2o.ai/ |
| ML Model Engineering | TPOT | Open-source AutoML tool that automates the process of building and optimising machine learning pipelines | http://automl.info/ |
| Model Testing and Validation | Scikit-learn | Python library for machine learning that provides a wide range of tools for model selection, evaluation, and validation | https://scikit-learn.org/ |
| Model Testing and Validation | Keras | Deep learning API that provides a range of tools for model training, evaluation, and validation | https://keras.io/ |
| Model Testing and Validation | PyTorch | Open-source machine learning framework that provides a range of tools for model training, evaluation, and validation | https://pytorch.org/ |
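The core technique behind the model testing and validation tools above is k-fold cross-validation: the data is split into k folds, and the model is repeatedly trained on k-1 folds and scored on the remaining one. The sketch below implements this in plain Python with a deliberately trivial mean-predictor "model"; libraries such as scikit-learn provide the same mechanism (and real estimators) out of the box.

```python
# Sketch of k-fold cross-validation with a trivial mean-predictor model.
# The data and the choice k=3 are illustrative assumptions.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        size = fold_size + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(values, k=3):
    """Average squared error of a mean predictor across k folds."""
    scores = []
    for fold in k_fold_indices(len(values), k):
        test = [values[i] for i in fold]
        train = [v for i, v in enumerate(values) if i not in fold]
        mean = sum(train) / len(train)                    # "fit" the model
        mse = sum((v - mean) ** 2 for v in test) / len(test)
        scores.append(mse)
    return sum(scores) / len(scores)

score = cross_validate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=3)  # → 6.25
```

Averaging the score across folds gives a more reliable estimate of how the model generalises to new data than a single train/test split, which is the stated goal of the Model Testing and Validation sub-phase.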
Phase III: Operations
The Operations Phase is the final phase of the MLOps Lifecycle and comprises three sub-phases:
- ML Model Deployment: To integrate a trained machine learning model into a production environment so that it can be used to make predictions or decisions based on new data
- CI/CD Pipelines: The purpose of continuous integration and continuous deployment (CI/CD) pipelines is to automate the machine learning model development and deployment process
- Monitoring and Triggering: To continuously monitor the performance of machine learning models deployed in production environments and trigger actions when necessary
| MLOps Step | Tool Name | Tool Description | Link |
|---|---|---|---|
| ML Model Deployment | Kubeflow | Open-source platform for deploying and managing machine learning workflows on Kubernetes | https://www.kubeflow.org |
| ML Model Deployment | Docker | Platform for building, packaging, and deploying applications in containers | https://www.docker.com/ |
| ML Model Deployment | Microsoft Azure Machine Learning | Cloud-based platform for building, training, and deploying machine learning models | https://azure.microsoft.com/en-us/products/machine-learning/ |
| CI/CD Pipelines | Jenkins | Open-source automation server with a wide range of plugins for building, testing, and deploying software, including ML pipelines | https://www.jenkins.io/ |
| CI/CD Pipelines | GitLab CI/CD | Platform for continuous integration and continuous deployment that provides features for building, testing, and deploying machine learning models | https://docs.gitlab.com/ee/ci/ |
| CI/CD Pipelines | CircleCI | Cloud-based continuous integration and continuous deployment platform | https://circleci.com |
| Monitoring and Triggering | Prometheus | Open-source monitoring system that provides features for monitoring and alerting on various aspects of the machine learning pipeline | https://prometheus.io/ |
| Monitoring and Triggering | Grafana | Open-source platform for data visualisation and monitoring that can be used to create dashboards and alerts for monitoring machine learning models | https://grafana.com/ |
| Monitoring and Triggering | Kibana | Open-source platform for data visualisation and analysis that can be used to monitor and analyse the performance of machine learning models | https://www.elastic.co/es/kibana/ |
| Monitoring and Triggering | Industreweb | The Industreweb Collect engine detects events and triggers mitigating actions; Industreweb Display dashboards allow screens to be created that reflect ML status | https://www.industreweb.co.uk/ |
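The monitoring-and-triggering loop these tools implement can be illustrated in a few lines: compare a live metric against a baseline over a rolling window and raise a trigger when it drifts past a threshold. The sketch below is plain Python; the window size, threshold, and accuracy values are assumptions, and in production this role is played by tools such as Prometheus alert rules.

```python
# Illustrative monitoring-and-triggering loop: watch a live accuracy
# metric and trigger retraining on sustained drift from the baseline.
# Window size and threshold are assumed values for the example.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline_mean, threshold=0.2, window=3):
        self.baseline = baseline_mean
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def observe(self, accuracy):
        """Record a live reading; return a triggered action, if any."""
        self.recent.append(accuracy)
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough data for a rolling average yet
        drift = self.baseline - sum(self.recent) / len(self.recent)
        return "trigger_retraining" if drift > self.threshold else None

monitor = DriftMonitor(baseline_mean=0.90)
actions = [monitor.observe(a) for a in (0.88, 0.85, 0.82, 0.60, 0.55)]
# early readings stay within tolerance; sustained degradation triggers
```

Using a rolling window rather than a single reading avoids firing the trigger on transient noise, which is why monitoring systems typically alert on sustained conditions rather than point values.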
Other Info
- Research by Software Competence Centre Hagenberg - SCCH