MLOps Lifecycle
Revision as of 10:00, 10 November 2023
Research notes on the concepts of AI and the MLOps lifecycle.

== MLOps Lifecycle Basics ==
The emergence of Artificial Intelligence (AI) and Machine Learning (ML) has revolutionised multiple industries, ranging from healthcare and finance to manufacturing and transportation. With the fast growth of data availability and the need for real-time decision-making, Cloud-Edge AI ML Operations (AI MLOps) has become a powerful approach, being able to combine cloud computing, edge devices, and advanced ML algorithms.
The life cycle is composed of three main phases: Design, Model Development, and Operations. Each phase is in turn composed of three sub-phases, making a total of nine steps in the complete cycle.
The lifecycle of MLOps encompasses the end-to-end management and optimisation of ML models and workflows, integrating DevOps and data science practices. The lifecycle always begins with the problem definition and data collection, where the business goals and relevant data sources are identified. This is followed by data preprocessing, including cleaning, transformation, and feature engineering, to ensure the data is ready for modelling. The next phase involves model development, exploring and evaluating various algorithms and techniques. Once a suitable model is selected, it undergoes training and validation using historical data. After model training, the focus shifts to deployment and monitoring. The model is deployed to production environments where it interacts with real-time data, and monitoring tools are put in place to track its performance, detect anomalies, and ensure reliability.
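The stage sequence described above can be sketched end to end in a few lines. This is purely illustrative: the function names, the toy dataset, and the threshold "model" are invented for the example, not a prescribed API.

```python
# Minimal sketch of the MLOps stage sequence: collect -> preprocess ->
# train -> validate -> monitor. All names and data are illustrative.

def collect_data():
    # Problem definition happens elsewhere; here we just gather labelled rows.
    return [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]

def preprocess(rows):
    # Cleaning / feature engineering: drop rows with missing features.
    return [(x, y) for x, y in rows if x is not None]

def train(rows):
    # "Model development": a trivial threshold model at the class-mean midpoint.
    mean0 = sum(x for x, y in rows if y == 0) / sum(1 for _, y in rows if y == 0)
    mean1 = sum(x for x, y in rows if y == 1) / sum(1 for _, y in rows if y == 1)
    return (mean0 + mean1) / 2  # decision threshold

def validate(threshold, rows):
    # Validation on historical data: fraction of correct predictions.
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

def monitor(threshold, live_x):
    # Deployment/monitoring: score live data and flag out-of-range inputs.
    prediction = int(live_x > threshold)
    anomaly = not (0.0 <= live_x <= 10.0)
    return prediction, anomaly

rows = preprocess(collect_data())
threshold = train(rows)
accuracy = validate(threshold, rows)
```

A real pipeline would replace each function with the corresponding tooling from the tables below, but the hand-off between stages follows the same shape.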
== Tools and Solutions ==
=== Phase I: Design ===
The Design Phase encompasses three sub-phases:
*Requirements Engineering: identifying, analysing, documenting, and managing the needs and expectations of stakeholders for a software development project
*ML Use Cases Prioritisation: identifying and prioritising potential use cases for applying machine learning algorithms in a particular domain or industry
*Data Availability Check: ensuring that the necessary data is available and accessible for use in a machine learning project
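Use-case prioritisation is often carried out with a weighted scoring matrix. The sketch below ranks hypothetical use cases by business value, data readiness, and effort; the criteria, weights, and scores are all invented for illustration.

```python
# Toy weighted-scoring approach to ML use-case prioritisation.
# Criteria, weights, and 1-5 scores are hypothetical examples.

use_cases = {
    "predictive maintenance": {"business_value": 5, "data_readiness": 4, "effort": 2},
    "demand forecasting":     {"business_value": 4, "data_readiness": 3, "effort": 3},
    "visual inspection":      {"business_value": 3, "data_readiness": 2, "effort": 4},
}

# Effort counts against a use case, hence the negative weight.
WEIGHTS = {"business_value": 0.5, "data_readiness": 0.3, "effort": -0.2}

def score(criteria):
    return sum(WEIGHTS[k] * v for k, v in criteria.items())

ranking = sorted(use_cases, key=lambda name: score(use_cases[name]), reverse=True)
```

Platforms such as those listed below automate variants of this scoring, but the underlying idea — an explicit, comparable score per candidate use case — is the same.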
{| class="wikitable" style="margin:auto; width:100%"
|-
! MLOps Step !! Tool Name !! Tool Description !! Link
|-
| Requirements Engineering || JIRA || Widely used project management tool that also includes features for requirements management || https://www.atlassian.com/software/jira
|-
| Requirements Engineering || Confluence || Wiki-based collaboration tool that can be used for requirements management || https://www.atlassian.com/software/confluence
|-
| Requirements Engineering || Visual Paradigm || Modelling tool that includes features for requirements engineering, such as the ability to create use cases, user stories, and requirements diagrams || https://www.visual-paradigm.com/
|-
| Requirements Engineering || Diagrams.net || Good general-purpose technical diagram tool that helps partners standardise the format of architectural and process diagrams and layouts || https://app.diagrams.net/
|-
| ML Use Cases Prioritisation || DataRobot || Automated machine learning platform that includes features for use-case prioritisation, combining artificial intelligence and human expertise to identify and prioritise potential use cases || https://www.datarobot.com/
|-
| ML Use Cases Prioritisation || Microsoft Azure ML Studio || Includes tools for identifying and prioritising use cases || https://azure.microsoft.com/en-us/products/machine-learning/
|-
| ML Use Cases Prioritisation || RapidMiner || Includes tools for evaluating and selecting the most promising use cases for implementation || https://rapidminer.com/
|-
| Data Availability Check || SageMaker Data Wrangler || Provides a user-friendly interface for data cleaning and preparation || https://aws.amazon.com/sagemaker/data-wrangler
|-
| Data Availability Check || Trifacta || Data preparation platform that provides features for data profiling, cleaning, and enrichment || https://www.trifacta.com
|-
| Data Availability Check || Azure Data Factory || Data integration platform that provides features for data preparation, transformation, and validation || https://azure.microsoft.com/en-us/products/data-factory
|-
| Data Availability Check || Industreweb || Large library of shopfloor edge protocol connectors; can be used to verify whether a suitable protocol connector is available and, if not, to add a new type to support the pilot requirements || https://www.industreweb.co.uk/
|}
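Before adopting any of the data-preparation platforms above, the Data Availability Check itself can be as simple as verifying that the required fields exist and that their missing-value rate is tolerable. A minimal sketch, with hypothetical field names and tolerance:

```python
# Minimal data-availability check: required fields present and
# missing-value rate below a tolerance. Field names are illustrative.

REQUIRED_FIELDS = {"timestamp", "machine_id", "temperature"}
MAX_MISSING_RATE = 0.1  # tolerate up to 10% missing values per field

def check_availability(records):
    """records: list of dicts, e.g. rows read from a CSV or an edge gateway."""
    if not records:
        return False, "no data available"
    missing_fields = REQUIRED_FIELDS - set(records[0])
    if missing_fields:
        return False, f"missing fields: {sorted(missing_fields)}"
    for field in sorted(REQUIRED_FIELDS):
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        if missing / len(records) > MAX_MISSING_RATE:
            return False, f"too many missing values in {field!r}"
    return True, "ok"

sample = [
    {"timestamp": "2023-11-10T10:00", "machine_id": "M1", "temperature": 71.2},
    {"timestamp": "2023-11-10T10:01", "machine_id": "M1", "temperature": None},
]
ok, reason = check_availability(sample)
```

A check like this gives an early, automatable go/no-go signal for the Design phase before heavier tooling is brought in.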
=== Phase II: Model Development ===
{| class="wikitable" style="margin:auto; width:100%"
|-
! MLOps Step !! Tool Name !! Tool Description !! Link
|-
| Data Engineering || Apache Spark || Open-source data processing engine that provides features for batch processing, streaming, and machine learning || https://spark.apache.org/
|-
| Data Engineering || Apache Kafka || Distributed streaming platform that provides features for data processing and messaging || https://kafka.apache.org/
|-
| Data Engineering || Apache Hadoop || Open-source framework that provides features for distributed storage and processing of large data sets || https://hadoop.apache.org/
|-
| Data Engineering || Industreweb || Edge router with connectivity for the vast majority of shopfloor protocols and near-real-time processing of data for provision to ML components || https://www.industreweb.co.uk/
|-
| ML Model Engineering || Google Cloud AutoML || Tools for automated ML (AutoML), including AutoML Vision, AutoML Video Intelligence, AutoML Natural Language, and AutoML Translation || https://cloud.google.com/automl
|-
| ML Model Engineering || H2O.ai || Open-source platform that provides several tools for AutoML || https://h2o.ai/
|-
| ML Model Engineering || TPOT || Open-source AutoML tool that automates the process of building and optimising machine learning pipelines || http://automl.info/
|-
| Model Testing and Validation || Scikit-learn || Python library for machine learning that provides a wide range of tools for model selection, evaluation, and validation || https://scikit-learn.org/
|-
| Model Testing and Validation || Keras || Deep learning API that provides a range of tools for model training, evaluation, and validation || https://keras.io/
|-
| Model Testing and Validation || PyTorch || Open-source machine learning framework that provides a range of tools for model training, evaluation, and validation || https://pytorch.org/
|}
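Whichever library is used, model testing and validation follows the same pattern: hold out part of the historical data, fit on the rest, and measure performance only on the held-out part. A library-free sketch with synthetic, linearly separable data (the threshold "model" stands in for any real estimator):

```python
# Sketch of holdout validation: split historical data, fit on the training
# part, report accuracy on the held-out part. Data and model are toy examples.
import random

def holdout_split(rows, test_fraction=0.25, seed=42):
    rows = rows[:]
    random.Random(seed).shuffle(rows)  # fixed seed for reproducibility
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def fit_threshold(train_rows):
    # Midpoint between the class means -- a stand-in for real model training.
    xs0 = [x for x, y in train_rows if y == 0]
    xs1 = [x for x, y in train_rows if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def accuracy(threshold, rows):
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

# Two well-separated classes: x in [0, 1.9] (label 0) and [5.0, 6.9] (label 1).
data = [(x / 10, 0) for x in range(20)] + [(x / 10 + 5, 1) for x in range(20)]
train_rows, test_rows = holdout_split(data)
acc = accuracy(fit_threshold(train_rows), test_rows)
```

In practice the split, the estimator, and the metric would come from one of the libraries in the table (e.g. scikit-learn's model-selection utilities), but the train/test separation is the invariant to preserve.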
=== Phase III: Operations ===
{| class="wikitable" style="margin:auto; width:100%"
|-
! MLOps Step !! Tool Name !! Tool Description !! Link
|-
| Example || Example || Example || Example
|-
| Example || Example || Example || Example
|}
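The table above is still a placeholder. As an illustration of the monitoring step described in the lifecycle overview, a toy drift check can compare the rolling mean of a live feature against its training baseline; the baseline, tolerance, and window size here are invented for the example.

```python
# Toy monitoring check for the Operations phase: flag drift when the rolling
# mean of a live feature strays too far from the training-time baseline.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline_mean, tolerance, window=5):
        self.baseline_mean = baseline_mean
        self.tolerance = tolerance
        self.window = deque(maxlen=window)  # last `window` live values

    def observe(self, value):
        """Record a live value; return True if the rolling mean has drifted."""
        self.window.append(value)
        rolling_mean = sum(self.window) / len(self.window)
        return abs(rolling_mean - self.baseline_mean) > self.tolerance

monitor = DriftMonitor(baseline_mean=50.0, tolerance=5.0)
in_range = [monitor.observe(v) for v in (49.0, 51.0, 50.5)]  # near baseline
drifted = monitor.observe(80.0)  # a sudden shift pushes the rolling mean out
```

Production monitoring tools add alerting, statistical drift tests, and model-performance tracking on top of this basic compare-against-baseline idea.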
== Other Info ==
*Research by Software Competence Centre Hagenberg (SCCH)