Cream Cheese Production and Quality Dataset

Latest revision as of 12:17, 20 October 2025

Dataset containing sensor measurements and quality data from Quescrem's cream cheese production chain.

Image 1: This dataset collects tabular data from Quescrem’s production chain for its main product (cream cheese), including sensor measurements from the production process as well as quality and composition parameters from laboratory analysis.

Asset Description

Dataset generated to train and validate the AI/ML models developed in Pilot III, “AI at the Edge for Zero Defect Food Industry and Sustainability Gain”, an SME-driven experiment led by Quescrem within the AI REDGIO 5.0 project. The dataset collects tabular data on diverse product batches of Quescrem’s main product family, including sensor measurements from the production process (temperature, pressure, flow rate, fermentation times, etc.) as well as quality and composition parameters from laboratory analysis of the raw materials and intermediate products (protein, fat, dry matter, acidity, etc.). Its purpose is to enable advanced data analysis techniques and AI/ML models that provide insights into how the combination of all these features affects the main quality indicators of the released product, such as hardness, acidity and pH.

Asset Details

Dataset Information

The dataset was made possible by a method that traces the same product batch along the entire production chain, through every subprocess, from the initial raw material mix to the final packaging of the cream cheese. To achieve this, every data source managed in Quescrem’s systems, and all the information stored in each of them, was considered, and the tables had to be linked to one another to provide that traceability. The goal was, starting from a specific final product batch ID, to retrieve all the information (available in each of the data sources) that corresponds to that batch.
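A minimal Python sketch of the batch-tracing idea, assuming each data source can be loaded as a list of rows and that each table exposes a key linking it to the next stage of the chain. The table names, the column names (final_batch, mix_batch, raw_lot) and the helpers find_row and trace_batch are hypothetical; Quescrem's actual schemas and link keys are not described on this page.

```python
from typing import Any

# Hypothetical row type: one record from any data source, keyed by column name.
Row = dict[str, Any]


def find_row(table: list[Row], key: str, value: Any) -> Row:
    """Return the single row of `table` whose `key` column equals `value`."""
    matches = [row for row in table if row.get(key) == value]
    if len(matches) != 1:
        raise LookupError(f"expected 1 row with {key}={value!r}, found {len(matches)}")
    return matches[0]


def trace_batch(final_batch_id: str, chain: list[tuple[list[Row], str]]) -> Row:
    """Follow linked tables from a final product batch ID back to its raw materials.

    Each step is (table, key): the value of `key` must already be present in the
    record built so far; the matching row's columns are merged in, which exposes
    the key needed by the next step.
    """
    record: Row = {"final_batch": final_batch_id}
    for table, key in chain:
        record.update(find_row(table, key, record[key]))
    return record


# Hypothetical toy tables (illustrative values only), from final product back to raw materials.
final_products = [{"final_batch": "F001", "mix_batch": "M010", "hardness": 4.2}]
pasteurization = [{"mix_batch": "M010", "raw_lot": "R100", "avg_temp": 78.5}]
raw_materials = [{"raw_lot": "R100", "cream_ph": 6.6}]

chain = [(final_products, "final_batch"), (pasteurization, "mix_batch"), (raw_materials, "raw_lot")]
print(trace_batch("F001", chain))
```

Because record.update() merges columns by name, two sources sharing a column name would overwrite each other; a real pipeline would likely need to prefix or rename columns per source.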

This method was automated through a Python application that collects and processes data from all available sources and merges it into the final dataset, provided in CSV format. Then, given the large number of features available, it was necessary to identify which of them were actually relevant for forecasting the quality KPIs, that is, which ones provide meaningful information about the production process and influence the quality parameters of the final product (the prediction targets).
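The page does not say how relevance was judged. As one possible screening step, swapped in here purely for illustration, the sketch below keeps the features whose absolute Pearson correlation with a quality target reaches a chosen threshold, then writes the retained columns to CSV. The function names, the row layout and the threshold are assumptions, not Quescrem's or Gradiant's actual procedure.

```python
import csv
import math
from collections.abc import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def screen_features(rows: list[dict[str, float]], features: list[str],
                    target: str, threshold: float) -> list[str]:
    """Keep the features whose |Pearson r| with `target` is at least `threshold`.

    The threshold is a free, assumed parameter; no value is given on this page.
    """
    ys = [row[target] for row in rows]
    return [
        name for name in features
        if abs(pearson([row[name] for row in rows], ys)) >= threshold
    ]


def write_dataset(path: str, rows: list[dict[str, float]], columns: list[str]) -> None:
    """Write the merged per-batch rows to CSV, keeping only the selected columns."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

Correlation screening only captures linear, one-feature-at-a-time relationships; with many interlinked subprocesses, model-based importance measures may be a better fit.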

As a result, the dataset used for training and testing the AI models of the AI REDGIO 5.0 pilot was built. The number of samples (product batches) is quite limited, owing to the complexity of, and inconsistencies in, the format of the available historical data and the relationships between data sources. However, the dataset includes and relates variables from the whole production chain, which is very valuable: the production process is highly complex, with many interlinked subprocesses, and until now no per-batch data covering the overall process had been collected.

Note that some of the variables in the dataset come from real-time data streams, in which the sensors record one sample per second for the duration of the corresponding production subprocess. For these variables, the dataset provides the average value over that subprocess.
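A minimal sketch of how a 1 Hz sensor stream might be reduced to the single per-subprocess average stored in the dataset, assuming readings arrive as (timestamp, value) pairs and the subprocess start and end times are known. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

# One sensor reading: (timestamp, measured value).
Reading = tuple[datetime, float]


def per_second_series(first_ts: datetime, values: list[float]) -> list[Reading]:
    """Attach 1-second-spaced timestamps to raw sensor values (1 Hz sampling)."""
    return [(first_ts + timedelta(seconds=i), v) for i, v in enumerate(values)]


def subprocess_average(readings: list[Reading], start: datetime, end: datetime) -> float:
    """Average of the readings logged between the subprocess start and end (inclusive)."""
    window = [v for ts, v in readings if start <= ts <= end]
    if not window:
        raise ValueError("no sensor readings fall inside the subprocess window")
    return sum(window) / len(window)
```

Filtering on the subprocess window, rather than averaging the whole stream, keeps readings logged before or after the subprocess from skewing the value.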

Examples of variables included in the dataset are the pH of the added cream, the fat and protein percentages of the mix before the pasteurization subprocess, the average pressure during the concentration subprocess, the average temperature of the mix during the pasteurization subprocess, the average pressure of the pasteurization tank and the average viscosity of the mix during the pasteurization subprocess.

Usage

This is a dataset containing tabular data that can be used for training and testing AI/ML models.
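A minimal sketch of preparing the CSV for model training: each batch row is split into numeric features and quality targets, and sample indices are shuffled into train and test sets. The column names (batch_id, hardness, acidity, ph), the 20 % test fraction and the helper names are assumptions; the published schema is not given on this page.

```python
import csv
import random
from collections.abc import Iterable

# Hypothetical column names: the CSV schema is not published on this page.
ID_COLUMN = "batch_id"
TARGET_COLUMNS = ("hardness", "acidity", "ph")


def load_xy(lines: Iterable[str]) -> tuple[list[dict[str, float]], list[dict[str, float]]]:
    """Read the CSV and split every batch into numeric features and quality targets.

    Assumes every non-ID cell is numeric; empty cells would make float() fail.
    """
    features, targets = [], []
    for row in csv.DictReader(lines):
        values = {name: float(v) for name, v in row.items() if name != ID_COLUMN}
        targets.append({name: values.pop(name) for name in TARGET_COLUMNS})
        features.append(values)
    return features, targets


def split_indices(n: int, test_fraction: float = 0.2, seed: int = 0) -> tuple[list[int], list[int]]:
    """Shuffle sample indices reproducibly and return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n * test_fraction))
    return indices[n_test:], indices[:n_test]
```

With an open file handle (for example `with open("dataset.csv", newline="") as f: X, Y = load_xy(f)`, where the file name is hypothetical), the feature and target dictionaries can then be indexed with the two index lists. A fixed seed keeps the split reproducible across experiments.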

Maturity

Ongoing development: currently the dataset has around 300 pre-processed samples, annotated with the corresponding quality KPI values for each product batch.

Licence

Proprietary

Resources

For further information, please contact danielestrada@quescrem.es.
Provided by GRADIANT and Quescrem

Acknowledgement

The dataset was created in the framework of the AI REDGIO 5.0 project. It will be used in the Industrial Pilot III (AI at the Edge for Zero Defect Food Industry and Sustainability Gain), which is being developed by Quescrem and Gradiant.

