
Data Analysis Dashboard

Revision as of 05:39, 20 October 2025

A dashboard that allows human operators to monitor and extract knowledge from tabular data through visualization/interpretability, querying, and inference features.

Image 1: The main view of the Data Analysis Dashboard


Asset Description

The Data Analysis Dashboard facilitates the comprehensive monitoring of tabular data (e.g., from sensor arrays) with sophisticated processing and analysis capabilities. Its key benefits include intuitive visualisation, flexible querying, and the ability to infer patterns and detect anomalies, giving human operators critical decision-making support. It uses some classical Data and Knowledge Engineering methods (e.g., Knowledge Graphs) and is implemented based on the Streamlit framework.

Key Features

Data Upload and Preprocessing:

  • Upload datasets via file input or URL.
  • Manage missing values using various imputation methods.
  • Encode categorical variables and coerce numeric data.
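
The preprocessing steps above can be sketched with pandas. This is a minimal illustration, not the dashboard's actual implementation: the sample data, the column names, the `preprocess` helper, and the choice of mean and mode imputation are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
RAW_CSV = """temperature,pressure,status
20.5,1.01,ok
,1.03,ok
22.1,not_a_number,fault
23.0,1.00,
"""


def preprocess(df, numeric_cols, categorical_cols):
    """Coerce numeric columns, impute missing values, and encode categoricals."""
    out = df.copy()
    for col in numeric_cols:
        # Unparseable entries become NaN, then are filled with the column mean
        # (mean imputation is an assumption; other methods are possible).
        out[col] = pd.to_numeric(out[col], errors="coerce")
        out[col] = out[col].fillna(out[col].mean())
    for col in categorical_cols:
        # Missing categories are filled with the most frequent value,
        # then each category is mapped to an integer code.
        out[col] = out[col].fillna(out[col].mode().iloc[0])
        out[col] = out[col].astype("category").cat.codes
    return out


df = pd.read_csv(io.StringIO(RAW_CSV))
clean = preprocess(
    df,
    numeric_cols=["temperature", "pressure"],
    categorical_cols=["status"],
)
print(clean)
```

Integer category codes keep the example short; one-hot encoding would be an equally reasonable choice when the categories have no natural order.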

Correlation Analysis:

  • Compute Pearson and Spearman correlations, as well as Euclidean similarity.
  • Apply user-defined thresholds to filter significant relationships.

Knowledge Graph Creation:

  • Automatically generate RDF graphs representing significant correlations.
  • Define relationships using user-specified thresholds.
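
As a rough sketch of how thresholded similarity measures can yield the relationships that end up in a graph, the example below scores every column pair and keeps those that meet a threshold. Several details are assumptions rather than documented behaviour of the dashboard: the `significant_pairs` and `euclidean_similarity` helpers, the use of absolute correlation values, and the definition of Euclidean similarity as `1 / (1 + distance)` on z-scored columns.

```python
import numpy as np
import pandas as pd


def euclidean_similarity(a, b):
    """Map the Euclidean distance between two z-scored columns into (0, 1].

    This definition is an assumption made for the sketch.
    """
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return float(1.0 / (1.0 + np.linalg.norm(za - zb)))


def significant_pairs(df, thresholds):
    """Return (col_a, col_b, metric, value) for every pair meeting a threshold."""
    pearson = df.corr(method="pearson")
    spearman = df.corr(method="spearman")
    cols = list(df.columns)
    results = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            scores = {
                # Absolute values treat strong negative correlations as significant.
                "pearson": abs(pearson.loc[a, b]),
                "spearman": abs(spearman.loc[a, b]),
                "euclidean": euclidean_similarity(df[a], df[b]),
            }
            for metric, value in scores.items():
                if value >= thresholds[metric]:
                    results.append((a, b, metric, round(float(value), 3)))
    return results


# Hypothetical data: y is an exact multiple of x, w is unrelated noise.
data = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [2.0, 4.0, 6.0, 8.0, 10.0],
        "w": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
)
pairs = significant_pairs(data, {"pearson": 0.8, "spearman": 0.8, "euclidean": 0.9})
for pair in pairs:
    print(pair)
```

Each returned tuple could then be turned into one edge of the knowledge graph, with the metric name as the relationship type.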

Inference Rules:

  • Input custom IF-THEN rules to add inferred relationships to the RDF graph.
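
To illustrate the idea of an IF-THEN rule, the sketch below parses a small, hypothetical rule syntax and applies it to scored column pairs. The grammar (`IF <metric> <op> <number>, THEN <relation>`), the `parse_rule` and `apply_rule` helpers, and the tuple-based triples are assumptions for the example; the rule format accepted by the dashboard may differ.

```python
import operator
import re

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Hypothetical grammar: IF <metric> <op> <number>, THEN <relation>
RULE_PATTERN = re.compile(
    r"^IF\s+(?P<metric>\w+)\s*(?P<op>>=|<=|>|<)\s*(?P<value>-?\d+(?:\.\d+)?)"
    r"\s*,?\s*THEN\s+(?P<relation>\w+)$",
    re.IGNORECASE,
)


def parse_rule(text):
    """Turn a rule string into (metric, comparison function, threshold, relation)."""
    match = RULE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised rule: {text!r}")
    return (
        match["metric"].lower(),
        OPS[match["op"]],
        float(match["value"]),
        match["relation"],
    )


def apply_rule(rule, scored_pairs):
    """Return inferred (subject, relation, object) triples for matching pairs."""
    metric, compare, threshold, relation = rule
    return {
        (a, relation, b)
        for a, b, scores in scored_pairs
        if metric in scores and compare(scores[metric], threshold)
    }


# Hypothetical scores for two column pairs.
scored_pairs = [
    ("temperature", "pressure", {"pearson": 0.95}),
    ("temperature", "humidity", {"pearson": 0.40}),
]
rule = parse_rule("IF pearson > 0.9, THEN stronglyRelatedTo")
inferred = apply_rule(rule, scored_pairs)
print(inferred)
```

Keeping the parsed rule as plain data makes it easy to validate user input before any triple is added to the graph.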

Visualization:

  • Visualize correlations using interactive Plotly subplots.
  • Display the knowledge graph as a network with customizable aesthetics.

SPARQL Querying:

  • Query the RDF graph using SPARQL with a user-friendly interface.

Usage

1. Upload a Dataset: Begin by uploading a dataset in .csv format, which serves as the primary input for analysis. Datasets of varying sizes are supported, from small exploratory analyses to large-scale data studies.

2. Select Columns for Analysis: Choose specific pairs or groups of columns for correlation analysis. This keeps the analysis targeted at the relevant data dimensions and helps isolate meaningful relationships.

3. Set Thresholds for Similarity Metrics: Adjust the threshold for each similarity measure in the sidebar:

  • Pearson Correlation: threshold for linear relationships between variables.
  • Spearman Correlation: rank-based threshold for monotonic relationships.
  • Euclidean Similarity: distance-based threshold for the proximity between data points.

This customization aligns the analysis with the needs of the user's domain or study.

4. Analyze Data and Visualize Results: Inspect the data with interactive visual tools:

  • View correlation metrics as heatmaps and scatter plots, which highlight patterns and relationships and make findings easier to interpret.
  • Explore an interactive knowledge graph that maps correlations and relationships between the selected data columns.

5. Apply Inference Rules: Extend the knowledge graph with domain-specific logic:

  • Enter custom rules in an IF-THEN format (e.g., “IF variable A > threshold, THEN infer relationship B”).

This step derives new relationships and enriches the dataset with inferred knowledge tailored to the user's research goals.

6. Execute SPARQL Queries on the RDF Graph: Use the SPARQL query interface to explore the generated knowledge graph in more depth:

  • Perform complex queries to explore relationships, extract specific subsets of data, or validate inferred connections.
  • Use semantic querying to uncover insights that are not immediately apparent in the raw dataset.

In short, the user first performs the following:

  • Upload a CSV file containing the data to be analysed
  • Configure the data preprocessing aspects (handling missing values and data types)

Afterwards, the user can explore the data and its underlying relations, extracting knowledge in a flexible way:

  • Select pairs of features and see the visualisations of correlation analysis
  • Build a knowledge graph that represents significant relationships in the data
  • Interact directly with the knowledge graph through a SPARQL query interface

Image 2: Statistical analysis

Image 3: Visualisation, querying and inference


Note: The asset is under ongoing development.


Licence

This project is licensed under the MIT License.

Resources


Acknowledgement

This tool has been mainly developed within the framework of the TrineFlex project, funded under the European Union’s Horizon Europe research and innovation programme, Grant Agreement No 101058174.

Relevant Categories