Trust AI & Data Quality Pipeline is a framework for improving data quality (DQ) and integrating explainable AI (XAI) methods. It addresses real-world challenges in data processing and model reliability through automated workflows that streamline preprocessing, improve interpretability, and support robust decision-making.
## Data Quality Enhancement
- Implements both parametric and non-parametric imputation methods such as Optimized Fuzzy C-Means (OFCM) and other ML/DL-based approaches.
- Evaluates imputation effectiveness using metrics like Differential Entropy, Canonical Relative Entropy, and JS-Divergence.
## Explainable AI (XAI) Framework
- Active Learning: Adapts dynamically to user requirements while ensuring efficient labeling through knowledge distillation.
- SHAP Interpretability: Provides feature-level importance explanations.
- Counterfactual Explanations: Offers actionable insights by generating alternative scenarios that would lead to different predictions.
## Pipeline Automation
- Combines the data quality module (EDASH) and XAI components to automate end-to-end workflows, ensuring reliable and interpretable AI solutions.
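As a rough illustration of how the stages fit together, the pipeline can be sketched as an imputation step feeding a model-training step feeding an explanation step. The class and component names below are illustrative, not the EDASH API:

```python
import numpy as np

class TrustPipeline:
    """Minimal sketch of an end-to-end DQ + XAI pipeline (hypothetical API)."""

    def __init__(self, imputer, model, explainer):
        self.imputer = imputer      # fills missing values (e.g. OFCM)
        self.model = model          # any fit/predict estimator
        self.explainer = explainer  # maps (model, X) -> explanations

    def run(self, X, y):
        X_clean = self.imputer(X)                   # 1. data quality stage
        self.model.fit(X_clean, y)                  # 2. training stage
        return self.explainer(self.model, X_clean)  # 3. XAI stage

# Toy components, just to show the control flow.
class MeanModel:
    def fit(self, X, y): self.mean_ = float(np.mean(y))
    def predict(self, X): return np.full(len(X), self.mean_)

mean_impute = lambda X: np.where(np.isnan(X), np.nanmean(X, axis=0), X)
explain = lambda model, X: {"prediction_mean": float(model.predict(X).mean())}

X = np.array([[1.0, np.nan], [2.0, 4.0]])
pipe = TrustPipeline(mean_impute, MeanModel(), explain)
print(pipe.run(X, y=np.array([0.0, 1.0])))  # {'prediction_mean': 0.5}
```

In the real pipeline each stage would be swapped for the corresponding EDASH/XAI component; the point is only that the stages compose through plain data in / data out interfaces.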
## Imputation Algorithms
- Optimized Fuzzy C-Means (OFCM): Incorporates weighting factors to optimize cluster accuracy and minimize overfitting.
- Other methods include Expectation-Maximization and MissForest.
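For intuition, a plain fuzzy c-means imputer can be written in a few lines of NumPy. This is standard FCM with column-mean initialization, not the optimized weighting scheme of OFCM:

```python
import numpy as np

def fcm_impute(X, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Impute missing values with plain fuzzy c-means (not the paper's OFCM).

    Missing cells start at the column means; each iteration updates the fuzzy
    memberships U and centroids C, then refills the missing cells with
    membership-weighted centroid values.
    """
    rng = np.random.default_rng(seed)
    mask = np.isnan(X)
    Xf = np.where(mask, np.nanmean(X, axis=0), X)
    U = rng.dirichlet(np.ones(n_clusters), size=len(X))  # random fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        C = (W.T @ Xf) / W.sum(axis=0)[:, None]          # fuzzy centroids
        d = np.linalg.norm(Xf[:, None, :] - C[None], axis=2) + 1e-9
        U = d ** (-2 / (m - 1))                          # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
        Xf = np.where(mask, U @ C, Xf)                   # refill missing cells
    return Xf

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, np.nan]])
X_imp = fcm_impute(X)  # missing cell is pulled toward the (5, 5) cluster
```

The cluster-based fill is what lets FCM-style imputation beat simple column means: the missing cell inherits the structure of the cluster its row belongs to.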
## Evaluation Metrics
- Differential Entropy: Compares the data distribution before and after imputation.
- JS-Divergence: Quantifies the similarity between the original and imputed distributions.
- NRMSE (Normalized Root Mean Square Error): Assesses reconstruction quality.
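JS-Divergence and NRMSE are straightforward to compute. A minimal NumPy sketch, assuming discrete (histogram) distributions for JSD and range normalization for NRMSE (other normalizations, e.g. by standard deviation, are also common):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def nrmse(x_true, x_imp, mask):
    """RMSE over the originally-missing cells, normalized by the data range."""
    err = x_true[mask] - x_imp[mask]
    return np.sqrt(np.mean(err ** 2)) / (x_true.max() - x_true.min())

# Disjoint distributions give the maximum JSD of 1 bit; identical ones give 0.
p, q = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(round(float(js_divergence(p, q)), 3))  # 1.0
```

Differential entropy can be estimated from samples as well (e.g. `scipy.stats.differential_entropy`); comparing it before and after imputation flags fills that artificially narrow the distribution.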
## Active Learning
- Query Strategy: Includes uncertainty-based sampling and margin sampling to optimize labeled data acquisition.
- Human Annotation: Ensures high-confidence labeling for critical datasets.
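Both query strategies reduce to ranking the model's predicted class probabilities; a minimal sketch:

```python
import numpy as np

def uncertainty_sampling(proba, k):
    """Pick the k samples whose top-class probability is lowest (least confident)."""
    return np.argsort(proba.max(axis=1))[:k]

def margin_sampling(proba, k):
    """Pick the k samples with the smallest gap between the top two class scores."""
    srt = np.sort(proba, axis=1)
    margins = srt[:, -1] - srt[:, -2]
    return np.argsort(margins)[:k]

# Sample 1 is the most ambiguous (0.55 vs 0.45), so both strategies pick it first.
proba = np.array([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]])
print(uncertainty_sampling(proba, 1), margin_sampling(proba, 1))  # [1] [1]
```

The selected indices are the ones routed to human annotators, which is how labeling effort is concentrated on the examples the model is least sure about.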
## SHAP (SHapley Additive exPlanations)
- Visualizes feature importance to highlight model decision logic.
- Enhances trust by presenting interpretable insights to users.
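SHAP approximates classical Shapley values from cooperative game theory. For a handful of features they can be computed exactly by brute force, which is a useful mental model of what the library estimates (here "absent" features are set to a baseline vector, one of several common conventions):

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`.

    Brute-force over all feature coalitions, so only practical for a few
    features; SHAP approximates the same quantity efficiently.
    """
    n = len(x)
    def value(S):
        z = baseline.copy()
        z[list(S)] = x[list(S)]   # features in coalition S take their real values
        return f(z)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value(S + (i,)) - value(S))  # weighted marginal gain
    return phi

# For a linear model the Shapley values recover the per-feature contributions.
phi = shapley_values(lambda z: 2 * z[0] + 3 * z[1], np.array([1.0, 1.0]), np.zeros(2))
print(phi)  # [2. 3.]
```

The efficiency property holds by construction: the values sum to `f(x) - f(baseline)`, which is what makes the feature-importance plots add up to the model's output.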
## Counterfactual Explanations
- Provides "what-if" scenarios to facilitate decision-making by adjusting critical feature values.
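A simple counterfactual can be found by greedy search over single-feature perturbations. This is a sketch to convey the idea, not the project's actual counterfactual generator:

```python
import numpy as np

def greedy_counterfactual(score, x, step=0.1, max_steps=200):
    """Greedy 'what-if' search on a binary scorer (score >= 0.5 means class 1).

    Nudges one feature by +/-step per iteration, always taking the move that
    pushes the score furthest toward the opposite class.
    """
    x = np.asarray(x, dtype=float)
    start_class = score(x) >= 0.5
    sign = -1.0 if start_class else 1.0     # direction that flips the class
    cf = x.copy()
    for _ in range(max_steps):
        if (score(cf) >= 0.5) != start_class:
            return cf                       # label flipped: counterfactual found
        moves = [cf + d * np.eye(len(cf))[i]
                 for i in range(len(cf)) for d in (step, -step)]
        cf = max(moves, key=lambda z: sign * score(z))
    return None                             # budget exhausted, no flip found

# Logistic scorer: class flips once z0 + z1 crosses 1.
score = lambda z: 1.0 / (1.0 + np.exp(-(z[0] + z[1] - 1.0)))
cf = greedy_counterfactual(score, np.array([0.0, 0.0]))
```

Comparing `cf` with the original input shows which features had to change, and by how much, to obtain the other prediction, which is exactly the "what-if" insight described above.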
## Results
- Achieved an F1 score of 0.9928 in predictive tasks.
- Imputation tests demonstrated superior reconstruction accuracy using OFCM, outperforming baseline methods by more than 20%.
- Active learning reduced labeling costs by more than 30% while maintaining high model accuracy.
## Tech Stack
- Frontend: React.js with Tailwind CSS for a responsive UI.
- Backend: Django with PostgreSQL, containerized with Docker.
- Imputation Module: Python with NumPy, SciPy, and custom ML algorithms.
## Getting Started

Frontend:

```bash
cd frontend
npm install
npm run dev
```

The UI is accessible at http://localhost:5173.

Backend:

```bash
cd backend
docker compose up --build
```

The Django API runs at http://127.0.0.1:8000.
## Links
- EDASH Repository: GitHub - EDASH
- Presentation Materials: Poster