Case Study: EDA Data Platform

Operationalizing governed plant data for enterprise analytics and decision velocity.

EDA Data Platform dashboard and analytics architecture

Project Snapshot

Role: Lead Data Scientist / Platform Architect
Domain: Manufacturing analytics and governance
Stack: Azure ML, Snowflake/Snowpark, Python APIs, MLOps
Timeline: 2022 – Oct 2025 (enterprise delivery phase)

Platform deployment

3 Plants

Set up across three operational domains with unified data access and analysis workflows.

Time saved

16+ Hrs/Week

Engineers reported spending near-zero time coordinating data across spreadsheets and notebooks.

Analysis velocity

Click-to-Insight

"Vital few" variable identification with one click. Heatmaps for covariates and collinearity.

Quality outcomes

5% → 1% Defects

Nuisance defect rate in targeted workflows. >10% yield lift in a one-year window.

Technical Architecture

graph TD
    subgraph Sources
        A[Plant Sensors] --> B[Historians]
        C[ERP Systems] --> D[SAP/Oracle]
        E[Quality Labs] --> F[LIMS]
    end
    
    subgraph Ingestion
        B --> G[Ingestion Pipeline]
        D --> G
        F --> G
        G --> H[Validation Layer]
    end
    
    subgraph Storage
        H --> I[Snowflake Data Warehouse]
    end
    
    subgraph Delivery
        I --> J[REST API]
        I --> K[Dashboards]
        I --> L[ML Models]
    end
    
    subgraph Consumers
        J --> M[Plant Engineers]
        K --> N[Operations Teams]
        L --> O[Data Scientists]
    end

Data flow: Plant sensors (via historians), ERP systems, and quality labs (via LIMS) feed a unified ingestion pipeline. A validation layer checks records before they are stored in Snowflake. The delivery layer then exposes that data through REST APIs, dashboards, and ML models.
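To make the validation step concrete, here is a minimal sketch of a rule-based check applied to records before storage. It is an illustration under stated assumptions, not the platform's actual rule set: the field names (`plant_id`, `tag`, `timestamp`, `value`), the `furnace_temp_c` tag, and its limits are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical rules: field names and limits are illustrative assumptions.
REQUIRED_FIELDS = ("plant_id", "tag", "timestamp", "value")
VALUE_LIMITS = {"furnace_temp_c": (0.0, 1800.0)}  # assumed tag -> (min, max)


@dataclass
class ValidationResult:
    accepted: List[Dict[str, Any]]
    rejected: List[Tuple[Dict[str, Any], str]]  # (record, rejection reason)


def check_record(record: Dict[str, Any]) -> Optional[str]:
    """Return a rejection reason, or None if the record passes every rule."""
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            return f"missing field: {field}"
    try:
        datetime.fromisoformat(record["timestamp"])
    except (TypeError, ValueError):
        return "unparseable timestamp"
    value = record["value"]
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "non-numeric value"
    # Tags without configured limits are accepted at any finite value.
    low, high = VALUE_LIMITS.get(record["tag"], (float("-inf"), float("inf")))
    if not low <= value <= high:
        return "value out of range"
    return None


def validate(records: List[Dict[str, Any]]) -> ValidationResult:
    """Split records into accepted rows and rejected rows with reasons."""
    result = ValidationResult(accepted=[], rejected=[])
    for record in records:
        reason = check_record(record)
        if reason is None:
            result.accepted.append(record)
        else:
            result.rejected.append((record, reason))
    return result
```

Keeping the rejection reason next to each rejected record is one way to support the governance side of the platform: rejected rows can be reviewed and the rules documented rather than silently dropped.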

Decision Tradeoffs

Option Considered	Pros	Cons	Decision
Snowflake-native	Managed infrastructure, fast queries, Snowpark for transformations	Vendor lock-in, usage-based compute costs at scale	Selected: enterprise already invested, team expertise available
PostgreSQL + PostGIS	Open source, full control, no usage-based pricing	More ops overhead, team capacity constraints	Rejected: would require dedicated DBA capacity
Databricks Unity Catalog	Strong ML integration, governance features	Higher complexity, migration cost	Deferred: considered for future ML platform consolidation

Quantified Outcomes (Public-Shareable)

16+ hours/week of engineering and administrative effort reclaimed through analytics automation patterns.
Nuisance defect rate reduced from 5% to 1% in targeted quality workflows, supported by tighter data feedback loops.
>10% yield improvement delivered in a one-year optimization window where governed analytics informed interventions.

Problem

Manufacturing stakeholders needed reliable, timely, and consistent access to process data, but data was fragmented across systems and teams. This slowed troubleshooting, benchmarking, and adoption of advanced analytics.

Approach

I led the design and deployment of a governed EDA platform comprising ingestion pipelines, validation rules, and API-based delivery. The architecture balanced plant usability, IT governance, and analytical flexibility.
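As a rough illustration of API-based delivery, the sketch below shows the kind of query function a REST handler might wrap. It assumes validated rows are available as dictionaries with hypothetical `plant_id`, `tag`, `timestamp`, and `value` fields, and that timestamps are naive ISO 8601 strings. In the real platform the filtering would presumably run as a Snowflake query rather than in Python, so the function name and response shape here are assumptions.

```python
import json
from datetime import datetime
from typing import Any, Dict, List, Optional


def query_measurements(
    rows: List[Dict[str, Any]],
    plant_id: str,
    tag: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> str:
    """Return a JSON body with the rows for one plant and tag, oldest first.

    Hypothetical handler logic: `start` and `end` are optional inclusive
    bounds given as naive ISO 8601 strings.
    """
    lower = datetime.fromisoformat(start) if start else datetime.min
    upper = datetime.fromisoformat(end) if end else datetime.max
    selected = [
        row
        for row in rows
        if row["plant_id"] == plant_id
        and row["tag"] == tag
        and lower <= datetime.fromisoformat(row["timestamp"]) <= upper
    ]
    selected.sort(key=lambda row: datetime.fromisoformat(row["timestamp"]))
    return json.dumps(
        {"plant_id": plant_id, "tag": tag, "count": len(selected), "rows": selected}
    )
```

Exposing a narrow, parameterized query like this, instead of raw table access, is one way an API layer can balance plant usability with IT governance.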

Outcome

The platform became a core analytics layer for multiple initiatives, enabling faster root-cause analysis and more consistent reporting across operations. Engineers described it as game-changing for discussions, brainstorming, sanity checks, and long-term trendlines. One-click "vital few" variable identification and covariate heatmaps became standard practice.
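The case study does not say how the platform ranks variables, so the following is only a sketch of one common approach: rank candidate process variables by the absolute Pearson correlation with a target metric, and compute the pairwise correlation matrix that a collinearity heatmap would display. The function names and column names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def vital_few(
    columns: Dict[str, List[float]], target: str, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank candidate variables by |correlation| with the target column."""
    scores = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top_n]


def collinearity_matrix(
    columns: Dict[str, List[float]], exclude: str
) -> Dict[str, Dict[str, float]]:
    """Pairwise correlations between candidate variables (heatmap input)."""
    names = [name for name in columns if name != exclude]
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

A correlation ranking only flags candidates for investigation. Strongly collinear variables can mask one another, which is why the pairwise matrix is worth reading alongside the ranking.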

Platform metrics: 3 plants, unified data access, click-to-insight workflow
Time saved: 16+ hours/week, with near-zero time spent coordinating data across spreadsheets and notebooks
Business outcomes: Nuisance defect rate reduced from 5% to 1%, >10% yield improvement

Leadership Contribution

Architecture: Designed the data model, ingestion pipeline, and validation layer, and chose a Snowflake-native approach after evaluating PostgreSQL and Databricks options.
Team: Led a three-person analytics team through delivery, establishing code review and testing practices.
Governance: Established data quality standards adopted plant-wide, including validation rules and documentation.
Outcomes: Engineers called the platform game-changing, citing the ability to "click a button to find the vital few variables" and to use covariate heatmaps during discussions, brainstorming, and sanity checks.
