Jose Gonzalez

Data Engineer & Sports Analyst

I build scalable ETL pipelines that turn hundreds of millions of records into production-ready data — and apply the same rigor to sports analytics.

Monterrey, Mexico

xG vs Goals, last 8 matches (live demo)
4+ Years in Data
180M+ Records / Day
99% Faster Pipelines
4 Certifications

Who I Am

About

I'm a Data Engineer with 4 years of experience designing and operating scalable ETL pipelines and data platforms for enterprises like Chubb, ZF Group, and Johnson Controls. My day-to-day is Python, Spark, Databricks, and the Azure/AWS stack — building medallion-architecture lakehouses and multithreaded pipelines that move hundreds of millions of records reliably.

Off the clock, I bring that same engineering rigor to sports analytics: modeling match data, building metrics, and creating dashboards that make the game easier to read. If it can't be trusted in production, it doesn't ship.

Languages

Python · SQL · T-SQL · R · JavaScript · Java · C++

Data Engineering

Apache Spark · Databricks · Delta Lake · ETL/ELT · Data Warehousing · Data Modeling

Cloud Platforms

Azure Data Factory · Synapse · Data Lake · AWS S3 · Lambda · Glue

Tools & BI

Power BI · Qlik · SSIS · Informatica · Docker · Git · CI/CD

Certifications

Databricks Performance Optimization · AWS Cloud Practitioner · Azure Data Fundamentals (DP-900) · Databricks GenAI Fundamentals

Selected Work

Projects

2026 World Cup Monte Carlo Simulation

500K Simulations

Simulates the entire 2026 FIFA World Cup 500,000 times using an Elo + xG match model, producing win probabilities for every team from group stage through the final.

Python · NumPy · Jupyter · Elo · xG · Monte Carlo
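A minimal sketch of the Elo-driven Monte Carlo idea, not the project's actual model. It assumes a toy four-team single-elimination bracket, an illustrative mapping from Elo gap to Poisson goal rates, and a coin flip for drawn matches. The names `expected_goals`, `play_knockout`, and `simulate_bracket`, and all ratings, are hypothetical.

```python
import numpy as np

def expected_goals(elo_a, elo_b, base=1.35):
    """Map an Elo gap to Poisson goal rates (illustrative assumption)."""
    # Standard Elo win expectancy for team A.
    we = 1.0 / (1.0 + 10 ** (-(elo_a - elo_b) / 400.0))
    # Split a fixed goal budget of 2 * base in proportion to that expectancy.
    return 2 * base * we, 2 * base * (1 - we)

def play_knockout(elo_a, elo_b, rng):
    """Return True if team A advances; draws are settled by a coin flip."""
    lam_a, lam_b = expected_goals(elo_a, elo_b)
    goals_a, goals_b = rng.poisson(lam_a), rng.poisson(lam_b)
    if goals_a == goals_b:
        return rng.random() < 0.5  # stand-in for extra time / penalties
    return goals_a > goals_b

def simulate_bracket(ratings, n_sims=10_000, seed=0):
    """Estimate title probabilities for a single-elimination bracket.

    `ratings` maps team name -> Elo rating. Pairings follow dict order,
    and the number of teams must be a power of two.
    """
    rng = np.random.default_rng(seed)
    titles = dict.fromkeys(ratings, 0)
    for _ in range(n_sims):
        alive = list(ratings)
        while len(alive) > 1:
            # Pair neighbours and keep the winner of each match.
            alive = [a if play_knockout(ratings[a], ratings[b], rng) else b
                     for a, b in zip(alive[::2], alive[1::2])]
        titles[alive[0]] += 1
    return {team: wins / n_sims for team, wins in titles.items()}

# Hypothetical ratings, for illustration only.
probs = simulate_bracket({"A": 1900, "B": 1700, "C": 1800, "D": 1600})
```

Each simulated tournament is independent, so the title probabilities are just win counts divided by the number of runs; more runs narrow the sampling error.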

Automotive Analytics Dashboards

99% Faster Refresh

Reengineered data architecture and ETL behind customer dashboards for BMW and Ford, plus a KPI platform spanning thousands of plants worldwide.

Databricks · Azure · Power BI · Python · SSIS

Claims Fraud Detection Platform

180M Records / Day

End-to-end data platform powering a claims-fraud detection system: medallion-architecture lakehouse feeding ML models across four global regions.

Spark · Python · Databricks · Delta Lake · Qlik
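To illustrate the medallion layering in miniature, here is a plain-Python sketch rather than the Spark and Delta Lake code such a platform would run. Raw "bronze" rows are cleaned and deduplicated into "silver", then aggregated into a "gold" summary. The field names (`claim_id`, `region`, `amount`) and the functions `to_silver` and `to_gold` are assumptions made up for the example.

```python
from collections import defaultdict

def to_silver(bronze_rows):
    """Clean raw rows: drop records without an id, normalize types,
    and keep only the last version seen for each claim_id."""
    latest = {}
    for row in bronze_rows:
        claim_id = row.get("claim_id")
        if not claim_id:
            continue  # unusable without a key
        latest[claim_id] = {
            "claim_id": claim_id,
            "region": str(row.get("region", "")).strip().upper(),
            "amount": float(row.get("amount") or 0),
        }
    return list(latest.values())

def to_gold(silver_rows):
    """Aggregate cleaned rows into per-region claim counts and totals."""
    totals = defaultdict(lambda: {"claims": 0, "amount": 0.0})
    for row in silver_rows:
        agg = totals[row["region"]]
        agg["claims"] += 1
        agg["amount"] += row["amount"]
    return dict(totals)

# Hypothetical raw input, for illustration only.
bronze = [
    {"claim_id": "c1", "region": " emea ", "amount": "100.5"},
    {"claim_id": "c1", "region": "EMEA", "amount": "120.0"},  # later version of c1
    {"claim_id": None, "region": "APAC", "amount": "50"},     # missing key, dropped
    {"claim_id": "c2", "region": "latam", "amount": None},    # missing amount -> 0.0
]
silver = to_silver(bronze)
gold = to_gold(silver)
```

The point of the layering is that each stage has one job: bronze preserves what arrived, silver enforces quality rules, and gold holds only what downstream consumers such as ML models or dashboards read.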

MLB Picks Engine & Performance Dashboard

Top 5 Daily Picks

Automated daily MLB picks system: Python ingests odds, Statcast, and lineup data and runs a proprietary analysis model, while a public dashboard tracks live performance with equity curves and ROI.

Python · MLB StatsAPI · GitHub Actions · JavaScript · ApexCharts
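As a rough sketch of how an equity curve and ROI can be derived from settled picks, assuming decimal odds and flat tuples of `(stake, decimal_odds, won)`. The helper names `settle` and `equity_and_roi` and the sample picks are hypothetical and say nothing about the proprietary model itself.

```python
from itertools import accumulate

def settle(stake, decimal_odds, won):
    """Profit for one pick: net winnings if it won, minus the stake if it lost."""
    return stake * (decimal_odds - 1) if won else -stake

def equity_and_roi(picks):
    """Return (equity curve, ROI) for an ordered list of settled picks.

    The equity curve is the running sum of profit; ROI is total profit
    divided by total amount staked.
    """
    profits = [settle(*pick) for pick in picks]
    curve = list(accumulate(profits))
    staked = sum(pick[0] for pick in picks)
    roi = curve[-1] / staked if staked else 0.0
    return curve, roi

# Hypothetical picks, for illustration only.
picks = [(1.0, 2.0, True), (1.0, 1.5, False), (2.0, 2.5, True)]
curve, roi = equity_and_roi(picks)
# curve == [1.0, 0.0, 3.0], roi == 0.75
```

Plotting `curve` against pick order gives the equity chart; ROI normalizes profit by the amount risked, so records with different stake sizes stay comparable.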

Supplier Lead-Time Automation

52 Plants

Automated pipeline synchronizing supplier lead times from 13 ERP systems across 52 plants, improving forecasting accuracy for Procurement.

Azure Data Factory · Synapse · Python · Power Automate · SQL

Career

Resume

Experience

  1. Data Engineer

    Apr 2024 — Present

    Chubb

    • Architected end-to-end data pipelines for a claims-fraud detection system using Spark, Python, Databricks, and Qlik, processing ~10M claims with low-latency, high-accuracy workflows.
    • Transformed 280+ raw tables and external sources into 75 silver-layer and 2 gold-layer tables following medallion architecture, enabling production-ready ingestion for fraud-detection ML models.
    • Optimized multithreaded Python ETL pipelines processing 180M records daily from claims text across EMEA, APAC, LATAM, and NA, reducing latency and expanding country coverage.
  2. Data Engineer

    Apr 2023 — Apr 2024

    ZF Group

    • Reengineered data architecture and ETL for BMW and Ford dashboards, cutting report refresh times from 1 month to under 5 minutes (a reduction of more than 99%).
    • Built pipelines tracking 100+ KPIs across 3,750 manufacturing plants worldwide, delivering customized product-analytics dashboards for major automotive clients.
    • Developed and maintained ETL with Databricks, Azure Cloud, Power BI, Python, and SSIS for the Product Development Analytics department.
  3. Data Analyst Specialist

    Jun 2022 — Mar 2023

    Johnson Controls

    • Designed an automated pipeline with Azure Data Factory, Python, and Power Automate to sync lead times from 13 supplier ERP systems across 52 plants.
    • Improved forecasting accuracy and revenue alignment by optimizing data-integration workflows for Procurement.
    • Built ETL using the Azure stack (Data Factory, Synapse, Storage), Python, SQL, Informatica, and Power Platform.
  4. Business Intelligence Intern

    Aug 2021 — Dec 2021

    CEMEX

    • Designed and implemented ETL processes to enhance reporting and workflow analysis using Power BI, SSIS, SQL, and Excel.

Education

  1. B.Sc. in Computer Science & Technology Engineering

    Aug 2017 — Jun 2022

    Tecnológico de Monterrey (ITESM)

    • GPA: 3.7 / 4.0
  2. Certifications

    Databricks · AWS · Microsoft

    • Databricks Performance Optimization
    • AWS Certified Cloud Practitioner
    • Azure Data Fundamentals (DP-900)
    • Databricks Generative AI Fundamentals

Get In Touch

Contact

Open to data engineering roles and sports analytics collaborations. The fastest way to reach me is email — I read everything.

jgacontact@gmail.com