```python
class DataEngineer:
    def __init__(self):
        self.expertise = [
            "Cloud Data Pipelines",
            "ETL/ELT Automation",
            "AWS & GCP Services",
            "Data Warehousing",
            "Pipeline Orchestration",
        ]
        self.years_experience = 8
        self.location = "Toronto, ON"
        self.mission = "Design scalable cloud data platforms and deliver high-quality data solutions"
```
Senior Data Engineer with 8+ years of experience designing and delivering scalable cloud data platforms and ETL pipelines. Expertise in Python, SQL, and AWS services including S3, Lambda, Glue, Redshift, DynamoDB, and RDS. Skilled in data warehousing, dimensional modeling, and implementing data governance frameworks.
Proven expertise in building data models, orchestrating complex pipelines with tools like Airflow, dbt, and Matillion, and delivering high-quality data solutions for real-time and batch analytics. Passionate about enabling experimentation, A/B testing, and user engagement analysis to improve product experiences. Currently exploring generative AI and LLM tooling to enhance data processing, automate insight generation, and build intelligent data platforms.
Production-ready ETL pipeline framework built with Apache Airflow, Python, and PostgreSQL. Features modular design with reusable extractors, transformers, loaders, and validators. Includes Docker Compose setup, custom Airflow operators, comprehensive tests, and examples.
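To give a flavor of the modular design, here is a minimal sketch of how extract/transform/load callables might be wired into an Airflow DAG. The function names, task layout, and DAG settings below are illustrative assumptions, not the repo's actual API:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Stand-in for a reusable extractor; returns rows to pass downstream via XCom.
    return [{"id": 1, "amount": 42.0}]


def transform(ti, **context):
    # Pull the extractor's output and apply a simple transformation.
    rows = ti.xcom_pull(task_ids="extract")
    return [{**row, "amount_cents": int(row["amount"] * 100)} for row in rows]


def load(ti, **context):
    # Stand-in for a loader that would write to PostgreSQL.
    rows = ti.xcom_pull(task_ids="transform")
    print(f"loading {len(rows)} rows")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```

In the real framework, the inline callables would be replaced by its reusable extractor, transformer, loader, and validator classes.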
Comprehensive data quality validation and monitoring framework with support for completeness, accuracy, consistency, and timeliness checks. Includes Great Expectations integration, HTML/JSON reporting, alerting, and Airflow integration.
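As an illustration, the completeness and timeliness dimensions can reduce to checks like the plain-pandas sketch below; the function names and thresholds are assumptions, and the framework's actual Great Expectations integration is not shown:

```python
import pandas as pd


def completeness(df: pd.DataFrame, column: str, threshold: float = 0.99) -> bool:
    """Pass if at least `threshold` of values in `column` are non-null."""
    return df[column].notna().mean() >= threshold


def timeliness(df: pd.DataFrame, ts_column: str, max_lag_hours: int = 24) -> bool:
    """Pass if the newest record is within `max_lag_hours` of the current UTC time."""
    lag = pd.Timestamp.now(tz="UTC") - pd.to_datetime(df[ts_column]).max()
    return lag <= pd.Timedelta(hours=max_lag_hours)


df = pd.DataFrame({
    "id": [1, 2, None],
    "updated_at": pd.to_datetime(["2024-01-01"] * 3, utc=True),
})
print(completeness(df, "id", threshold=0.9))  # False: only 2 of 3 values are non-null
```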
Infrastructure as Code (IaC) for AWS and GCP data platforms using Terraform. Includes modules for Redshift, S3, Glue, Lambda, BigQuery, Dataflow, Cloud Storage, and monitoring. Supports dev, staging, and production environments.
dbt project for data warehouse transformations with SQL and Jinja templating. Features staging, intermediate, and marts layers with comprehensive tests, snapshots, seeds, and custom macros. Follows dbt best practices for modular transformations.
Python framework for REST API data extraction and integration. Supports OAuth2 and API key authentication, rate limiting, error handling, incremental extraction, data transformation, and database loading. Includes comprehensive examples and tests.
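Here is a condensed sketch of the authentication, retry, and incremental-pagination ideas using `requests`; the endpoint path, query parameters, and bearer-token scheme are placeholder assumptions rather than the framework's real interface:

```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(api_key: str) -> requests.Session:
    """Build a session with bearer-token auth and exponential-backoff retries."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {api_key}"
    retry = Retry(total=5, backoff_factor=1.0, status_forcelist=[429, 500, 502, 503])
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


def extract_incremental(session: requests.Session, base_url: str, since: str):
    """Yield records updated after `since`, following simple page-based pagination."""
    page = 1
    while True:
        resp = session.get(
            f"{base_url}/records",
            params={"updated_after": since, "page": page},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return
        yield from batch
        page += 1
        time.sleep(0.2)  # crude client-side rate limit between pages


# Usage: iterate records newer than a stored watermark, then advance the watermark.
# session = make_session("my-api-key")
# for record in extract_incremental(session, "https://api.example.com", "2024-01-01"):
#     ...
```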
I'm always open to discussing new opportunities, interesting projects, or just having a chat about data engineering.