```python
class DataEngineer:
    def __init__(self):
        self.expertise = [
            "Cloud Data Pipelines",
            "ETL/ELT Automation",
            "AWS & GCP Services",
            "Data Warehousing",
            "Pipeline Orchestration",
        ]
        self.years_experience = 8
        self.location = "Toronto, ON"
        self.mission = "Design scalable cloud data platforms and deliver high-quality data solutions"
```
Senior Data Engineer with 8+ years of experience designing and delivering scalable cloud data platforms and ETL pipelines. Expertise in Python, SQL, and AWS services including S3, Lambda, Glue, Redshift, DynamoDB, and RDS. Skilled in data warehousing, dimensional modeling, and implementing data governance frameworks.
Proven expertise in building data models, orchestrating complex pipelines with tools like Airflow, dbt, and Matillion, and delivering high-quality data solutions for real-time and batch analytics. Passionate about enabling experimentation, A/B testing, and user engagement analysis to improve product experiences. Currently exploring generative AI and LLM tooling to enhance data processing, automate insight generation, and build intelligent data platforms.
Production-ready ETL pipeline framework built with Apache Airflow, Python, and PostgreSQL. Features modular design with reusable extractors, transformers, loaders, and validators. Includes Docker Compose setup, custom Airflow operators, comprehensive tests, and examples.
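To give a flavor of the modular design, here is a minimal sketch of how extract/transform/load callables might be wired into an Airflow DAG. The function names, task layout, and DAG settings below are illustrative assumptions, not the repo's actual API:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Stand-in for a reusable extractor; returns rows to pass downstream via XCom.
    return [{"id": 1, "amount": 42.0}]


def transform(ti, **context):
    # Pull the extractor's output and apply a simple transformation.
    rows = ti.xcom_pull(task_ids="extract")
    return [{**row, "amount_cents": int(row["amount"] * 100)} for row in rows]


def load(ti, **context):
    # Stand-in for a loader that would write to PostgreSQL.
    rows = ti.xcom_pull(task_ids="transform")
    print(f"loading {len(rows)} rows")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```

In the real framework, the inline callables would be replaced by its reusable extractor, transformer, loader, and validator classes.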
Comprehensive data quality validation and monitoring framework with support for completeness, accuracy, consistency, and timeliness checks. Includes Great Expectations integration, HTML/JSON reporting, alerting, and Airflow integration.
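As an illustration, the completeness and timeliness dimensions can reduce to checks like the plain-pandas sketch below; the function names and thresholds are assumptions, and the framework's actual Great Expectations integration is not shown:

```python
import pandas as pd


def completeness(df: pd.DataFrame, column: str, threshold: float = 0.99) -> bool:
    """Pass if at least `threshold` of values in `column` are non-null."""
    return df[column].notna().mean() >= threshold


def timeliness(df: pd.DataFrame, ts_column: str, max_lag_hours: int = 24) -> bool:
    """Pass if the newest record is within `max_lag_hours` of the current UTC time."""
    lag = pd.Timestamp.now(tz="UTC") - pd.to_datetime(df[ts_column]).max()
    return lag <= pd.Timedelta(hours=max_lag_hours)


df = pd.DataFrame({
    "id": [1, 2, None],
    "updated_at": pd.to_datetime(["2024-01-01"] * 3, utc=True),
})
print(completeness(df, "id", threshold=0.9))  # False: only 2 of 3 values are non-null
```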
Infrastructure as Code (IaC) for AWS and GCP data platforms using Terraform. Includes modules for Redshift, S3, Glue, Lambda, BigQuery, Dataflow, Cloud Storage, and monitoring. Supports dev, staging, and production environments.
dbt project for data warehouse transformations with SQL and Jinja templating. Features staging, intermediate, and marts layers with comprehensive tests, snapshots, seeds, and custom macros. Follows dbt best practices for modular transformations.
Python framework for REST API data extraction and integration. Supports OAuth2 and API key authentication, rate limiting, error handling, incremental extraction, data transformation, and database loading. Includes comprehensive examples and tests.
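Here is a condensed sketch of the authentication, retry, and incremental-pagination ideas using `requests`; the endpoint path, query parameters, and bearer-token scheme are placeholder assumptions rather than the framework's real interface:

```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(api_key: str) -> requests.Session:
    """Build a session with bearer-token auth and exponential-backoff retries."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {api_key}"
    retry = Retry(total=5, backoff_factor=1.0, status_forcelist=[429, 500, 502, 503])
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


def extract_incremental(session: requests.Session, base_url: str, since: str):
    """Yield records updated after `since`, following simple page-based pagination."""
    page = 1
    while True:
        resp = session.get(
            f"{base_url}/records",
            params={"updated_after": since, "page": page},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return
        yield from batch
        page += 1
        time.sleep(0.2)  # crude client-side rate limit between pages


# Usage: iterate records newer than a stored watermark, then advance the watermark.
# session = make_session("my-api-key")
# for record in extract_incremental(session, "https://api.example.com", "2024-01-01"):
#     ...
```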
I'm always open to discussing new opportunities, interesting projects, or just having a chat about data engineering.