Design and develop scalable, high-performance data architectures and pipelines based on coding best practices.
Lead and mentor a team of colleagues, providing technical guidance and coaching.
Develop, maintain and optimize data processing workflows for both real-time and batch operations.
Collaborate and engage with Data Analysts and Data Scientists to design robust data architectures and pipelines, ensuring delivery of data in preferred formats for analytics or model training.
Ensure data quality and consistency through data cleaning, transformation, and validation processes.
Evaluate and improve model performance through statistical analysis and experimentation.
Develop and maintain automated deployment pipelines (CI/CD) from the Git repository to the platform.
Take ownership of the assigned data domain, providing technical consultation and solution recommendations to stakeholders.
Drive Agile (Scrum/Kanban) development methodologies.
Qualifications:
8+ years of experience in Data Engineering.
Strong hands-on coding skills in Scala and Python (PySpark).
Strong expertise in Google Cloud Platform (GCP) with proficiency in AWS.
Experience with data lakes (AWS S3) and knowledge of the BigQuery lakehouse architecture.
Proficiency in ETL/ELT design, data modeling, and distributed processing (e.g., Spark, Pandas, or Beam).