Requirements
- Strong experience in data engineering within AWS cloud environments
- Hands‑on experience with AWS big data technologies such as EMR, S3 and SageMaker
- Proficiency in Python for building scalable data pipelines and processing frameworks
- Experience with Apache Spark for distributed data processing
- Experience designing and maintaining scalable batch and real‑time data pipelines
- Solid understanding of ETL/ELT design patterns and data modelling techniques
- Experience with workflow orchestration tools such as Apache Airflow (ideally deployed on AWS)
- Familiarity with containerisation and orchestration using Docker and Kubernetes (EKS)
- Experience with infrastructure as code (e.g. Terraform) and CI/CD/GitOps practices
- Proven ability to optimise performance and reduce cloud costs through partitioning, clustering and workload management
- Understanding of data security principles, including data loss prevention (DLP)
- (Desirable) Experience with Databricks or similar third‑party big data platforms
- (Desirable) Knowledge of real‑time streaming technologies (e.g. Kafka, Kinesis)
- (Desirable) Experience implementing data governance and compliance frameworks
- (Desirable) Familiarity with monitoring and observability tools in AWS environments
- (Desirable) Exposure to Lakehouse or modern data platform architectures
What the job involves
- We are looking for a Data Engineer to work closely with the Data Science team to develop robust data pipelines that feed analytics and machine learning tools such as Amazon SageMaker and third‑party platforms like Databricks
- You will leverage AWS technologies such as EMR, S3, EKS and Airflow to process and orchestrate high‑volume datasets, ensuring solutions are scalable, resilient and cost‑efficient
- You will also play a key role in embedding data loss prevention (DLP) principles and controls into data pipelines to protect sensitive information, while ensuring data is reliable, accessible, well‑governed and optimised for downstream consumption
