We’re looking for a Lead Data Engineer (Databricks, PySpark) to join our team in London, UK on a hybrid working basis.
In this role, you will help shape and deliver next-generation data platforms. You will be hands‑on in developing, implementing and optimizing scalable ETL workflows and data pipelines, leveraging the full capabilities of Databricks and modern cloud technologies. You will play a key part in the transition to a robust Lakehouse architecture, working closely with cross‑functional teams in an agile environment.
This position is ideal for a data engineering leader who enjoys solving complex challenges, mentoring others and working at the forefront of Databricks technology. Experience with any major cloud provider is welcome, but a strong focus on Databricks is essential.
Responsibilities
- Design, develop and maintain production‑grade data applications, reusable frameworks and scalable data pipelines using Databricks, PySpark and Python/Scala
- Lead the architectural design and modernization of data platforms to a Lakehouse architecture leveraging Databricks‑native technologies such as Delta Lake and Unity Catalog
- Drive advanced Spark performance tuning, including handling data skew, optimizing the query execution plans produced by the Catalyst optimizer and managing cluster compute and memory efficiency for high‑volume workloads
- Champion modern software engineering practices within the data ecosystem including CI/CD pipelines, Infrastructure as Code (IaC), rigorous code reviews, automated testing and version control
- Implement secure, scalable and highly available data solutions leveraging integrations between Databricks and major cloud services (AWS, Azure or GCP)
- Architect and support AI‑driven data solutions including integrating Large Language Models (LLMs), building Agentic workflows and operationalizing GenAI or machine learning models within Databricks pipelines
- Act as a Technical Lead in an agile environment, collaborating with architects and product owners to decompose complex business requirements into actionable technical strategies, Epics and User Stories
- Mentor and upskill engineers, fostering a culture of engineering excellence, continuous learning and technical innovation
- Serve as a key technical liaison, translating and communicating complex architectural decisions, data concepts and system capabilities to both technical and non‑technical stakeholders
Requirements
- Bachelor’s or Master’s degree in Computer Science, Software Engineering or a related field
- Deep, hands‑on proficiency in PySpark with proven ability to tackle advanced performance tuning, data skew handling, memory management and Catalyst optimizer troubleshooting
- Extensive experience building production workloads on Databricks including knowledge of Databricks Workflows, Delta Lake and Unity Catalog for governance and security
- Demonstrable experience designing and migrating to Lakehouse architectures utilizing open table formats such as Delta Lake or Apache Iceberg
- Strong hands‑on experience integrating Databricks with native cloud services on AWS, Azure or GCP
- Advanced programming skills in Python (Scala is a plus) with strong understanding of object‑oriented and functional programming principles
- Proven track record of applying software engineering standards to data pipelines including CI/CD, Infrastructure as Code (e.g. Terraform), version control (Git) and rigorous code reviews
- Solid background in implementing automated testing frameworks and data quality validation within pipelines
- Proven experience as a Senior or Lead Engineer capable of driving technical strategy, making architectural decisions and decomposing complex solutions into Agile Epics and User Stories
- Strong ability to articulate complex technical concepts and trade‑offs clearly to both technical peers and non‑technical stakeholders
- Advantageous: Official Databricks certifications (e.g. Certified Data Engineer Professional, Spark Developer)
- Highly desirable: Hands‑on experience with, or strong interest in, AI and Agentic workflows, including operationalizing LLMs, using frameworks like LangChain or LlamaIndex, or leveraging Databricks ML/MosaicML for GenAI applications
We offer
- EPAM Employee Stock Purchase Plan (ESPP)
- Protection benefits including life assurance, income protection and critical illness cover
- Private medical insurance and dental care
- Employee Assistance Program
- Cyclescheme, Techscheme and season ticket loans
- Various perks such as free Wednesday lunch in‑office, on‑site massages and regular social events
- Learning and development opportunities including in‑house training and coaching, professional certifications, and courses
- If otherwise eligible, participation in the discretionary annual bonus program
- If otherwise eligible and hired into a qualifying level, participation in the discretionary Long‑Term Incentive (LTI) Program
