We are seeking a Data Engineer at Senior or Principal level to further develop, maintain and operate our data platform within the Parasites and Microbes Programme at the Wellcome Sanger Institute.
About The Role
You will work on a Data Integration and Analysis platform underpinned by a Data Lakehouse (DLH), built on technologies such as object storage, distributed query engines, workflow orchestration, and metadata/catalogue systems. Technologies currently in use include:
- Metadata, governance & security: Hive Metastore, DataHub, Apache Ranger, Keycloak, Vault
- Data access & visualisation: Apache Superset, CloudBeaver
A key facet of the role will be the delivery of a DLH-based data integration and analysis platform for the icddr,b Climate Hub (iCCH), working in collaboration with international partners to enable robust, reproducible analyses linking climate and demographic variables with health outcomes.
You will play an important part in enabling interdisciplinary research by ensuring that data is well-structured, discoverable, and reproducible, supporting scientists to generate new insights from integrated datasets. Ingesting and transforming a wide range of data types (e.g. geospatial, climate and genomic data) is a key aspect of the role. You will work closely with data engineers, bioinformaticians, and scientists to ensure the platform meets scientific needs while remaining scalable, reliable, and maintainable.
About You
You will be an experienced Data Engineer with a willingness to operate in a hands‑on capacity across all of the layers of the data platform stack.
You will be comfortable translating often‑complex scientific and data requirements into robust technical solutions, and able to communicate effectively with both technical and non‑technical stakeholders.
Essential Technical Skills
For both Senior and Principal roles:
- Proficiency in Python, SQL and data transformation practices
- Data modelling and warehousing paradigms (e.g. ELT, Star schemas)
- Modern data platform architectures (e.g. data lakes or lakehouses)
- Distributed query or processing engines (e.g. Trino, Spark, Presto)
- Object storage systems (e.g. S3‑compatible systems such as MinIO)
- Workflow orchestration tools (e.g. Prefect, Airflow)
- Containerisation and orchestration (e.g. Docker, Kubernetes)
- CI/CD (e.g. GitLab CI, GitHub Actions)
Additional Expectations For Principal-level Appointments
- Technical leadership, with the ability to define and drive architectural decisions across complex data ecosystems
- Strong ownership and accountability for quality and reliability
- Designing, developing and operating data platforms at scale
- Line management, mentoring and coaching
About Us
Within the Parasites and Microbes Programme, we generate and analyse genomic and epidemiological data to better understand infectious diseases and their impact on human populations. Our work increasingly sits at the intersection of multiple data domains, including genomics, public health surveillance, and environmental and climate science.
To support our work, we are developing a modern, scalable Data Lakehouse platform that enables the integration, transformation, and analysis of complex, heterogeneous datasets. This platform is central to a number of strategic initiatives, including a collaboration with the International Centre for Diarrhoeal Disease Research in Bangladesh (icddr,b) to investigate the links between climate change and health outcomes.
Salary Range (Dependent On Skills And Experience)
- Grade 1 Principal Data Engineer: £61,511 to £73,000
- Grade 2 Senior Data Engineer: £50,053 to £59,500
- Contract Type: Fixed Term contract until 29th October 2027
