Company: Wellcome Sanger Institute
Location: Hinxton
Posted: May 20th, 2026
We are seeking a Data Engineer at Senior or Principal level to further develop, maintain and operate our data platform within the Parasites and Microbes Programme at the Wellcome Sanger Institute.
You will work on a Data Integration and Analysis platform underpinned by a Data Lakehouse (DLH), built on technologies such as object storage, distributed query engines, workflow orchestration, and metadata/catalogue systems. Technologies currently in use include:
A key facet of the role will be the delivery of a DLH-based data integration and analysis platform for the icddr,b Climate Hub (iCCH), working in collaboration with international partners to enable robust, reproducible analyses linking climate and demographic variables with health outcomes.
You will play an important part in enabling interdisciplinary research by ensuring that data is well-structured, discoverable, and reproducible, supporting scientists to generate new insights from integrated datasets. Ingesting and transforming a wide range of data types, including geospatial, climate, and genomic data, is a key aspect of the role. You will work closely with data engineers, bioinformaticians, and scientists to ensure the platform meets scientific needs while remaining scalable, reliable, and maintainable.
You will be an experienced Data Engineer, willing to work hands‑on across all layers of the data platform stack.
You will be comfortable translating often‑complex scientific and data requirements into robust technical solutions, and able to communicate effectively with both technical and non‑technical stakeholders.
For both Senior and Principal roles:
Within the Parasites and Microbes Programme, we generate and analyse genomic and epidemiological data to better understand infectious diseases and their impact on human populations. Our work increasingly sits at the intersection of multiple data domains, including genomics, public health surveillance, and environmental and climate science.
To support our work, we are developing a modern, scalable Data Lakehouse platform that enables the integration, transformation, and analysis of complex, heterogeneous datasets. This platform is central to a number of strategic initiatives, including a collaboration with the International Centre for Diarrhoeal Disease Research in Bangladesh (icddr,b) to investigate the links between climate change and health outcomes.