Senior Software Engineer, ML Platform (Stability & Infrastructure)
London
About Iso
Isomorphic Labs (IsoLabs) launched in 2021 to advance human health by building AI models that accelerate scientific discovery.
Your Impact
You will play a pivotal role in ensuring the reliability and scalability of the foundations that make our AI work possible.
What You Will Do
- Own the end-to-end strategy for platform reliability, focusing on accelerator (GPU/TPU) infrastructure and workload orchestration.
- Lead reliability work for our global job scheduler, designing and implementing a robust “test harness” to validate infrastructure upgrades.
- Architect and optimize next-generation inference services to address scaling limits and maintain high-throughput performance.
- Overhaul logging and monitoring systems to provide proactive alerting and telemetry that identifies failures before they impact research.
- Improve internal CI/CD stability, reducing failure rates and speeding feedback loops for the engineering organization.
- Contribute to core technical decisions on tooling and architecture while partnering with science, product, and operations teams.
Skills and Qualifications
- Proven experience architecting and managing large-scale AI/ML workloads in production.
- Expertise in cloud compute design, specifically within Google Cloud Platform (GCP).
- Significant experience deploying and managing complex workloads within Kubernetes (GKE).
- Professional familiarity with NVIDIA GPU generations and high-performance compute.
- Strong programming skills and a “reliability-first” approach to software development.
Nice to Have
- Career spanning both ML software engineering and infrastructure SRE roles.
- Experience leading multidisciplinary projects and navigating complex stakeholder requirements.
- Familiarity with workload scheduling, ML efficiency research, and hardware benchmarking.
- Experience with Google TPU generations and specialized ML-driven R&D cycles.
We require you to be able to come into the office three days a week (currently Tuesday, Wednesday, and one other day depending on your team).
We are committed to equal employment opportunities regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy or related condition (including breastfeeding) or any other basis protected by applicable law. If you have a disability or additional need that requires accommodation, please let us know.
When you submit an application, your data will be processed in line with our privacy policy.
