Site Reliability Engineer, GPUs in AI

Company: Radley James

Location: London

Posted: May 10th, 2026

We are recruiting for a young AI firm that was founded in the US and is now growing in London. Its team of engineers and researchers comes from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic, and similar organisations.

They are looking for a Senior Systems Engineer to focus on cluster management, platform engineering for a large GPU fleet (currently in the 20k-40k range), monitoring and reliability, and infrastructure for next-generation GPU deployments.

Requirements:

6 years' experience in a high-performance field such as AI, big tech, or quantitative trading

Experience working on clusters of 1,000 GPUs or more

Experience of driving key projects in your team or business

