Staff Cloud SRE: AI/ML Platform & GPU Compute

Company: Deepstreamtech
Location: London
Job Description:

Deepstreamtech is looking for a Staff Site Reliability Engineer to shape the reliability of large-scale AI systems and GPU compute infrastructure. In this foundational role, you will establish the reliability frameworks and operational standards that ensure the performance of its cloud infrastructure.

Your responsibilities will range from defining service level objectives (SLOs) to participating in a 24/7 on-call rotation. Ideal candidates will have strong experience in SRE roles, particularly with GPU environments, Kubernetes, and cloud platforms such as AWS, GCP, or Azure.


Posted: May 20th, 2026