Staff Software Engineer (Inference Infrastructure)

Company: Deepstreamtech

Location: London

Job Description:

Requirements

Are you energized by building high-performance, scalable and reliable machine learning systems? Do you want to help define and build the next generation of AI platforms powering advanced NLP applications?
5+ years of engineering experience running production infrastructure at a large scale
Experience designing large, highly available distributed systems with Kubernetes and GPU workloads on those clusters
Experience with Kubernetes development, production coding, and support
Experience with GCP, Azure, AWS, OCI, and multi-cloud, on-prem, or hybrid serving
Experience designing, deploying, supporting, and troubleshooting complex Linux-based computing environments
Experience in compute/storage/network resource and cost management
Excellent collaboration and troubleshooting skills to build mission-critical systems and ensure smooth operations and efficient teamwork
The grit and adaptability to solve complex technical challenges that evolve day to day
Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
Strong understanding or working experience with distributed systems
Experience in Golang, C++ or other languages designed for high-performance scalable servers
If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

What the job involves

We are looking for Members of Technical Staff to join the Model Serving team at Cohere.
The team is responsible for developing, deploying, and operating the AI platform that delivers Cohere’s large language models through easy-to-use API endpoints.
In this role, you will work closely with many teams to deploy optimized NLP models to production in low-latency, high-throughput, high-availability environments.
You will also have the opportunity to interface with customers and create customized deployments to meet their specific needs.

Posted: May 18th, 2026