Staff Software Engineer (Inference Infrastructure)

Company: Deepstreamtech
Location: London
Job Description:

Requirements

  • Are you energized by building high-performance, scalable, and reliable machine learning systems? Do you want to help define and build the next generation of AI platforms powering advanced NLP applications?
  • 5+ years of engineering experience running production infrastructure at large scale
  • Experience designing large, highly available distributed systems on Kubernetes, including GPU workloads on those clusters
  • Experience with Kubernetes development, production coding, and support
  • Experience with GCP, Azure, AWS, OCI, and multi-cloud, on-prem, or hybrid serving
  • Experience designing, deploying, supporting, and troubleshooting complex Linux-based computing environments
  • Experience managing compute, storage, and network resources and their costs
  • Excellent collaboration and troubleshooting skills to build mission-critical systems and ensure smooth operations and efficient teamwork
  • The grit and adaptability to solve complex technical challenges that evolve day to day
  • Familiarity with the computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence inference latency and throughput
  • Strong understanding of, or working experience with, distributed systems
  • Experience in Golang, C++, or other languages designed for high-performance, scalable servers
  • If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

What the job involves

  • We are looking for Members of Technical Staff to join the Model Serving team at Cohere
  • The team is responsible for developing, deploying, and operating the AI platform that delivers Cohere’s large language models through easy-to-use API endpoints
  • In this role, you will work closely with many teams to deploy optimized NLP models to production in low-latency, high-throughput, and high-availability environments
  • You will also get the opportunity to interface with customers and create customized deployments to meet their specific needs

Posted: May 18th, 2026