Site Reliability Engineer (SRE / Observability Technical Lead)

{ “@context”: “http://schema.org”, “@type”: “JobPosting”, “title”: “Site Reliability Engineer (SRE / Observability Technical Lead)”, “description”: “

Requirements

The ideal candidate will have deep expertise in Application Performance Monitoring (APM), Infrastructure as Code (IaC), automation, and distributed tracing using OpenTelemetry
5+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
Hands‑on experience with OpenTelemetry (OTel) for distributed tracing and observability instrumentation
Strong proficiency in Infrastructure as Code (IaC) using Terraform
Solid understanding of cloud platforms including AWS, GCP, or Azure
Experience with automation/configuration management tools like Ansible, Chef, or Puppet
Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
Experience managing Kubernetes and containerized environments (Docker, Helm)
Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
Excellent leadership, communication, and collaboration skills

What the job involves

We are seeking an experienced Site Reliability Engineer (SRE) / Observability Technical Lead to join our team and drive the strategy and execution of observability and reliability projects across our clients
As a lead, you will guide the design, implementation, and continuous improvement of observability solutions, ensuring system reliability, performance, and scalability while fostering best practices in SRE and DevOps
Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
Design and implementation of monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
Collaborate with product management, sales, and pre‑sales teams to provide technical expertise and support during solution design and customer engagements
Lead cross‑functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence

#J-18808-Ljbffr”, “datePosted”: “2026-05-20”, “hiringOrganization”: { “@type”: “Organization”, “name”: “Deepstreamtech”, “sameAs”: “https://uk.whatjobs.com/pub_api__cpl__436979187__4861?utm_campaign=publisher&utm_medium=api&utm_source=4861&geoID=33” }, “jobLocation”: { “@type”: “Place”, “address”: { “@type”: “PostalAddress”, “addressLocality”: “London” } } }

Company: Deepstreamtech

Apply for the Site Reliability Engineer (SRE / Observability Technical Lead)

Location: London

Job Description:

Requirements

The ideal candidate will have deep expertise in Application Performance Monitoring (APM), Infrastructure as Code (IaC), automation, and distributed tracing using OpenTelemetry
5+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities
Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace
Hands‑on experience with OpenTelemetry (OTel) for distributed tracing and observability instrumentation
Strong proficiency in Infrastructure as Code (IaC) using Terraform
Solid understanding of cloud platforms including AWS, GCP, or Azure
Experience with automation/configuration management tools like Ansible, Chef, or Puppet
Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps
Experience managing Kubernetes and containerized environments (Docker, Helm)
Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk
Excellent leadership, communication, and collaboration skills

What the job involves

We are seeking an experienced Site Reliability Engineer (SRE) / Observability Technical Lead to join our team and drive the strategy and execution of observability and reliability projects across our clients
As a lead, you will guide the design, implementation, and continuous improvement of observability solutions, ensuring system reliability, performance, and scalability while fostering best practices in SRE and DevOps
Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements
Design and implementation of monitoring and observability solutions, collaborating with engineering teams to define standards and best practices
Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments
Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency
Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies
Collaborate with product management, sales, and pre‑sales teams to provide technical expertise and support during solution design and customer engagements
Lead cross‑functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices
Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships
Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence

#J-18808-Ljbffr…

Posted: May 20th, 2026