Lead Site Reliability Engineer

Company: London Stock Exchange
Location: Nottingham
Job Description:

Requirements

  • This position requires a highly proactive, hard-working expert with strong leadership presence and ownership of platform reliability outcomes
  • We are looking for a person who is passionate about reliability engineering and who brings a continuous improvement approach to everything they do
  • Bachelor’s Degree in Computer Science or related field
  • 10+ years of hands‑on technical experience in SRE, Platform Engineering, Infrastructure, or related roles
  • Strong experience with AWS, including services such as EKS, ECS, EC2, networking, IAM, and managed services
  • Deep hands‑on experience with Kubernetes and containerised platforms
  • Strong background in Linux systems administration
  • Proven experience designing and operating observability platforms, including monitoring, logging, and alerting
  • Hands‑on experience with Datadog for metrics, logs, APM, and alerting
  • Strong understanding of SRE principles, including SLOs, error budgets, incident management, and reliability engineering
  • Experience working closely with architecture and engineering teams on system design and delivery
  • Solid understanding of cloud security principles and experience collaborating with security teams
  • Experience with cloud cost optimisation strategies and tooling
  • Hands‑on experience integrating AI with observability stacks (Prometheus, Grafana, ELK, OpenTelemetry) for proactive issue detection
  • (Desirable) Experience or working knowledge of Microsoft Azure
  • (Desirable) Experience supporting multi‑cloud or hybrid environments
  • (Desirable) Exposure to Infrastructure as Code (e.g., Terraform, CloudFormation)
  • (Desirable) Experience in large‑scale, complex, or regulated environments
  • (Desirable) Knowledge of vector databases and RAG architectures for building internal SRE knowledge assistants
  • (Desirable) Knowledge of Generative AI and LLM platforms (e.g., Claude, Amazon Bedrock)
  • Strong technical authority with the ability to influence design and operational decisions
  • Highly collaborative, comfortable working across architecture, engineering, security, and operations teams
  • Calm and methodical under pressure, especially during incidents and critical issues
  • Pragmatic problem‑solver who balances reliability, security, cost, and delivery speed
  • Clear communicator, able to explain complex technical concepts to diverse audiences

What the job involves

  • We are evolving our Site Reliability Engineering capabilities to strengthen reliability, observability, security, and operational excellence across our Markets and Risk Intelligence division
  • As a Technical Lead SRE, you will be a senior, hands‑on technical lead who helps shape the foundations of reliability across both new and existing platforms
  • You will collaborate with Architecture, Engineering, Security, and Platform teams to ensure reliability is built into systems from day one
  • While this is not a people‑management or shift‑based role, you will work closely with global teams and may occasionally be called upon for major incidents or critical issues
  • Lead the establishment of SRE foundations for new projects: building environments, setting up monitoring and alerting, and ensuring operational readiness from day one
  • Collaborate with Architecture and Engineering teams to embed reliability, scalability, security, and observability into system design
  • Define, implement, and champion observability standards, tooling, and guidelines across metrics, logs, traces, and SLIs/SLOs
  • Design and evolve monitoring and alerting solutions that improve visibility, reduce toil, and strengthen system health
  • Continuously drive reliability improvements across our environments through incident reduction, performance tuning, and building resilient patterns
  • Partner with Security teams to ensure our platforms meet compliance, security, and risk‑management expectations
  • Lead seamless handovers from project delivery into BAU SRE operations by ensuring documentation, readiness, and strong operational practices
  • Influence architectural and design decisions through data‑driven cloud cost optimisation and efficiency initiatives
  • Be a technical leader and mentor, supporting engineers, shaping engineering standards, and fostering a culture of learning and development

Posted: June 6th, 2026