Senior Site Reliability Engineer

Company: London Stock Exchange
Location: Nottingham
Job Description:

Requirements

  • Bachelor’s Degree or equivalent experience in Computer Science, Engineering, or a related field
  • 5+ years of hands‑on technical experience in SRE, Platform Engineering, Infrastructure, or related roles
  • Strong experience with AWS (or Azure), including services such as EKS, ECS, EC2, networking, IAM, and managed services
  • Solid understanding of cloud security principles and experience collaborating with security teams
  • Strong background in Linux systems administration
  • Proven experience designing and operating observability platforms, including monitoring, logging, and alerting
  • Hands‑on experience with Datadog for metrics, logs, APM, and alerting
  • Strong understanding of SRE principles, including SLOs, error budgets, incident management, and reliability engineering
  • Experience working closely with architecture and engineering teams on system design and delivery
  • Experience with cloud cost optimization strategies and tooling
  • (Desirable) Experience supporting multi‑cloud or hybrid environments
  • (Desirable) Exposure to Infrastructure as Code (e.g., Terraform, CloudFormation)
  • (Desirable) Experience in large‑scale, complex, or regulated environments
  • (Desirable) Knowledge of vector databases and RAG architectures for building internal SRE knowledge assistants
  • (Desirable) Knowledge of Generative AI and LLM platforms (e.g., Claude, Amazon Bedrock)
  • Strong technical authority with the ability to influence design and operational decisions
  • Highly collaborative, comfortable working across architecture, engineering, security, and operations teams
  • Calm and methodical under pressure, especially during incidents and critical issues
  • Pragmatic problem‑solver who balances reliability, security, cost, and delivery speed
  • Clear communicator, able to explain complex technical concepts to diverse audiences

What the job involves

  • We are evolving our Site Reliability Engineering capabilities to strengthen reliability, observability, security, and operational excellence across our Risk Intelligence division
  • As a Senior SRE, you will be a senior hands‑on technical contributor who helps shape the foundations of reliability across both new and existing platforms
  • You will collaborate with Architecture, Engineering, Security, and Platform teams to ensure reliability is built into systems from day one
  • While this is not a people‑management role, you will work closely with global teams and may occasionally be called upon for major incidents or critical issues
  • This position requires a highly proactive, hard‑working expert with strong leadership presence and ownership of platform reliability outcomes
  • We are looking for a person who is passionate about reliability engineering and who brings a continuous improvement approach to everything they do
  • Lead the establishment of SRE foundations for new projects: building environments, setting up monitoring and alerting, and ensuring operational readiness from day one
  • Define, implement, and champion observability standards, tooling, and guidelines across metrics, logs, traces, and SLIs/SLOs
  • Design and evolve monitoring and alerting solutions that improve visibility, reduce toil, and strengthen system health
  • Continuously drive reliability improvements across our environments through incident reduction, performance tuning, and building resilient patterns
  • Partner with Security teams to ensure our platforms meet compliance, security, and risk‑management expectations
  • Influence architectural and design decisions through data‑driven cloud cost optimization and efficiency initiatives
  • Be a technical leader and mentor: support engineers, shape engineering standards, and foster a culture of learning and development

Posted: June 1st, 2026