Site Reliability Engineer – NS London

Company: BAE Systems
Location: London
Job Description:

Site Reliability Engineering is a rapidly growing discipline in industry, with a remit to drive the quality, reliability and performance of essential systems. As a Site Reliability Engineer you will be part of a team in BAE Systems at the forefront of this, delivering benefits to a key national security customer. We are building our team and tools, and will create a culture of continual improvement to revolutionise how our customer’s systems are built and maintained. This role blends operational product support with software engineering, building applications that give insight into overall system health. The SRE team sits within a wider programme at the core of the customer mission.

Role Holder

As an SRE, you will perform tasks historically done by operations teams, using software and systems engineering expertise to replace manual labour with automation. The goal is to limit manual operations such as incident tickets and on‑call duties to no more than half of the team’s time (and preferably less). You should be keen to learn and experiment, to develop tools for application health, and to improve reliability in support of the customer mission.

Responsibilities

  • Support and maintain the essential services that underpin core mission applications, proactively enhancing their availability, performance and stability.
  • Participate in the 24/7 on‑call rota, supporting critical production systems out of business hours; additional on‑call allowances and overtime benefits will be paid.
  • Find innovative solutions to problems rather than repeating manual work, automating wherever possible.
  • Work alongside development teams, advising them on good practices for designing and building systems.
  • Design and deploy monitoring products, creating bespoke tools where required, to provide comprehensive and intelligent observations that meet customer requirements and demonstrate daily improvements.
  • Be well‑versed in the relationship between software and infrastructure, understanding characteristics that enable scalability and resilience.
  • Participate in the wider DevOps/SRE community within the organisation.

Qualifications

  • Software development experience in web technologies and object‑oriented programming.
  • Knowledge of database technologies such as Oracle SQL, MongoDB, PostgreSQL.
  • Comfortable with Linux and Windows command lines (e.g. Bash, PowerShell).
  • Experience monitoring large systems using Grafana, Prometheus, ELK, Splunk.
  • Experience working in Agile teams and related tooling (e.g. Atlassian).
  • Experience diagnosing and troubleshooting application issues that result in service outages.
  • Cross‑stack troubleshooting skills.
  • Understanding of ITIL.
  • Experience with micro‑services, Docker, and container platforms such as OpenShift, Kubernetes.
  • Awareness of emerging technology trends and a willingness to adopt cutting‑edge tools.

Security Clearance

Successful candidates must hold an active eDV clearance before applying.

Benefits & Work Environment

We embrace hybrid working, allowing flexibility in location and schedule to balance work and personal life. On‑call allowances and overtime benefits are provided for night shifts.

Posted: May 17th, 2026