Sr. Network Site Reliability Engineer (SRE)
London, United Kingdom | Posted on 09/12/2025
We provide end-to-end IT solutions and services, including Application services, Data & Analytics services, AI/ML Technologies, and Professional services, in the UK and EU markets.
Job Description
Overview
We are seeking a highly experienced Senior Network SRE with deep expertise across multi-vendor network infrastructure, automation, and reliability engineering. The ideal candidate will possess strong technical leadership, hands‑on engineering capabilities, and a passion for building resilient, scalable, and observable network environments.
Key Responsibilities
- Design, implement, and maintain highly available network solutions across routing, switching, firewalling, and wireless technologies.
- Apply SRE principles to improve network reliability, scalability, and performance.
- Develop and maintain automation workflows using Ansible, Salt, and related frameworks to reduce operational toil.
- Build and operate monitoring, alerting, and observability dashboards using tools such as Grafana and Splunk.
- Proactively identify network bottlenecks, performance issues, and reliability risks, implementing long‑term fixes rather than reactive solutions.
- Support incident response, root cause analysis, and post‑incident reviews with a focus on continuous improvement.
- Collaborate with cross‑functional engineering, security, and operations teams to ensure network solutions meet business and technical requirements.
- Contribute to documentation, runbooks, design artifacts, and operational standards.
- Participate in capacity planning, network modernization initiatives, and automation‑first strategies.
Required Skills & Experience
- 10+ years of hands‑on experience in enterprise or service provider network engineering.
- Expertise in multi‑vendor routing, switching, firewalling, and wireless technologies.
- Deep understanding of network protocols (BGP, OSPF, EIGRP, STP, VXLAN, VPNs, QoS, MPLS, etc.).
- Strong experience with infrastructure automation using Ansible and Salt.
- Proficiency with observability tooling such as Grafana, Splunk, or equivalent.
- Solid understanding of SRE practices including SLIs, SLOs, error budgets, and proactive reliability.
- Strong troubleshooting, analytical, and performance optimization skills.
- Excellent communication and collaboration skills, with the ability to influence and guide technical stakeholders.
Nice to Have
- Experience with network programmability (Python, API‑driven networking, NETCONF/RESTCONF).
- Exposure to cloud networking (AWS, Azure, GCP).
- Knowledge of zero‑trust, SD‑WAN, and network security best practices.
- Experience creating self‑healing or fully automated network workflows.