I am partnering with a leading tech organisation to recruit a Senior Site Reliability Engineer on a day-rate contract for 12 months. This is a hands‑on, high‑impact role working closely with engineering teams to drive reliability, scalability, and operational excellence across critical production systems.
What You’ll Do:
- Lead reliability initiatives and own operational performance across core services
- Define and refine SLIs, SLOs, and error budgets aligned with business outcomes
- Drive incident management, post‑incident analysis, and remediation planning
- Influence system architecture for high availability, resilience, and multi‑region disaster recovery
- Build automation and CI/CD pipelines, applying safe deployment patterns such as canary, blue/green, or progressive delivery
- Develop observability solutions (metrics, logs, traces) and troubleshoot performance bottlenecks
- Mentor engineers and embed SRE best practices across the organisation
- Operate cloud‑native and containerised workloads at scale, leveraging IaC tools to manage resilient platforms
What You Bring:
- 7+ years in site reliability, production, or systems engineering roles
- Hands‑on experience with cloud platforms (AWS, Azure, GCP) and Kubernetes
- Strong programming skills (Python, Go, Java) for automation and tooling
- Proven experience leading high‑severity incidents and delivering systemic improvements
- Deep understanding of distributed systems, fault isolation, and scalability
Bonus Experience:
- Multi‑cloud or multi‑region resilience architecture
- Observability tools (Prometheus, Grafana, Datadog)
- IaC experience (Terraform, CloudFormation)
