Company: Marks Sattin

Apply for the AWS Site Reliability Engineer role

Location: Greater London

Job Description:

An established technology-driven organisation is seeking an experienced Site Reliability Engineer (SRE) in Glasgow to strengthen and scale their cloud-native data platform, utilising AWS, Snowflake, and Databricks. This position offers the opportunity to drive automation, resilience, and operational excellence across critical data services.

Key Responsibilities:

Automate infrastructure provisioning and platform operations using Infrastructure as Code and CI/CD tools.
Lead and execute reliability initiatives including disaster recovery planning, failure testing, and resilience validation.
Define and manage service health metrics (SLIs/SLOs/SLAs) to drive measurable improvements in reliability.
Build observability solutions to monitor AWS, Snowflake, and Databricks workloads.
Collaborate with engineering teams to embed reliability best practices throughout platform development.
Analyse incidents and proactively address root causes to improve availability and performance.
Provide operational support, drive incident resolution, and implement automated fixes for recurring issues.

Requirements:

Strong knowledge of SRE principles and practical experience defining SLAs, SLOs, and error budgets.
Demonstrated AWS expertise (e.g., EC2, S3, IAM, VPC, CloudWatch) in production environments.
Experience with observability tools, monitoring, and alerting practices.
Proficient in automation, Infrastructure as Code (Terraform, CloudFormation, or CDK), and scripting (Python/Bash).
Exposure to Snowflake and/or Databricks data platforms.
Background in DR/chaos engineering, CI/CD pipelines, GitOps, or supporting large-scale data environments.

Posted: March 4th, 2026
