AWS Site Reliability Engineer

Company: Marks Sattin
Location: Greater London
Job Description:

An established technology-driven organisation is seeking an experienced Site Reliability Engineer (SRE) in Glasgow to strengthen and scale its cloud-native data platform, built on AWS, Snowflake, and Databricks. This position offers the opportunity to drive automation, resilience, and operational excellence across critical data services.

Key Responsibilities:

  • Automate infrastructure provisioning and platform operations using Infrastructure as Code and CI/CD tools.
  • Lead and execute reliability initiatives including disaster recovery planning, failure testing, and resilience validation.
  • Define and manage service level indicators, objectives, and agreements (SLIs/SLOs/SLAs) to drive measurable improvements in reliability.
  • Build observability solutions to monitor AWS, Snowflake, and Databricks workloads.
  • Collaborate with engineering teams to embed reliability best practices throughout platform development.
  • Analyse incidents and proactively address root causes to improve availability and performance.
  • Provide operational support, drive incident resolution, and implement automated fixes for recurring issues.

Requirements:

  • Strong knowledge of SRE principles and practical experience defining SLAs, SLOs, and error budgets.
  • Demonstrated AWS expertise (e.g., EC2, S3, IAM, VPC, CloudWatch) in production environments.
  • Experience with observability tools, monitoring, and alerting practices.
  • Proficient in automation, Infrastructure as Code (Terraform, CloudFormation, or CDK), and scripting (Python/Bash).
  • Exposure to Snowflake and/or Databricks data platforms.
  • Background in DR/chaos engineering, CI/CD pipelines, GitOps, or supporting large-scale data environments.

Posted: March 4th, 2026