Senior / Lead Observability & Cloud Infrastructure Engineer
We are seeking an experienced Senior / Lead Observability & Cloud Infrastructure Engineer to join a large-scale digital transformation programme. The successful candidate will play a key role in designing, implementing and enhancing observability capabilities across modern cloud‑native platforms, with a particular focus on Dynatrace.
This position requires a strong blend of hands‑on observability expertise, AWS infrastructure knowledge, and experience supporting distributed microservice‑based applications running in containerised environments.
Key Responsibilities
- Lead the design, implementation and optimisation of Dynatrace monitoring solutions across complex cloud environments.
- Configure and maintain dashboards, alerting frameworks and end‑to‑end observability for customer‑facing digital services.
- Implement Dynatrace instrumentation and monitoring across cloud infrastructure, APIs, microservices, containers and databases.
- Work closely with engineering, platform and operations teams to improve service visibility and operational resilience.
- Analyse and troubleshoot performance, availability and reliability issues across distributed systems.
- Support the adoption of observability best practices and drive continuous improvement initiatives.
- Design and implement proactive alerting strategies to reduce incident impact and improve service reliability.
- Document monitoring architectures, operational procedures and technical solutions.
Required Experience
- Strong hands‑on experience implementing and administering Dynatrace in enterprise‑scale environments.
- Experience deploying and configuring Dynatrace monitoring, dashboarding, alerting and integrations.
- Strong AWS cloud experience, including services such as:
  - EC2
  - ECS
  - EKS
  - Lambda
  - S3
  - RDS
  - IAM
  - VPC
  - CloudFormation
- Strong understanding of cloud‑native and microservice‑based architectures.
- Experience working with container technologies including Docker, ECS and/or Kubernetes.
- Strong troubleshooting and root cause analysis skills within distributed environments.
- Experience with monitoring and observability tooling such as Dynatrace, CloudWatch and related platforms.
- Knowledge of Infrastructure as Code and automation tooling including CloudFormation and/or Terraform.
- Experience working within DevOps, Platform Engineering or Site Reliability Engineering environments.
- Experience within large‑scale enterprise or consultancy‑led environments.
- Knowledge of CI/CD pipelines and deployment automation.
- Experience defining service‑level objectives (SLOs), KPIs and operational metrics.
- Exposure to additional observability or APM platforms such as Datadog, AppDynamics, New Relic or Splunk.
