About the Role
In this role, you will be the primary architect of our Observability Centre of Excellence, directly influencing the reliability and uptime of the global platforms that keep industries worldwide moving.
Key Responsibilities
- Lead a global "OTel First" strategy, implementing OpenTelemetry at scale across a diverse technological landscape.
- Spearhead the development of automation scripts and Infrastructure as Code using Terraform to ensure seamless, reproducible platform delivery.
- Optimize platform performance and cost‑efficiency, ensuring our observability tools scale economically as our data grows.
- Collaborate with engineering teams to embed reliability and security standards into new features from the ground up.
- Drive root cause analysis and problem management to proactively prevent incidents and improve the customer experience.
Essential Skills & Experience
- Hands‑on experience with the OpenTelemetry Collector, APIs, and SDKs.
- Extensive experience with observability tools like New Relic, Datadog, or Splunk.
- Strong proficiency in Infrastructure as Code (Terraform, Ansible) and cloud platforms (AWS, GCP, or Azure).
- Deep understanding of containerization and orchestration using Docker and Kubernetes.
- Advanced coding skills in Python, Go, or Java for building robust automation and monitoring tools.
Bonus Points For
- Experience leveraging AI coding assistants like GitHub Copilot to accelerate development.