We are looking for a highly skilled Engineer with expertise in Python programming, automation, and modern observability practices to help build and operate scalable distributed systems for an award-winning London hedge fund. This role sits at the intersection of platform engineering, AI tooling, and system reliability. You will design automation frameworks, develop AI-assisted engineering tools, and implement observability solutions that provide deep insight into complex distributed architectures.
Responsibilities
- Design, develop, and maintain robust automation solutions using Python.
- Build and maintain observability pipelines including metrics, logs, and traces across distributed systems.
- Develop internal AI-powered tools that enhance engineering productivity and operational intelligence.
- Implement monitoring, alerting, and diagnostics to improve system reliability, performance, and scalability.
- Integrate observability platforms with automation workflows and incident response systems.
- Collaborate with platform, infrastructure, data, and development teams to improve system visibility and operational maturity.
- Design tooling that enables proactive detection, analysis, and remediation of system issues across distributed environments.
- Contribute to architecture decisions around telemetry, AI-assisted debugging, and automation frameworks.
- Provide direct support to business users and stakeholders, covering system analysis, problem management, and technical resolution.
Skills & Experience
- Strong professional experience with Python development in production environments.
- Proven experience building automation frameworks, scripts, and developer tooling.
- Strong experience working with distributed systems and large-scale service architectures.
- Hands-on experience working with Kubernetes in production environments.
- Deep understanding of observability practices, including metrics, logs, tracing, and telemetry pipelines.
- Experience integrating AI or machine learning tooling into engineering workflows.
- Strong understanding of APIs, microservices, and containerised environments.
- Experience with CI/CD pipelines and infrastructure automation.
- Ability to design scalable, maintainable engineering tools.
- Experience supporting business users directly, coordinating projects and problem resolution with development and infrastructure teams, and taking ownership of projects.
Interesting Technologies
- Observability: OpenTelemetry, Prometheus, Grafana, Elastic Stack (ELK), Jaeger
- Automation & CI/CD: GitHub Actions, Jenkins, GitLab CI, Argo Workflows
- Distributed Systems & Messaging: Kafka, Redis, gRPC
Offer
- Award-winning, world-class technology environment with best-in-class engineering teams.
- Fast-paced, low-bureaucracy culture with a get-stuff-done mindset.
- Up to £150,000 base salary, plus a 50%-100% annual cash bonus. Benefits include pension, healthcare, gym, food, and 30 days' holiday.
- 4 days onsite, 1 day working from home.
- The chance to shape the future of intelligent automation and operational insight in distributed platforms.
