Production Engineer

Company: Radley James
Location: Greater London
Job Description:

A global trading business is looking for a talented Trading Production Engineer to join their high-performing technology team in New York. In this role, you will be responsible for ensuring the stability, performance, and reliability of mission-critical trading systems operating in a fast-paced, real-time environment.

You will work closely with traders, developers, and infrastructure teams to support and enhance systems that demand extremely high availability and low latency.

Key Responsibilities

  • Maintain and support production trading systems with a focus on uptime, resilience, and performance
  • Monitor system health, respond to incidents, and perform root cause analysis
  • Collaborate with development teams to improve system reliability and release processes
  • Automate operational tasks and build tools to enhance system observability
  • Manage deployments, releases, and change processes in production environments
  • Optimise system performance, including latency and throughput improvements
  • Implement and maintain monitoring, alerting, and logging solutions
  • Participate in the on-call rotation and provide out-of-hours support when required

Required Skills & Experience

  • Proven experience in a Production Engineer, Site Reliability Engineer, or similar role
  • Strong Linux/Unix systems knowledge
  • Proficiency in Python
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, ELK stack)
  • Familiarity with CI/CD pipelines and deployment tooling
  • Solid understanding of networking concepts (TCP/IP, DNS, load balancing)
  • Strong troubleshooting skills in complex, distributed systems
  • Ability to work effectively under pressure in a fast-paced environment
  • Experience in financial services, trading, or low-latency environments
  • Knowledge of high-frequency trading systems or real-time data pipelines
  • Experience with containerisation and orchestration (Docker, Kubernetes)
  • Understanding of incident management and post-mortem practices

Posted: April 4th, 2026