Systems Administrator

Company: Alexander Ash Consulting
Location: Greater London
Job Description:

Salary: Competitive + package (depending on experience)

Type: Full-time

A leading consulting and technology organisation is looking to hire a number of HPC Systems Administrators / Consultants to join a growing High Performance Compute operations team supporting next-generation AI infrastructure projects across the UK.

This role will focus on the design, deployment and operation of high-density compute environments, supporting advanced GPU clusters and AI model training platforms. The successful candidate will work with cutting-edge compute stacks and play a key role in enabling high-performance AI workloads. Due to the nature of the work, this role will involve secure and sensitive environments.

Key Responsibilities

  • Design, deploy and manage HPC infrastructures, including GPU clusters and parallel computing environments
  • Support AI model training platforms by maintaining compute resources and optimising workload scheduling
  • Monitor, analyse and optimise system performance, identifying bottlenecks and improving efficiency
  • Develop and maintain automation scripts and operational tooling (Python, PowerShell, Bash)
  • Maintain clear documentation covering architecture, configurations, operational procedures and incident resolution
  • Support incident management processes, including root cause analysis and post-incident reviews
  • Work closely with cross-functional teams to ensure reliability, performance and security across HPC environments

Required Experience

  • Strong experience working in High Performance Computing (HPC) environments
  • Experience managing GPU clusters (e.g. NVIDIA or AMD)
  • Familiarity with workload schedulers such as SLURM or PBS
  • Experience supporting AI/ML model training frameworks and GPU toolkits such as TensorFlow, PyTorch or CUDA
  • Solid understanding of Linux and Windows server environments, networking and storage platforms
  • Strong troubleshooting and performance optimisation skills within compute-heavy environments
  • Experience with automation, scripting and monitoring tools (Python, PowerShell, Bash)
  • Excellent communication skills and ability to work with both technical and non-technical stakeholders

Posted: April 10th, 2026