Overview
This role is part of the Research Platforms team in IT Services, supporting the University’s research computing infrastructure including high-performance compute clusters, scalable storage, and cloud-based platforms. The role involves ensuring the reliability, availability and configuration of these platforms to meet the diverse needs of researchers.
Main Duties and Responsibilities
- Deliver, run and support HPC compute platforms that meet the spectrum of research applications.
- Manage high-performance, scalable storage systems for I/O intensive compute tasks.
- Develop platforms that provide secure environments for storing and processing sensitive research data.
- Support the use of cloud-based platforms and technologies by researchers.
- Manage and configure scheduling software and implement policies to allocate resources, including allowing research groups to purchase dedicated access.
- Carry out routine maintenance tasks and identify opportunities to improve and automate them across on-premises and cloud platforms.
- Monitor performance, availability and security of the research platforms, providing regular reports.
- Collaborate with other ITS team members to provide technical guidance and ensure appropriate platform delivery to meet researcher requirements.
- Assist research groups in migrating activities from legacy clusters to appropriate solutions.
- Develop innovative solutions that simplify access to the platform, lowering the barrier for users with limited HPC experience.
- Work with vendors and service providers to procure and maintain infrastructure and services that meet the University’s research computing requirements.
- Perform other duties as required at this grade.
Essential Criteria
- Understanding and experience of HPC and cloud-based research computing platforms, including job schedulers (e.g., Slurm), virtualisation (e.g., VMware, AWS EC2) and containerisation (e.g., Kubernetes, Docker Swarm, Podman).
- Knowledge of multi-user systems, user account management, authentication and permissions.
- Ability to work within existing admin processes and develop robust automations with future resilience, replicability and management in mind. Knowledge of infrastructure-as-code technologies such as Puppet and CloudFormation is a distinct advantage.
- Effective communication skills, both written and verbal, including report writing and technical documentation.
- Understanding of monitoring and managing physical and virtual computing platforms and infrastructure.
- Understanding of how to use cloud infrastructure and networking to ensure system and data security.
- Ability to assess and organise resources, plan and progress work activities.
- Inquisitive mind and a desire to explore new technologies and engage with other research computing professionals.
Desirable Criteria
- Experience with high-performance storage systems and scalable I/O solutions.
- Experience with procurement and vendor management in a research computing environment.
Additional Information
Grade: G7
Line Manager: Research Platforms Engineering Lead
Direct reports: None
Employment Conditions
A basic DBS check will be required for this role.
We are a Disability Confident Employer.
Benefits
The University offers a competitive annual leave entitlement (including the ability to purchase additional days), a generous pension scheme, flexible working opportunities, commitment to professional development and wellbeing, retail discounts and family-friendly policies including paid time off for parenting and caring emergencies, support for menopause, fertility treatment and more. Full benefit details are available at https://www.sheffield.ac.uk/jobs/benefits.
