What You’ll Do
- Design, implement, maintain, and support high performance compute and storage systems
- Implement and support performance monitoring and fault monitoring systems
- Monitor systems and storage performance, up to and including network components
- Build tooling to compile, package, install, and upgrade software and operating system components at scale
- Collaborate with team members and across teams to write code and testing infrastructures spanning both new and existing codebases in multiple programming languages
- Develop and improve systems and user documentation
- Participate in large, coordinated maintenance operations, including during evenings and weekends
- Work on global projects across a wide range of infrastructure
- Collaborate directly with researchers to optimize their use of HPC infrastructure
- Develop and monitor the tools used to maintain a production computing environment
- Provide operational support on a rotating basis and as needed
- Manage relationships with outside vendors, including traveling both domestically and internationally to meet with current and potential vendors
- Adhere to all company cybersecurity and IT policies, including performing all work using only approved hardware and software
- Other duties as assigned or needed
Skills You’ll Need
- 5+ years of professional experience in high performance computing (HPC); experience with parallel filesystems (e.g., Lustre, GPFS), batch systems (e.g., Slurm, Grid Engine), and high-performance network interconnects is a plus, but not required
- 5+ years of experience with Linux systems administration
- High proficiency with at least one programming/scripting language (e.g., Go, Python, C)
- Extensive experience designing, building, and maintaining complicated, interdependent, and distributed systems
- Extensive experience profiling and debugging application stacks using debuggers and profilers
- Experience with system configuration management tools (SaltStack, Ansible, Puppet, etc.)
- A compulsion to perform root cause analysis
- Reliable and predictable availability
Benefits
- Private Medical, Vision and Dental Insurance
- Travel Medical Insurance
- Group Pension Scheme
- Group Life Assurance and Income Protection Schemes
- Paid Parental Leave
- Parking and Commuter Benefits
