(Senior) Infrastructure Engineer (OpenStack Ironic Specialist)
EMEA; Germany; Netherlands; Norway; Poland; Spain; UK
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency in an environment where everyone is inspired to do their best work. If you join our team, you’ll help build the technology that powers the future.
About the Role
We’re hiring an Infrastructure Engineer (OpenStack Ironic Specialist) to design, operate, and continuously improve the bare metal provisioning platforms that underpin Nscale’s infrastructure.
This role sits within the Infrastructure Engineering team, which is responsible for the design, implementation, operation, and ongoing improvement of the infrastructure stack supporting both internal and customer-facing services. You’ll work closely with network, compute, data centre, support, and pre-sales teams, while also serving as a specialist escalation point for advanced provisioning and hardware issues.
This is a high-impact role focused on OpenStack Ironic, automated hardware lifecycle management, and the reliable operation of large-scale physical infrastructure. You’ll also help connect Nscale to the broader upstream OpenStack community, ensuring our bare metal platforms evolve in line with real operational needs and industry direction.
What you’ll be doing
Bare Metal Provisioning & Lifecycle Management
- Design scalable and resilient bare metal provisioning platforms with a strong focus on OpenStack Ironic.
- Own the full lifecycle of physical infrastructure, including discovery, enrolment, provisioning, cleaning, deprovisioning, and hardware state management.
- Build and maintain provisioning workflows for a wide range of hardware profiles, including GPU-enabled and high-performance server platforms.
- Support platform upgrades, lifecycle management, and operational improvements across Ironic and its dependencies.
Automation & Platform Integration
- Manage and improve integrations between Ironic and related OpenStack services such as Nova, Neutron, Glance, Keystone, and Placement.
- Drive automation for hardware onboarding, firmware and BIOS configuration, deployment workflows, validation, and recovery.
- Implement infrastructure automation using infrastructure-as-code and configuration management approaches.
- Ensure provisioning platforms and operational processes align with security, compliance, and operational standards.
Troubleshooting, Reliability & Operational Support
- Troubleshoot complex issues across provisioning pipelines, PXE/iPXE, BMC interfaces, out-of-band management, image deployment, network boot, and hardware compatibility.
- Act as a 3rd/4th line escalation point for advanced bare metal and provisioning incidents.
- Perform root cause analysis and implement long-term fixes to improve platform reliability and repeatability.
- Participate in on-call rotations and incident response activities for critical infrastructure services.
Cross-Functional Collaboration & Community Engagement
- Collaborate with network, compute, data centre, and support teams to deliver reliable physical infrastructure services.
- Contribute specialist input to infrastructure roadmap planning, capacity expansion, standard builds, and hardware platform qualification.
- Support pre-sales and solution design efforts with expert guidance on bare metal capabilities, operational models, and deployment constraints.
- Contribute to upstream OpenStack bare metal communities through bug reports, testing, reviews, design discussions, and code contributions where appropriate.
- Track upstream roadmaps and release changes to help shape Nscale’s bare metal strategy, upgrade planning, and platform standards.
KPIs
- Automated provisioning and hardware onboarding reliability
- Bare metal incident resolution and root cause closure
- Platform upgrade and lifecycle delivery across Ironic and its dependencies
- Upstream OpenStack Ironic community contribution and alignment with upstream releases and roadmaps
About You
- Strong experience operating Linux systems and troubleshooting production infrastructure
- Strong specialist knowledge of OpenStack Ironic and the surrounding provisioning ecosystem
- Strong understanding of bare metal provisioning concepts including PXE/iPXE, DHCP, TFTP/HTTP boot, BMC technologies, RAID configuration, firmware management, disk imaging, and node lifecycle states
- Strong experience with out-of-band management technologies such as Redfish, IPMI, or vendor management interfaces
- Strong experience designing and building automation for physical and virtual infrastructure using tools such as Ansible
- Strong scripting skills in Python and Bash
- Experience troubleshooting complex provisioning and hardware integration issues across server, network, and management layers
- Experience operating infrastructure at scale with a focus on reliability, repeatability, and operational safety
- Ability to collaborate across infrastructure, support, and architecture teams to solve complex technical problems
- Experience contributing to or working closely with upstream open-source communities, particularly OpenStack, Ironic, Metal3, or related infrastructure projects, is highly desirable
What we can offer you
At Nscale, you’ll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We’re building something extraordinary, and we want you at the core.
Highly competitive US compensation package (base + bonus + equity), with performance reviews every 12 months. …
