Nebius is seeking a Middle AI Solution Architect to drive adoption of our high-performance cloud infrastructure by creating educational content. You’ll join our Nebius Academy team with a mission to help developers succeed with Nebius’ cloud offerings. In this role, you’ll focus on communicating technical capabilities and showcasing differences across Nebius products: cloud infrastructure services and Token Factory (AI Studio).
What you will do
Educational content creation
- Create technical content demonstrating how to run computing workloads effectively with VMs, GPU clusters, Kubernetes, SLURM, Soperator, etc.
- Develop sample code, tutorials, and reference architectures showcasing best practices for cloud computing and ML infrastructure.
- Create video tutorials and live coding sessions demonstrating effective use of Nebius cloud infrastructure.
Helping Academy’s partners build cloud solutions
- Collaborate with academic partners (universities such as Stevens and MIT) to understand their requirements and develop solution architectures that meet them. Working with the Nebius Solutions Architect Team, design Infrastructure as Code solutions and write the accompanying documentation and technical how-to guides.
- Act as a trusted advisor to our academic partners, providing technical expertise on GPU cloud technologies and best practices.
Requirements
- Strong understanding of cloud infrastructure and distributed computing principles.
- Experience with virtual machines, containerization, and managing compute resources.
- Experience building with IaC solutions, preferably Terraform.
- Knowledge of GPU clusters and techniques for optimizing ML workloads.
- Working knowledge of container orchestration systems like Kubernetes and job schedulers like SLURM.
- Familiarity with infrastructure components including networking, storage optimization, and resource management.
- Experience optimizing performance of diverse workloads in cloud environments.
- Strong programming skills, particularly in Python, and familiarity with the PyTorch ecosystem.
- Understanding of cloud infrastructure concepts and deployment patterns.
- Good written communication skills and ability to clearly express technical ideas in text.
Practical Experience
- 2+ years of experience in software development, cloud engineering, DevOps, or a similar technical role.
- Demonstrated experience with cloud technologies and infrastructure.
- Previous work with infrastructure-as-code, containerization, and cloud environments.
It will be an added bonus if you have
- Experience with MLflow, Apache Airflow, or Kubeflow.
- Familiarity with cloud ML platforms like AWS, GCP, Azure ML, or NVIDIA NGC.
- Experience managing hybrid cloud or on-prem GPU infrastructure.
- Background working with technology partners and integrating third-party solutions.
- Public presentation skills.
What we can offer you
- Fully remote, full-time role with flexible hours to balance work and personal life.
- Paid parental leave and paid sick leave to ensure your well-being.
- A supportive and inclusive team where empathy, respect, and open communication are valued.
- Opportunities for learning, mentorship, and professional growth.
- Competitive compensation with transparent working conditions.
- A suite of thoughtfully chosen collaboration tools: Miro, Notion, Google Workspace.
- At this time, we are unable to offer H-1B or L-1A/B visa sponsorship.
- This job description is not designed to contain a comprehensive listing of activities, duties, or responsibilities that are required. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities at any time.
- TripleTen is an equal employment opportunity/affirmative action employer and considers qualified applicants for employment without regard to race, color, religion, sex, national origin, age, disability, marital status, sexual orientation, gender identity/expression, protected military/veteran status, or any other legally protected factor.
