Requirements
- Knowledge of Python
- Familiarity with cloud services (e.g. AWS)
- Experience managing or developing in Linux environments
- Understanding of CI/CD principles
- Experience using Kubernetes (k8s)
- (Desirable) Experience maintaining machine learning applications
- (Desirable) Experience deploying ML orchestration tools (e.g. NV Ray, KFP, SkyPilot)
- (Desirable) Experience managing ML accelerator hardware (e.g. using NVIDIA DCGM)
- (Desirable) Experience with Infrastructure as Code (IaC) tools (e.g. Terraform/OpenTofu)
- (Desirable) Experience with GitHub Actions
- (Desirable) Experience with modern observability tooling (e.g. Prometheus)
- (Desirable) Experience with Grafana
- (Desirable) Knowledge of Go/Java/C++ (or similar language)
What the job involves
- Join our dynamic Software Infrastructure team and take a pivotal role in scaling and managing our infrastructure
- You will develop essential tools and services that empower our broader software team
- Your contributions will enhance the build, test, deployment, and productisation processes of our Machine Learning Software components
- Work with our High-Performance Computing (HPC) AI platforms and gain invaluable experience in distributed systems
- The Software Infrastructure team provides critical platforms and services for software development teams across the business
- Our responsibilities include managing the CI platform and services, build engineering, component integration, and packaging and release systems
- We operate in squads, fostering a culture of service ownership and empowerment for our engineers
- We focus on long‑term engineering solutions and strive to eliminate toil wherever possible
- Develop, own, and maintain tools and services to support AI research and engineering teams
- Deploy and maintain services with Kubernetes and Docker
- Manage our cloud infrastructure using tools such as Terraform
