Requirements
- We’re looking for someone who has a solid technical foundation in both machine learning and software engineering
- If you don’t meet every single point below, please still consider applying—what matters most to us is your growth mindset, your ability to learn quickly, and how you work collaboratively
- Experience deploying machine learning models to production environments on cloud platforms such as GCP, AWS, or Azure
- Experience with CI/CD pipelines for machine learning (e.g., GitHub Actions, Docker)
- Experience with ML platforms/frameworks (e.g., Vertex AI, Kubeflow, SageMaker)
- Experience with data processing frameworks and tools (e.g., Spark, Databricks); experience with Apache Beam/Dataflow is highly desirable
- Knowledge of monitoring and maintaining models in production
- Experience with performance and cost optimization (e.g., latency, throughput) is highly desirable
- Proficiency in Python and relevant ML libraries (e.g., TensorFlow, PyTorch, scikit-learn)
- Problem‑solving skills with the ability to troubleshoot model and pipeline issues
- Strong communication skills, enabling effective collaboration across teams
What the job involves
- As part of the MLOps team, you’ll work closely with data scientists, software engineers, and other stakeholders to bring machine learning models to life—ensuring they’re deployed, maintained, and monitored efficiently in production
- You’ll have the opportunity to improve model performance and infrastructure, all while contributing to Trustpilot’s AI‑driven solutions
- Model Deployment: Collaborate with data scientists to take machine learning models from development to production, ensuring high performance and scalability
- Build Pipelines: Develop and maintain data and model pipelines, integrating seamlessly with our existing systems to support reliable, efficient workflows
- CI/CD for ML: Design and implement continuous integration and delivery pipelines to streamline the deployment of machine learning models
- Model Monitoring: Help monitor the performance of machine learning models post‑deployment, ensuring reliability, scalability, and quality over time
- Collaboration: Work with cross‑functional teams to design solutions that meet business needs while adhering to best practices in machine learning and software engineering
- Optimise: Continuously improve our infrastructure, ensuring we remain at the forefront of AI model production and delivery
- Agentic Development: Develop MCP servers and A2A agents through our internal framework for managing multi‑agent orchestrated deployments
