Machine Learning Engineer, Core Evaluations

About Cantina

Cantina Labs is a social AI company, developing a suite of advanced real‑time models that push the boundaries of expression, personality, and realism. We bring characters to life, transforming how people tell stories, connect, and create. We build and power ecosystems. Cantina, our flagship social‑AI platform, is just the beginning.

About the Role

We are seeking an experienced Machine Learning Engineer to focus on audio model evaluation, specifically for speech generation and recognition models. This role involves designing and developing comprehensive model evaluation pipelines for both development and production environments, as well as creating automated dashboards for reporting evaluation results. As the founding member of our evaluation team, you will lead our evaluation efforts and shape the team's future growth.

What You’ll Do

Design model evaluation pipelines for models in development and production.
Design user studies for subjective model evaluations.
Convert requirements into measurable metrics.
Develop automated evaluation dashboards to visualize model performance and compare results.
Train new models to capture new and different evaluation metrics.
Communicate with the model team to help design better models based on evaluation results.
Communicate with the data team to determine the type of data necessary to improve model performance.
Communicate with the product manager to ensure product requirements are correctly measured.
Help grow the evaluation team as the founding member.
Lead the evaluation team in the future.

What You’ll Bring

Strong experience and intuition for designing metrics that capture model performance.
Strong experience designing user studies on Mechanical Turk or similar platforms.
Experience with model training and fine‑tuning for model evaluation.
Strong statistical knowledge and experience in statistically comparing evaluation results and making decisions.
Very strong engineering and programming skills.
Experience training ASR and TTS models.
Experience on ML teams working on large‑scale machine learning problems (e.g., >3B parameters with >1M hours of data).

Company: Cantina Labs

Apply for the Machine Learning Engineer, Core Evaluations role

Location: London

Posted: May 2nd, 2026