Company: Circadia Health

Apply for the Machine Learning Engineer – Hybrid Remote

Location:

Job Description:

Circadia Health is a growth‑stage healthcare AI company on a mission to prevent avoidable hospitalizations and transform senior‑care operations. Our Circadia Intelligence Platform combines:
- Contactless sensing that monitors respiration and motion with medical‑grade accuracy
- Predictive analytics & agentic AI workflows that detect 85% of preventable rehospitalizations ~11 days in advance
- Enterprise integrations that embed insights directly into EHR, care‑coordination, billing, and compliance systems

Today our technology touches 40,000+ post‑acute patients daily across skilled‑nursing, home‑health, and home‑care networks. We are backed by leading healthcare and AI investors like Khosla Ventures, Village Global, Headline, Eric Yuan (CEO of Zoom), and others.

As an ML Ops Engineer at Circadia Health, you will own the infrastructure and operational lifecycle of the machine learning systems that power our clinical monitoring platform. You will build and maintain the production ML pipelines, deployment infrastructure, and monitoring systems that enable Circadia’s predictive models to identify early signs of clinical deterioration. Reporting to the Principal ML Engineer, you will work across ML, backend, data, and clinical teams to ensure models are reliably trained, versioned, deployed, and monitored in both cloud and edge environments. You will be a key driver in elevating Circadia’s ML practice, from reproducibility and experiment tracking to CI/CD for models and operational observability. This is a high-ownership role at a lean company where production reliability, rapid iteration, and pragmatic engineering are essential.

ML Pipeline Orchestration & Automation:
- Own and extend Circadia’s ML pipeline orchestration using Apache Airflow, including training, evaluation, and deployment workflows.
- Build and maintain automated pipelines for model retraining, validation, and promotion across development, staging, and production environments.
- Implement pipeline monitoring, alerting, and failure recovery to eliminate silent failures and ensure operational reliability.

Model Deployment & Serving:
- Deploy and manage ML models on AWS infrastructure (e.g. AWS Batch for batch inference workloads).
- Support deployment of models to edge devices, including Circadia’s clinical monitoring hardware, working with firmware and embedded engineering teams as needed.
- Manage model versioning, promotion, and rollback workflows through the MLflow model registry.
- Evaluate and implement strategies for safe model rollouts (e.g. shadow deployments, canary releases) as the platform matures.
- Enable ML engineers to move seamlessly from experimentation to production deployment with minimal friction.

Data & Model Versioning:
- Implement and maintain training data versioning and dataset management practices to ensure reproducibility of model training runs.
- Collaborate with ML engineers and data engineers to formalise dataset release and validation workflows.

Monitoring, Observability & Data Quality:
- Build monitoring systems for model performance in production, including data drift detection, prediction quality tracking, and alerting on degradation.
- Implement operational dashboards for pipeline health, compute utilisation, and deployment status.
- Collaborate with data engineering to ensure upstream data quality and pipeline reliability for ML feature inputs.
- Develop incident response procedures and runbooks for ML system failures.

- Manage and optimise AWS compute resources (Batch, EC2, or similar) used for model training and inference.
- Design infrastructure-as-code solutions for reproducible ML environments.
- Drive cost optimisation across ML compute, storage, and data transfer.
- Support Snowflake integrations for feature generation and training data pipelines.

Elevating ML Practice:
- Introduce and champion ML engineering best practices including CI/CD for models, automated testing for ML pipelines, and reproducible training workflows.
- Build internal tooling and templates that accelerate the ML development-to-production cycle.
- Document operational processes, architecture decisions, and onboarding materials for the ML platform.
- Participate in architecture discussions and technical planning to ensure ML systems scale with Circadia’s growth.

- Ensure all ML pipelines and infrastructure meet healthcare security and privacy requirements, including HIPAA and SOC 2.
- Apply best practices for handling Protected Health Information (PHI) in training data, model artifacts, and inference outputs.
- Maintain audit trails for model decisions, data access, and deployment history.

- 4+ years of experience in MLOps, ML Engineering, DevOps, or a closely related infrastructure role.
- Strong proficiency in Python for ML pipeline development, tooling, and automation.
- Hands-on experience with ML pipeline orchestration tools, particularly Apache Airflow.
- Experience deploying and operating ML workloads on AWS (Batch, EC2, S3, IAM, CloudWatch).
- Solid understanding of the ML lifecycle: training, evaluation, deployment, monitoring, and retraining.
- Familiarity with SQL and data warehousing platforms (Snowflake preferred).
- Experience implementing monitoring, logging, and alerting for production systems.

- Background in healthcare, medical devices, or clinical data systems.
- Experience with CI/CD systems for ML.
- Experience with data versioning tools.
- Experience supporting data science or ML research teams in a production context.
- Experience with Apache Spark, Dask, or similar for large-scale data processing.

- You take ownership of ML infrastructure end-to-end, from training pipelines to production monitoring.
- You care deeply about reliability, reproducibility, and operational excellence in ML systems.
- You have strong opinions (loosely held) on how to build a great ML platform, and you’re eager to put them into practice.
- You communicate clearly across engineering, data science, and clinical teams.
- You’re motivated by building technology that directly improves patient care.

Circadia Health is redefining patient monitoring through contactless sensing and AI-driven clinical insights. As we scale from tens of thousands to hundreds of thousands of monitored patients, our data infrastructure is central to everything we do.

Build data systems that power clinical-grade AI and ML…

Posted: June 2nd, 2026