Company: Pharmacy2U | Certified B Corp

Technology – ML Ops Engineer

Location: Leeds

Job Description:

Role

Role: ML Ops Engineer

Location & Working Arrangements

Location: Hybrid; 2–3 days a week in the office at Thorpe Park, Leeds.

Working hours: Core hours are 09:30–16:00, and you can arrange the rest of your working day around them to suit you.

Salary & Contract

Salary: Dependent on experience (DOE), plus extensive benefits

Contract type: Permanent

Employment type: Full time

Our tech teams keep us running 24/7 to ensure world‑class service for our patients. This role may include participation in an out‑of‑hours rota as required by the business, with a fair scheduling process and additional compensation for on‑call periods.

About Us

We are the nation’s largest online pharmacy, with 25 years of experience, helping over 1.8 million patients in England manage their NHS prescriptions from request through to delivery. We are Great Place to Work certified and a certified B Corp, reflecting high standards of social and environmental responsibility. Our people are fundamental to our success as we strive to be a world‑leading, patient‑centric digital healthcare provider and to maintain a positive, open and honest working environment.

Role Overview

The ML Ops Engineer will drive the operation of production‑grade Machine Learning and LLM services on Azure, ensuring models run as reliable, scalable, high‑performing systems. You will own the end‑to‑end MLOps/LLMOps lifecycle, leading CI/CD, deployment automation, monitoring, and incident response. You will work closely with Data Science to turn models into robust production services with governance, observability, and continuous optimisation for fast, safe, and efficient delivery at scale.

What you’ll be doing

Production Deployment & Release Engineering

Design and operate CI/CD pipelines for ML models and LLM prompt‑flows, covering build, test, validation, deployment, and rollback
Own model registration and promotion across environments, ensuring traceability, governance, and auditability
Implement safe deployment strategies (blue/green, canary, champion/challenger)
Package and deploy containerised inference services and batch pipelines, ensuring repeatability and rapid rollback

Reliability Engineering (Day 2 Operations)

Run ML and LLM services as production‑grade systems, defining SLOs/SLIs, dashboards, and alerting
Lead incident response for runtime issues, including triage, mitigation, recovery, and post‑incident reviews
Develop and maintain operational runbooks covering restart, rollback, secret rotation, and safe‑mode scenarios
Improve service resilience and reduce mean time to recovery (MTTR) through automation (self‑healing, retries, fallbacks, circuit breakers)

Observability (Service, Data, Model & Cost)

Implement monitoring for availability, latency, errors, resource usage, and job performance
Monitor data quality including freshness, volume, completeness, schema drift, and distribution changes
Monitor model performance, including drift and prediction distribution shifts, and track accuracy where labels exist
Instrument LLM services for token usage, latency, and safety signals, with clear visibility into cost, quotas, and risks

LLMOps: Lifecycle, Quality & Safety

Manage prompts and workflows as code, including versioning, code reviews, and automated regression testing
Own production configuration for LLM deployments, including model updates, limits, and safeguards
Partner with Data Science and Security to ensure robust safety practices, including PII protection and prompt‑injection testing

Security, Privacy & Governance

Implement secure access controls, identity management, and secrets handling
Support production readiness through documentation, monitoring plans, cost models, and audit evidence
Ensure all changes follow structured governance with clear traceability and reproducibility

Who we’re looking for

Strong Python engineering skills with experience in ML frameworks (scikit‑learn, PyTorch, TensorFlow) and experiment tracking
Comfortable working in regulated environments with privacy, auditability, and change‑control requirements, including the handling of sensitive data
Strong DevOps/SRE background: CI/CD, Infrastructure as Code, monitoring and alerting, incident management, reliability engineering
Hands‑on experience with Docker and Kubernetes (e.g., AKS), including debugging and performance tuning
Experience with Azure, including Azure Machine Learning (pipelines, registries, endpoints) and Azure Monitor or Log Analytics
Experience operationalising ML pipelines (training, batch scoring, feature engineering) and preventing training‑serving skew
Experience implementing safe deployment practices (blue/green or canary) with automated validation
Understanding of data contracts, schema evolution, and data quality practices, with experience troubleshooting data drift and missing features

What happens next

Please click apply. If we think you are a good match, we will be in touch to arrange an interview. Applicants must prove they have the right to live in the UK. All successful applicants will be required to undergo a DBS check. Unsolicited agency applications will be treated as a gift.

Posted: June 6th, 2026