Company: London Stock Exchange

Apply for the Principal Machine Learning Engineer role

Location: Nottingham

Job Description:

Requirements

Proven track record architecting and delivering production ML systems at scale in enterprise environments
Deep expertise with AWS SageMaker (training, processing, pipelines, endpoints, registry) and complementary AWS services
Expert-level Python and ML model frameworks (e.g. PyTorch, TensorFlow, XGBoost)
Strong thought leadership in MLOps automation, CI/CD for ML, and model lifecycle management
Advanced experience designing explainability systems, reason codes, and governance artefacts
Expertise in low‑latency inference architectures and real-time model serving
Strong grounding in drift detection, telemetry pipelines, observability patterns, and model QA
Experience shaping ML security practices, including cross‑account IAM, data minimisation, and PII-safe design
Ability to influence architecture, mentor senior engineers, and set long‑term technical direction
(Desirable) Experience building or leading feature store adoption
(Desirable) Background in ranking, search relevance, entity matching, or similarity modelling
(Desirable) Experience designing or governing multi‑account AWS ML platforms
(Desirable) Knowledge of distributed training, GPU/accelerator optimisation, and scaling strategies
(Desirable) Bachelor's degree in a STEM subject, e.g. mathematics, physics, engineering, computer science, or an adjacent field
(Desirable) Master's or PhD in a STEM subject, or equivalent experience

What the job involves

We are seeking a Principal Machine Learning Engineer (SageMaker, MLOps, Model Governance & Explainability) to provide technical leadership across the full lifecycle of machine learning systems powering a new matching platform
This role is accountable for defining ML architecture, establishing engineering standards, driving MLOps maturity, and ensuring that our models are scalable, secure, explainable, and governed to enterprise‑grade standards
You will contribute to the strategic direction of our ML platform, spanning data pipelines, model development, deployment automation, inference runtime design, telemetry, drift detection, and cross‑account productionisation
You will mentor engineers, influence product and architectural decisions, and ensure that our ML systems operate reliably at scale, underpinned by a robust governance and compliance framework
This is a highly hands‑on, highly technical, principal‑level role that combines architectural vision with deep practical expertise in ML engineering and AWS-native MLOps
Define the end‑to‑end ML architecture for the matching platform, including data pipelines, model training workflows, inference runtimes, and telemetry ecosystems
Lead adoption of best‑in‑class MLOps patterns, platform tooling, and AWS SageMaker capabilities across training, processing, registry, monitoring, and deployment
Partner with platform, security, and data engineering teams to implement scalable, lakehouse-oriented feature architectures and enterprise‑grade ML governance
Champion engineering standards for model quality, documentation, observability, and platform resilience
Architect highly scalable, production‑ready feature pipelines within lakehouse environments
Set the technical direction for fallback and resilience strategies (e.g., fallback pipelines)
Establish and enforce data‑quality guardrails, validation schemas, and monitoring frameworks
Drive the adoption of enterprise feature stores and set standards for their use
Lead the design of ranking, scoring, and similarity models tailored to the matching platform requirements
Define model calibration, scoring logic, confidence thresholds, and optimisation strategies
Mentor teams on advanced ML techniques using model frameworks such as PyTorch, TensorFlow, and XGBoost
Review and approve technical designs for complex modelling workflows
Establish explainability standards across the ML stack, using SHAP or equivalent frameworks
Define patterns to generate regulator‑ready reason codes, aligned with compliance requirements
Ensure explainability artefacts are accurate, robust, and traceable across model versions
Architect automated training, deployment, and retraining pipelines using AWS SageMaker
Set standards for model registry usage, automated approvals, and rollback orchestration
Drive infrastructure-as-code and CI/CD maturity for ML systems across multiple environments
Lead design of enterprise‑wide weight‑update patterns and lineage‑aware deployment strategies
Architect low‑latency, high‑throughput inference services that meet strict matching platform SLAs
Lead the design of secure cross‑account IAM patterns for model consumption
Own end‑to‑end telemetry design, including scoring metrics, latency, error analytics, and SLOs
Partner with platform teams to optimise cost, scale, and reliability of inference endpoints
Define observability standards for feature drift, concept drift, performance degradation, and data integrity
Lead the creation of dashboards, benchmarks, and automated alerting across the ML ecosystem
Ensure telemetry pipelines adhere to privacy, data minimisation, and compliance policies
Drive adoption of proactive failover, shadow‑mode testing, and continuous validation patterns
Set and enforce ML‑specific security standards including data minimisation, encryption, and PII handling
Oversee creation of Model Cards, lineage artefacts, and compliance documentation
Ensure ML systems meet governance standards for auditability, reproducibility, versioning, and traceability
Collaborate with InfoSec and Risk teams to define ML governance frameworks and secure cross‑environment workflows
Lead validation strategies using golden datasets, behavioural tests, and benchmark suites
Architect performance testing for latency‑sensitive inference paths and model hot paths
Establish standards for A/B testing, shadow deployments, canary rollouts, and controlled experiments


Posted: June 1st, 2026