Principal Machine Learning Engineer

Company: London Stock Exchange
Location: Nottingham
Job Description:

Requirements

  • Proven track record architecting and delivering production ML systems at scale in enterprise environments
  • Deep expertise with AWS SageMaker (training, processing, pipelines, endpoints, registry) and complementary AWS services
  • Expert-level Python and ML frameworks (e.g. PyTorch, TensorFlow, XGBoost)
  • Strong thought leadership in MLOps automation, CI/CD for ML, and model lifecycle management
  • Advanced experience designing explainability systems, reason codes, and governance artefacts
  • Expertise in low‑latency inference architectures and real-time model serving
  • Strong grounding in drift detection, telemetry pipelines, observability patterns, and model QA
  • Experience shaping ML security practices, including cross‑account IAM, data minimisation, and PII-safe design
  • Ability to influence architecture, mentor senior engineers, and set long‑term technical direction
  • (Desirable) Experience building or leading feature store adoption
  • (Desirable) Background in ranking, search relevance, entity matching, or similarity modelling
  • (Desirable) Experience designing or governing multi‑account AWS ML platforms
  • (Desirable) Knowledge of distributed training, GPU/accelerator optimisation, and scaling strategies
  • (Desirable) Bachelor's degree in a STEM subject, e.g. mathematics, physics, engineering, computer science, or an adjacent field
  • (Desirable) Master's degree or PhD in a STEM subject, or equivalent experience

What the job involves

  • We are seeking a Principal Machine Learning Engineer (SageMaker, MLOps, Model Governance & Explainability) to provide technical leadership across the full lifecycle of machine learning systems powering a new matching platform
  • This role is accountable for defining ML architecture, establishing engineering standards, driving MLOps maturity, and ensuring that our models are scalable, secure, explainable, and governed to enterprise‑grade standards
  • You will contribute to the strategic direction of our ML platform—spanning data pipelines, model development, deployment automation, inference runtime design, telemetry, drift detection, and cross‑account productionisation
  • You will mentor engineers, influence product and architectural decisions, and ensure that our ML systems operate reliably at scale, underpinned by a robust governance and compliance framework
  • This is a highly hands‑on, highly technical, principal‑level role that combines architectural vision with deep practical expertise in ML engineering and AWS-native MLOps
  • Define the end‑to‑end ML architecture for the matching platform, including data pipelines, model training workflows, inference runtimes, and telemetry ecosystems
  • Lead adoption of best‑in‑class MLOps patterns, platform tooling, and AWS SageMaker capabilities across training, processing, registry, monitoring, and deployment
  • Partner with platform, security, and data engineering teams to implement scalable, lakehouse‑oriented feature architectures and enterprise‑grade ML governance
  • Champion engineering standards for model quality, documentation, observability, and platform resilience
  • Architect highly scalable, production‑ready feature pipelines within lakehouse environments
  • Set the technical direction for fallback and resilience strategies (e.g., fallback pipelines)
  • Establish and enforce data‑quality guardrails, validation schemas, and monitoring frameworks
  • Drive adoption and standards for enterprise feature stores
  • Lead the design of ranking, scoring, and similarity models tailored to the matching platform requirements
  • Define model calibration, scoring logic, confidence thresholds, and optimisation strategies
  • Mentor teams on advanced ML techniques using frameworks such as PyTorch, TensorFlow, and XGBoost
  • Review and approve technical designs for complex modelling workflows
  • Establish explainability standards across the ML stack, using SHAP or equivalent frameworks
  • Define patterns to generate regulator‑ready reason codes, aligned with compliance requirements
  • Ensure explainability artefacts are accurate, robust, and traceable across model versions
  • Architect automated training, deployment, and retraining pipelines using AWS SageMaker
  • Set standards for model registry usage, automated approvals, and rollback orchestration
  • Drive infrastructure-as-code and CI/CD maturity for ML systems across multiple environments
  • Lead design of enterprise‑wide weight‑update patterns and lineage‑aware deployment strategies
  • Architect low‑latency, high‑throughput inference services that meet strict matching platform SLAs
  • Lead the design of secure cross‑account IAM patterns for model consumption
  • Own end‑to‑end telemetry design, including scoring metrics, latency, error analytics, and SLOs
  • Partner with platform teams to optimise cost, scale, and reliability of inference endpoints
  • Define observability standards for feature drift, concept drift, performance degradation, and data integrity
  • Lead the creation of dashboards, benchmarks, and automated alerting across the ML ecosystem
  • Ensure telemetry pipelines adhere to privacy, data minimisation, and compliance policies
  • Drive adoption of proactive failover, shadow‑mode testing, and continuous validation patterns
  • Set and enforce ML‑specific security standards including data minimisation, encryption, and PII handling
  • Oversee creation of Model Cards, lineage artefacts, and compliance documentation
  • Ensure ML systems meet governance standards for auditability, reproducibility, versioning, and traceability
  • Collaborate with InfoSec and Risk teams to define ML governance frameworks and secure cross‑environment workflows
  • Lead validation strategies using golden datasets, behavioural tests, and benchmark suites
  • Architect performance testing for latency‑sensitive inference paths and model hot paths
  • Establish standards for A/B testing, shadow deployments, canary rollouts, and controlled experiments


Posted: June 1st, 2026