Principal Machine Learning Engineer

Company: NLP PEOPLE
Location: London
Job Description:

Role Summary

We are seeking a Principal Machine Learning Engineer (SageMaker, MLOps, Model Governance & Explainability) to provide technical leadership across the full lifecycle of machine learning systems powering a new matching platform. This role is accountable for defining ML architecture, establishing engineering standards, driving MLOps maturity, and ensuring that our models are scalable, secure, explainable, and governed to enterprise‑grade standards.

You will contribute to the strategic direction of our ML platform—spanning data pipelines, model development, deployment automation, inference runtime design, telemetry, drift detection, and cross‑account productionisation. You will mentor engineers, influence product and architectural decisions, and ensure that our ML systems operate reliably at scale, underpinned by a robust governance and compliance framework.

This is a highly hands‑on, highly technical, principal‑level role that combines architectural vision with deep practical expertise in ML engineering and AWS-native MLOps.

Key Responsibilities

Technical Leadership & Architecture

  • Define the end‑to‑end ML architecture for the matching platform, including data pipelines, model training workflows, inference runtimes, and telemetry ecosystems.
  • Lead adoption of best‑in‑class MLOps patterns, platform tooling, and AWS SageMaker capabilities across training, processing, registry, monitoring, and deployment.
  • Partner with platform, security, and data engineering teams to implement scalable, lakehouse-oriented feature architectures and enterprise‑grade ML governance.
  • Champion engineering standards for model quality, documentation, observability, and platform resilience.

Feature Engineering & Data Architecture

  • Architect highly scalable, production‑ready feature pipelines within lakehouse environments.
  • Set the technical direction for resilience strategies (e.g., fallback pipelines).
  • Establish and enforce data‑quality guardrails, validation schemas, and monitoring frameworks.
  • Drive adoption and standards for enterprise feature stores.

Model Development & Technical Excellence

  • Lead the design of ranking, scoring, and similarity models tailored to the matching platform requirements.
  • Define model calibration, scoring logic, confidence thresholds, and optimisation strategies.
  • Mentor teams on advanced ML techniques using frameworks such as PyTorch, TensorFlow, and XGBoost.
  • Review and approve technical designs for complex modelling workflows.

Explainability & Regulatory-Grade Reasoning

  • Establish explainability standards across the ML stack, using SHAP or equivalent frameworks.
  • Define patterns to generate regulator‑ready reason codes, aligned with compliance requirements.
  • Ensure explainability artefacts are accurate, robust, and traceable across model versions.

ML Deployment & Automation (MLOps)

  • Architect automated training, deployment, and retraining pipelines using AWS SageMaker.
  • Set standards for model registry usage, automated approvals, and rollback orchestration.
  • Drive infrastructure-as-code and CI/CD maturity for ML systems across multiple environments.
  • Lead design of enterprise‑wide weight‑update patterns and lineage‑aware deployment strategies.

Inference Runtime & Cross‑Account Productionisation

  • Architect low‑latency, high‑throughput inference services that meet strict matching platform SLAs.
  • Lead the design of secure cross‑account IAM patterns for model consumption.
  • Own end‑to‑end telemetry design, including scoring metrics, latency, error analytics, and SLOs.
  • Partner with platform teams to optimise cost, scale, and reliability of inference endpoints.

Monitoring, Drift Detection & Observability

  • Define observability standards for feature drift, concept drift, performance degradation, and data integrity.
  • Lead the creation of dashboards, benchmarks, and automated alerting across the ML ecosystem.
  • Ensure telemetry pipelines adhere to privacy, data minimisation, and compliance policies.
  • Drive adoption of proactive failover, shadow-mode testing, and continuous validation patterns.

Security, Compliance & ML Governance

  • Set and enforce ML-specific security standards including data minimisation, encryption, and PII handling.
  • Oversee creation of Model Cards, lineage artefacts, and compliance documentation.
  • Ensure ML systems meet governance standards for auditability, reproducibility, versioning, and traceability.
  • Collaborate with InfoSec and Risk teams to define ML governance frameworks and secure cross‑environment workflows.

Testing, Validation & Performance Engineering

  • Lead validation strategies using golden datasets, behavioural tests, and benchmark suites.
  • Architect performance testing for latency‑sensitive inference paths and model hot paths.
  • Establish standards for A/B testing, shadow deployments, canary rollouts, and controlled experiments.

Company

London Stock Exchange Group

Qualifications

Essential

  • Proven track record architecting and delivering production ML systems at scale in enterprise environments.
  • Deep expertise with AWS SageMaker (training, processing, pipelines, endpoints, registry) and complementary AWS services.
  • Expert‑level proficiency in Python and ML frameworks (e.g. PyTorch, TensorFlow, XGBoost).
  • Strong thought leadership in MLOps automation, CI/CD for ML, and model lifecycle management.
  • Advanced experience designing explainability systems, reason codes, and governance artefacts.
  • Expertise in low‑latency inference architectures and real‑time model serving.
  • Strong grounding in drift detection, telemetry pipelines, observability patterns, and model QA.
  • Experience shaping ML security practices, including cross‑account IAM, data minimisation, and PII-safe design.
  • Ability to influence architecture, mentor senior engineers, and set long‑term technical direction.

Nice to Have

  • Experience building or leading feature store adoption.
  • Background in ranking, search relevance, entity matching, or similarity modelling.
  • Experience designing or governing multi‑account AWS ML platforms.
  • Knowledge of distributed training, GPU/accelerator optimisation, and scaling strategies.
  • Bachelor's degree in a STEM subject, e.g. mathematics, physics, engineering, computer science, or an adjacent field.
  • Master's or PhD in a STEM subject, or equivalent experience, is desirable but not essential.


Posted: May 20th, 2026