The Role
You will be part of a specialist engineering team responsible for designing, building, and optimising end-to-end financial instrument mastering pipelines. These pipelines span ingestion, normalisation, bi-temporal processing, and publication into enterprise data platforms.
You will work closely with data architects, domain experts, and QC engineers to deliver scalable, reliable, and high-performance data solutions across Azure and Microsoft Fabric ecosystems.
Key Responsibilities
- Build and maintain PySpark-based data pipelines for financial instrument mastering across multiple data sources
- Design and implement bi-temporal data processing models (system time + valid time) including Slice, Resolve, Coalesce, and Diff logic
- Develop optimised Azure Cosmos DB data models, including partitioning, indexing, change feed processing, and point-read optimisation
- Integrate external APIs for entity resolution and matching services (PermID / IAAS) with robust retry and batching mechanisms
- Design publication pipelines to convert bi-temporal data into uni-temporal outputs and publish via Microsoft Fabric / Parquet-based lakehouse architectures
- Implement data quality frameworks using Great Expectations to ensure accuracy and compliance
- Build robust unit and integration tests using pytest for PySpark and Cosmos DB components
- Support and maintain CI/CD pipelines (GitLab CI) including Python packaging, Artifactory deployment, and ARM-based infrastructure provisioning
- Work with YAML-driven configuration for mastering rules, schemas, and environment setup
- Monitor and troubleshoot production pipelines using Eventstream telemetry, KQL, and Datadog observability tools
- Deliver scalable transformation logic, optimised aggregations, and high-performance data processing workflows
- Implement data governance controls including data masking, role-based access, and compliance policies
- Continuously tune and optimise workloads for performance, cost efficiency, and reliability
Required Skills & Experience
- Strong experience in Python and PySpark (Spark SQL, DataFrame API, Structured Streaming)
- Hands-on experience building large-scale ETL / streaming data pipelines
- Experience working with Azure Cosmos DB (NoSQL) including data modelling and performance tuning
- Strong knowledge of Azure data lake storage and access (ADLS, OneLake, ABFS)
- Experience implementing bi-temporal or SCD Type 2 data models
- Strong understanding of data quality frameworks (e.g., Great Expectations)
- Experience with CI/CD pipelines (GitLab / Azure DevOps) and automated deployments
- Strong testing discipline using pytest, mocking, and integration testing approaches
- Experience working with YAML/JSON configuration and infrastructure-as-code (ARM templates)
- Strong understanding of distributed data processing and Spark-based architectures
- Experience working with financial or time-series datasets (market data, reference data, risk data preferred)
- Strong communication skills and ability to work with cross-functional stakeholders
Desirable Experience
- Microsoft Fabric (Notebooks, Eventstream, Lakehouses, Spark Job Definitions)
- Financial instrument/reference data (ISIN, CUSIP, LEI, PermID)
- Entity resolution / matching systems and enrichment APIs
- Delta Lake and Change Data Feed (CDF)
- Cosmos DB performance optimisation (RU tuning, bulk operations, concurrency)
- Jinja2 templating or code generation approaches
- SonarQube or similar code quality tooling
- Monorepo development with modern Python packaging tools (uv / Hatchling)
- Power BI / semantic modelling experience
- Knowledge of regulatory and compliance requirements relevant to financial data (GDPR, SOX)
Technology Stack
- Python 3.11+, PySpark 3.5, Spark SQL
- Azure Cosmos DB, ADLS, OneLake, Delta Lake, Parquet
- Microsoft Fabric (Eventstream, Notebooks, Lakehouse)
- Great Expectations, LSEG Data Validation frameworks
- GitLab CI/CD, JFrog Artifactory, ARM Templates
- Datadog, Eventstream, KQL monitoring
- Azure Key Vault, Azure CLI, Fabric APIs
Why Join
- Work on a global financial markets transformation programme
- Hands-on work with next-generation Azure and Microsoft Fabric data platforms
- Exposure to bi-temporal modelling and financial instrument mastering systems
- High-impact engineering role working on modern cloud and streaming architectures
- Opportunity to work with leading domain and technical experts in a regulated environment
