Lead Site Reliability Engineer Global IT

{ “@context”: “http://schema.org”, “@type”: “JobPosting”, “title”: “Lead Site Reliability Engineer Global IT”, “description”: “

We are a UK fintech creating successful neobanks in emerging markets in partnership with local traditional banks. The mission is to make banking services accessible, simple and fun to use worldwide and the goal is to launch neobanks in 50+ markets, serving 100m+ customers.

Our success builds upon a best-in-class product, customer experience, emotional engagement, viral marketing and deep credit-decisioning expertise across our product suite covering credit, payments, savings and investments. One of our founders also previously co-founded a highly successful Eastern European neobank with a multi-million customer base.

We launched our first market with Leobank in Azerbaijan in 2021, where we’ve already taken a leading market position. Our next market was Vietnam, where we launched Liobank in early 2023 and have also gained strong traction. We have several more markets on the roadmap in the next 12 months and are starting to build out teams there.

Why Fintech Farm is a Great Place to Be

Fintech Farm is a leading fintech with a clear mission and expansion goals. We are committed to delivering innovative banking solutions worldwide.

Our Ambition

We are looking to become a leading consumer digital bank brand in each market we operate in, making it easy for consumers to interact with their money. You could be a part of this exciting journey.

Our Culture

Customers. We always go above and beyond to provide an amazing customer experience. We serve our customers the way we would want our mom to be served. And who said that banking has to be boring? We make our apps not just easy but fun to use.

People. We are all business partners in our company. Each of us thinks big, acts as if we own the place and never takes ‘no’ for an answer. We work with strong individuals whom we empower and trust rather than micromanage. Common sense rather than formal policies prevails in all that we do. We always stay curious and open-minded. We embrace the ‘we over me’ culture.

Your Role

As a Lead SRE, you will drive the reliability, scalability, and performance of our multi-market microservices infrastructure. You’ll lead a team of engineers focused on automating operations, improving observability, and ensuring zero-downtime service delivery across our cloud and on-prem environments. Your mission is to build resilient systems and empower development teams with the tools and practices needed to operate safely and efficiently at scale.

What You Will Be Doing

Build and define the SRE function, establishing best practices for reliability, observability, and incident management across the platform
Manage and optimize Kubernetes clusters (AWS EKS and on-prem), ensuring scalability, cost efficiency, and resilience
Oversee the observability and alerting stack, including Prometheus, Grafana, Alertmanager, ELK, VictoriaMetrics
Implement and refine monitoring and alerting strategies, establishing actionable SLIs/SLOs and effective on-call processes
Drive improvements in infrastructure as code using Terraform/Terragrunt
Collaborate closely with software and DevOps teams to ensure production readiness and reliable CI/CD delivery pipelines
Participate in and enhance incident management processes, including post-mortems and continuous improvement initiatives
Lead efforts in security hardening, compliance, and cost optimization across environments
Contribute to strategic planning of the infrastructure roadmap and technology evolution

Who You Are

A leader who takes ownership and inspires a reliability-focused culture
Obsessed with system stability, scalability, and measurable performance
Strong communicator who can translate technical concepts into clear direction
Calm under pressure, analytical in incident response, and proactive in prevention
Passionate about mentoring engineers and driving operational excellence

Your Experience

6+ years in DevOps/SRE roles, with at least 2 years in a technical leadership position
Deep expertise in Kubernetes (EKS and on-prem), Prometheus, Grafana, and alerting systems
Strong background in AWS and Infrastructure as Code (Terraform/Terragrunt)
Experience designing and maintaining CI/CD pipelines (GitLab CI/CD or GitHub Actions)
Proficiency in scripting languages (Python, Bash) and automation tooling (Ansible, Helm)
Familiar with GitOps principles (Flux, ArgoCD)
Solid understanding of networking, security, and observability practices
Proven ability to lead incident response and drive cross-functional reliability improvements
Exposure to DevSecOps standards, compliance, and audit processes (ISO 27001, SOC 2, PCI DSS)

What We Are Offering

Competitive salary (negotiable based on seniority and leadership scope)
Share options
Opportunity to shape the SRE function in a fast-scaling fintech start-up
A collaborative environment that values autonomy, innovation, and impact

”, “datePosted”: “2026-05-21”, “hiringOrganization”: { “@type”: “Organization”, “name”: “Fintech Farm Ltd”, “sameAs”: “https://uk.whatjobs.com/pub_api__cpl__438846056__4861?utm_campaign=publisher&utm_medium=api&utm_source=4861&geoID=33” }, “jobLocation”: { “@type”: “Place”, “address”: { “@type”: “PostalAddress”, “addressLocality”: “London” } } }

Company: Fintech Farm Ltd

Location: London

Job Description: