Company: PawaPay

Apply for the Site Reliability Engineer role

Location: Greater London

Job Description:

PawaPay is a pan-African fintech that connects local and international merchants to the local payment channels used by the mass market. We operate in a highly regulated, partner-driven environment and are in scale-up mode, with ambitious growth plans across multiple African markets.

Our vision is to simplify business for companies and consumers in Africa, and we do that by ensuring that payments simply work, at scale, despite the fragmentation and complexity that exist on the Continent.

Through our payments API we already facilitate over 4 million transactions a day across 20 countries in Sub-Saharan Africa. We are the largest PSP on the Continent for the processing of mobile money payments, and have positioned ourselves well to lead the payment space as it grows over the next decade.

At PawaPay, there is an entrepreneurial spirit coupled with a modern and professional working culture. The fast-paced, ever-changing environment will suit someone who can adapt and think on their feet. In return, you will have the opportunity to work alongside a group of dedicated and smart individuals working towards the same mission. We work as a remote team and have team members in Europe, the UK, Africa and Asia.

What is the role?

As a Site Reliability Engineer at PawaPay, you will own the reliability and scalability of a high-throughput payments platform operating across fragmented and often unreliable external systems.

You will design and operate systems that process millions of transactions daily, ensuring low latency, high availability, and strict consistency — even under failure conditions.

This is not a traditional DevOps role. You will be responsible for ensuring that payments either succeed or fail cleanly, never leaving the system in an inconsistent state, while maintaining performance and resilience at scale.

What makes this role challenging

You will be working on systems where:

Failure must be atomic — no partial or inconsistent transaction states
Latency directly impacts conversion and merchant trust
External dependencies (telcos, banks, mobile money providers) are unreliable and vary by market
Traffic can spike unpredictably and must be absorbed without degradation
Observability and incident response are critical to maintaining SLAs

Responsibilities

Own the reliability, availability, and performance of the production payments platform
Participate in on-call rotations to ensure system availability
Define, implement, and continuously improve SLOs, SLAs, and alerting
Design systems for failure (graceful degradation, retries, idempotency, backoff strategies)
Lead incident response end-to-end, including postmortems and preventative improvements
Improve system observability across metrics, logging, and distributed tracing
Build and maintain scalable infrastructure using infrastructure as code
Automate operational workflows to reduce manual intervention and increase system resilience
Collaborate closely with engineering and product teams to ensure reliability is built into system design

Requirements

5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles
Strong experience operating production systems with real uptime and reliability requirements
Experience with distributed systems and understanding of failure modes at scale
Deep knowledge of AWS (e.g. EKS, networking, IAM, scaling patterns, observability)
Strong production-grade experience with Kubernetes and Helm
Experience with Terraform or similar infrastructure-as-code tools
Proficiency in at least one programming language (e.g. Go, Python, Bash)
Experience with monitoring and observability tooling (metrics, logs)
Strong problem-solving skills and a proactive, ownership-driven mindset
Excellent written and verbal communication skills in English

Nice to have

Experience in payments, fintech, or other high-availability transaction systems

What success looks like

Systems remain stable under sudden traffic spikes and partial infrastructure failures
Incidents are resolved quickly with clear root cause and follow-up improvements
Strong observability provides clear insight into system behaviour
SLOs are well-defined, measurable, and consistently met
Payments either succeed or fail cleanly — never leaving inconsistent states

Why PawaPay?

Help improve financial access in Africa
Be part of an amazing team that shapes the company's culture as a great place to be
An ambitious, talented, and diverse team who always has your back
We grow fast, and you will grow fast with us
Competitive remuneration
35 days of paid leave per year (inclusive of public holidays) and more.

Posted: March 29th, 2026