PawaPay is a pan-African fintech that connects local and international merchants to the local payment channels used by the mass market. We operate in a highly regulated, partner-driven environment and are in scale-up mode, with ambitious growth plans across multiple African markets.
Our vision is to simplify business for companies and consumers in Africa, and we do that by ensuring that payments simply work, at scale, despite the fragmentation and complexity that exist on the Continent.
Through our payments API we already facilitate over 4 million transactions a day across 20 countries in Sub-Saharan Africa. We are the largest PSP on the Continent for the processing of mobile money payments, and have positioned ourselves well to lead the payment space as it grows over the next decade.
At PawaPay, an entrepreneurial spirit is coupled with a modern and professional working culture. The fast-paced, ever-changing environment will suit someone who can adapt and think on their feet. In return, you will have the opportunity to work alongside a group of dedicated and smart individuals working towards the same mission. We work as a remote team and have team members in Europe, the UK, Africa and Asia.
What is the role?
As a Site Reliability Engineer at PawaPay, you will own the reliability and scalability of a high-throughput payments platform operating across fragmented and often unreliable external systems.
You will design and operate systems that process millions of transactions daily, ensuring low latency, high availability, and strict consistency — even under failure conditions.
This is not a traditional DevOps role. You will be responsible for ensuring that payments either succeed or fail cleanly, never leaving the system in an inconsistent state, while maintaining performance and resilience at scale.
What makes this role challenging
You will be working on systems where:
- Failure must be atomic — no partial or inconsistent transaction states
- Latency directly impacts conversion and merchant trust
- External dependencies (telcos, banks, mobile money providers) are unreliable and vary by market
- Traffic can spike unpredictably and must be absorbed without degradation
- Observability and incident response are critical to maintaining SLAs
Responsibilities
- Own the reliability, availability, and performance of the production payments platform
- Participate in on-call rotations to ensure system availability
- Define, implement, and continuously improve SLOs, SLAs, and alerting
- Design systems for failure (graceful degradation, retries, idempotency, backoff strategies)
- Lead incident response end-to-end, including postmortems and preventative improvements
- Improve system observability across metrics, logging, and distributed tracing
- Build and maintain scalable infrastructure using infrastructure as code
- Automate operational workflows to reduce manual intervention and increase system resilience
- Collaborate closely with engineering and product teams to ensure reliability is built into system design
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles
- Strong experience operating production systems with real uptime and reliability requirements
- Experience with distributed systems and understanding of failure modes at scale
- Deep knowledge of AWS (e.g. EKS, networking, IAM, scaling patterns, observability)
- Strong production-grade experience with Kubernetes and Helm
- Experience with Terraform or similar infrastructure-as-code tools
- Proficiency in at least one programming language (e.g. Go, Python, Bash)
- Experience with monitoring and observability tooling (metrics, logs)
- Strong problem-solving skills and a proactive, ownership-driven mindset
- Excellent written and verbal communication skills in English
Nice to have
- Experience in payments, fintech, or other high-availability transaction systems
What success looks like
- Systems remain stable under sudden traffic spikes and partial infrastructure failures
- Incidents are resolved quickly with clear root cause and follow-up improvements
- Strong observability provides clear insight into system behaviour
- SLOs are well-defined, measurable, and consistently met
- Payments either succeed or fail cleanly — never leaving inconsistent states
Why PawaPay?
- Help improve financial access in Africa
- Be part of an amazing team that shapes the company’s culture and makes it a great place to be
- An ambitious, talented, and diverse team that always has your back
- We grow fast, and you will grow fast with us
- Competitive remuneration
- 35 days of paid leave per year (inclusive of public holidays) and more.
