Requirements
- This role demands a proactive and hands‑on leader with deep technical expertise and strong critical thinking
- Educated to degree level, or equivalent work experience
- Number of years in Production Support / SRE roles with at least 3 years in a leadership capacity
- Deep technical expertise in Oracle database – troubleshooting, scalability, performance tuning and optimization
- Demonstrated experience implementing SRE frameworks – including SLOs, SLIs, incident management, and chaos engineering
- Experience leading teams supporting systems deployed across mixed infrastructure (Cloud and On‑Premise, AWS preferred)
- Solid understanding of change management, risk posture, and production readiness
- Strong track record of delivering automation at scale, reducing toil, and eliminating manual operational tasks
- Excellent communication and stakeholder management skills, particularly under pressure
- Expertise in automation (Python, Shell, PowerShell etc.)
- Familiarity with observability tools and practices (metrics, logging, tracing)
- Ability to lead capacity planning and scalability strategies to support growth
- Knowledge of clearing and settlement processes in financial markets
- Familiarity with regulatory requirements and governance frameworks in financial services
- Demonstrated ability to build, mentor, and retain high‑performing SRE teams
- Good communication and stakeholder management skills under pressure
- Demonstrable experience managing SRE or Production Support teams in a critically important financial services environment
- Experience managing teams located across multiple locations and time zones
- Excellent analytical skills, attention to detail, and problem‑solving abilities
- Solid technical background in the core technologies with several years of experience
- Ability to communicate clearly and concisely to IT and business teams and to senior management
- Ability to break down complex technical issues into an easy‑to‑digest format
- Familiarity with financial products and terminology
What the job involves
- We are looking for a Manager – Site Reliability Engineering to strengthen the Production Management leadership team of Clearing Technology Service
- You will be responsible for ensuring stability, resilience, and performance of our production systems while driving continuous improvement and SRE best practices across the platform
- Assume end‑to‑end accountability for the Clearing production environment, ensuring high availability, optimal performance, and robust resilience of business‑critical systems
- Act as Incident Commander during major incidents, leading resolution efforts, managing stakeholder communications, and driving root cause analysis and remediation
- Build and mentor a high‑performing SRE team. Promote a culture of accountability, continuous improvement, and blameless postmortems to enhance operational excellence
- Ensure adherence to response and resolution SLAs. Oversee efficient ticket management and escalation processes through ServiceNow, removing blockers promptly
- Develop strong partnerships across LCH and LSEG teams. Ensure timely delivery of business‑critical activities and transparent communication of risks and challenges
- Monitor and analyse technical processes to identify improvement opportunities. Implement enhancements to minimise business disruption and improve operational efficiency
- Ensure compliance with regulatory standards and internal governance. Proactively identify and mitigate operational risks
- Establish and maintain robust observability practices, employing metrics, logging, and tracing to drive data‑driven decisions and improve system health
- Out of hours support / On‑call support
- Be available for overnight support of production services to ensure successful completion of processing
- Respond to overnight calls and deal with issues
- Participate in Disaster Recovery exercises
