We’re seeking an IT Service Performance & Reliability Manager to take ownership of performance, capacity, and resilience across critical IT services. This role focuses on keeping customer-facing services fast, reliable, and fully observable, while driving continuous improvement.
You will lead observability across services, ensuring effective monitoring and actionable insights. You’ll manage capacity and performance through forecasting and trend analysis, identifying risks early and driving improvements. You’ll also ensure resilience and availability are built into services from the outset, while supporting continuity planning and risk management. Working closely with technical teams and stakeholders, you’ll help resolve issues and deliver ongoing service improvements.
Key Requirements
- Experience managing capacity and performance in IT environments
- Hands-on experience with AWS and Azure
- Strong knowledge of ITIL v3/v4 (certification required)
- Experience with monitoring/observability tools (e.g. Zabbix, Grafana, Kibana, OpenSearch)
- Knowledge of Windows and Linux server environments
- Scripting skills (e.g. Python, PowerShell, Node.js)
- Experience integrating data via APIs, webhooks, or messaging
- Strong analytical, problem-solving, and stakeholder management skills

Desirable:
- DevOps exposure
- Network infrastructure and communications protocols knowledge
- Experience with social alarm platforms

If you’re looking for a role where you can make a tangible impact on service performance and resilience, we encourage you to apply.
Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy…
