Cloud Platform Engineer

Company: ebp Global

Location: UK (Remote)

Posted: May 14th, 2026

Cloud Platform Engineer (m/f)

UK | Full-Time | Remote


Company Description

ebp Global is a high-performing boutique consultancy firm best known for delivering tailored, impactful solutions to our clients’ most complex problems, from conceptualisation to implementation. Our expertise covers a wide range of value chain activities, from strategy, organisational design and operating models, through operations and business process optimisation, to information flows and analytics. Through our hands-on approach and deep knowledge, we are proud to count some of the world’s best-known companies, across a wide variety of industries, as long-term client partners.


We are uniquely global: we not only operate on a global scale but also work in a global way, with one another and with our clients. Our team is made up of experts with operational, industry-related experience, which gives us a true understanding of our clients’ problems and a passion to solve them and improve outcomes.


See https://ebp-global.com/ for further details about our company.


Job Overview

We are seeking a highly skilled and experienced Cloud Platform Engineer with expertise in Azure and AWS to join our dynamic IT team.

The ideal candidate will be responsible for designing, implementing, and managing our cloud architecture and infrastructure, ensuring the highest levels of availability, performance, and security. Overall, you’ll strive for efficiency by aligning cloud systems with business goals.


You will work closely with colleagues to gather requirements and translate them into solutions. You will also contribute to the delivery of robust, supportable and sustainable infrastructure solutions, in accordance with agreed organisational standards, so that services are resilient, scalable and future-proof.


You are a self-starter with an inquisitive nature who looks beyond the obvious to understand why things are the way they are. Critical and conceptual thinking and problem-solving skills are essential, alongside a passion for networking.


Job Responsibilities


- Design and implement scalable and secure network architectures in both Azure and AWS environments.

- Develop comprehensive architectural blueprints and documentation for cloud infrastructure.

- Plan and execute cloud migration strategies, including hybrid cloud solutions.

- Design infrastructure for AI/ML workloads including GPU/TPU compute clusters, high-throughput storage, and low-latency networking between nodes

- Architect MLOps pipelines integrating model training, versioning, and deployment workflows on cloud platforms (e.g., Azure ML, AWS SageMaker)


- Deploy and manage virtual networks, subnets, route tables, and network gateways.

- Implement and manage VPN connections, Direct Connect (AWS), and ExpressRoute (Azure).

- Configure and manage load balancers, firewalls, and security groups.

- Oversee DNS setup and management within cloud environments.

- Deploy and manage AI-specific services such as AWS SageMaker, Azure Machine Learning, and GPU-enabled VM fleets

- Set up and manage vector databases (e.g., Pinecone, Weaviate, pgvector on RDS) and object storage optimized for large model artifacts

- Configure container orchestration (Kubernetes/EKS/AKS) for scalable model serving and inference endpoints

- Deploy and manage API hosting environments including containerized REST APIs using Docker and Kubernetes (EKS/AKS)

- Configure and manage API Gateways (AWS API Gateway, Azure API Management) for routing, throttling, and versioning


- Implement and maintain robust security protocols to safeguard cloud infrastructure.

- Conduct regular security audits and compliance checks.

- Ensure cloud infrastructure adheres to industry standards and regulatory requirements.

- Implement data governance and access controls for sensitive training datasets and model artifacts

- Ensure compliance with AI-specific regulations and responsible AI frameworks (e.g., EU AI Act considerations)


- Monitor network performance and implement tuning measures to optimize throughput and latency.

- Troubleshoot and resolve network-related issues promptly.

- Conduct capacity planning and scaling to accommodate growing workloads.

- Optimize inference latency and throughput for deployed models using techniques like auto-scaling endpoints, spot instances, and caching layers

- Monitor GPU utilization, model drift, and endpoint health using tools like CloudWatch, Azure Monitor, or Prometheus


- Develop and maintain Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or ARM templates.

- Automate deployment, configuration, and management tasks using scripting languages such as Python, PowerShell, or Bash.

- Build and maintain CI/CD pipelines for model deployment using tools like MLflow, Kubeflow, or Azure DevOps

- Automate model retraining triggers, A/B deployment rollouts, and blue/green model switches

- Deploy Python-based REST APIs using frameworks such as FastAPI or Flask

- Build CI/CD pipelines for automated testing, containerization, and deployment of Python APIs to cloud environments


- Support LLM and generative AI deployments, including API gateway configuration for services such as Azure OpenAI or Amazon Bedrock

- Manage prompt caching layers, rate limiting, and cost monitoring for AI API consumption

- Collaborate with data science and AI teams to translate model requirements into scalable cloud infrastructure


- Work closely with development, operations, and security teams to ensure seamless integration and operation of cloud services.

- Provide technical guidance and support to junior network engineers and other team members.

- Participate in on-call rotation for after-hours support as needed.


- Design, deploy, and manage RESTful APIs built in Python (FastAPI, Flask, or Django REST Framework)

- Manage the full API lifecycle: versioning, documentation (Swagger/OpenAPI), deprecation, and rollout strategies

- Implement API security best practices including OAuth2, API key management, rate limiting, and JWT authentication

- Monitor API performance, uptime, and error rates using tools like CloudWatch, Azure Monitor, or Datadog

- Manage API monetization or access tiers where applicable, using gateway-level policies


Key Skills for a Cloud Platform Engineer

Education:

Certifications (Preferred):

Technical Skills:

Soft Skills:


Why ebp Global? 


Please apply by sending your CV (in English) to info@ebp-global.com 


Applicants must reside and have the right to work in the UK.

Only short-listed candidates will be contacted. 


Personal data collected will be used for recruitment purposes only.
