Job Vision
In an era where computing power is evolving toward full intelligence, AI training and inference are driving a comprehensive reconstruction of the foundational software stack. With the launch of Huawei CloudMatrix super‑node clusters, the architectures for AI model training, inference, and agent systems are undergoing unprecedented transformation. Enabled by multi‑chip heterogeneous acceleration, ultra‑high‑speed interconnects, high‑bandwidth memory, and multi‑tier compute pooling, the industry has reached a critical turning point: AI can no longer rely on traditional cloud resource scheduling and service frameworks. It requires a rebuilt, model‑ and intelligence‑centric infrastructure: AI Infra.
Against this backdrop, we have begun developing a new AI Infra & Agentic Serving architecture. Our goal is to build a unified AI foundation that allows LLM inference, multimodal processing, and agent workflows to run on CloudMatrix super‑nodes with extreme performance, scalability, and stability. This will become a core enabling capability for the future of Huawei Cloud, on‑device intelligence, and industry‑scale model platforms.
In this role, you will help define and build:
AI‑native Runtime and Serving framework: redesign inference execution paths, orchestration logic, model lifecycle, and data flows to fully exploit super‑node hardware features such as memory pooling and fully interconnected high‑speed fabrics;
Agentic AI core infrastructure: provide base‑level state management, a KV‑Cache system, and efficient retrieval capabilities for agent memory, tool calling, workflow execution, and multi‑agent collaboration;
Next‑generation serverless AI: implement Function‑as‑a‑Service (FaaS) inference, near‑instantaneous scaling, multi‑model hybrid loading, cold‑start optimization, and a peak‑throughput mode on the unified Runtime;
Performance and cost engineering at ultra scale: deliver industry‑leading performance, latency, and cost efficiency in scenarios such as billion‑level QPS, multi‑model sharing/co‑hosting, and multi‑tenant isolation.
Here, you will face the most critical challenges for AI Infra in the next decade:
How do we design a truly AI‑native execution framework on super‑node hardware?
How can we make agent memory more efficient and support continuously improving reasoning?
How do we achieve optimal trade‑offs under hardware constraints, power budgets, bandwidth bottlenecks, and scheduling limitations?
How do we build the world’s leading AI serving platform?
This position offers end‑to‑end design involvement across the full stack: “Hardware Capabilities → AI Runtime → Agent Systems”. Your work will influence Huawei’s future LLM platform, the capability boundary between industry‑level knowledge workers and agents, and the design of the next‑generation AI Runtime.
Join us, and you will not just build a system; you will help shape the foundational standards of the intelligent era. This is one of the few positions where technical insight, systems architecture, and engineering execution converge at the highest level, offering a uniquely creative and impactful stage in your career.
Key Responsibilities:
Architecture Planning: design a unified AI Infra & Serving architecture platform for composite AI workloads such as LLM training and inference, RLHF, agents, and multimodal processing. This platform will integrate inference, orchestration, and state management, and will define the company’s technical evolution path for Serverless AI + Agentic Serving.
Agent Serving Framework: design a heterogeneous execution framework across CPU/GPU/NPU for agent memory, tool invocation, and long‑running multi‑turn conversations and tasks. Build an efficient subsystem for memory, KV‑Cache, vector storage, logging, and state management to support agent retrieval, planning, and persistent memory.
Serverless AI Foundation Design: build a high‑performance Runtime/Framework that defines the next‑generation Serverless AI foundation through elastic scaling, cold‑start optimization, batching, function‑based inference, request orchestration, dynamic decoupled deployment, and related features, supporting demanding scenarios such as multi‑model serving, multi‑tenancy, and high concurrency.
Performance and Cost Optimization: leverage Huawei’s self‑developed hardware stack and End‑Edge‑Cloud co‑optimization to deliver AI infrastructure with industry‑leading performance and throughput, ultra‑low latency, and best‑in‑class observability.
Frontier Technology Insights: continuously track cutting‑edge developments in Serverless AI, LLM Serving, and Agentic AI, generate structured insights, and feed them back into architecture evolution and product roadmaps.
Cross‑Team Collaboration: serve as a team leader, working closely with accelerator, operating system, cloud platform, and AI application teams to drive deployment of the architectural solutions in real‑world business scenarios.
This job description is only an outline of the tasks, responsibilities, and outcomes required of the role. The jobholder will carry out any other duties that may reasonably be required by their line manager. The job description and person specification may be reviewed on an ongoing basis in accordance with the changing needs of Huawei Research and Development UK Limited.
Person Specification:
Required:
Strong foundational knowledge in system architecture: proficient in computer architecture, operating systems, and runtime environments; familiar with large‑scale distributed service architectures and the fundamental principles of storage and networking;
Serverless & Cloud‑Native experience: hands‑on experience with FaaS/Serverless architectures; familiar with cloud‑native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling;
Expertise in AI Serving: deep familiarity with the core mechanisms of mainstream LLM Serving technologies (e.g., vLLM, SGLang, Ray Serve); understanding of common optimization concepts such as continuous batching, KV‑Cache reuse, parallelism, and compression/quantization/distillation;
Experience in Agentic AI domains: solid understanding of the basic architecture and typical components of Agentic AI/AI Agents (Memory, Tool/Function Calling, Planner, Executor, Multi‑Agent Collaboration, etc.); clear understanding of memory organization, retrieval, and state management;
Performance analysis and optimization skills: proficient in using profiling/tracing tools; experienced in analyzing and optimizing system‑level bottlenecks regarding GPU utilization, memory/bandwidth, interconnect fabric, and network/storage paths;
Programming and engineering capabilities: proficient in at least one system‑level language (e.g., C/C++, Go, Rust) and one scripting language (e.g., Python); able to maintain high code quality standards and follow engineering best practices;
Communication and architectural articulation skills: able to clearly articulate complex architectural solutions and collaborate across multiple teams and regions to shape future technology choices and roadmap evolution.
Desired:
Experience in production deployment of LLM Serving/Agent/Serverless platforms (e.g., supporting Cloud FaaS platforms, search/recommendation systems, or Agent‑based products), with practical experience in inference acceleration and system co‑optimization for GPUs/heterogeneous hardware.
Significant technical achievements or peer‑reviewed publications in fields related to distributed systems and AI infrastructure; active contributions to, or a maintainer role in, open‑source communities (e.g., LangChain, LangGraph, Kubernetes, Ray, vLLM, SGLang, TensorRT‑LLM).
Experience in leading teams to deliver architectural implementations and drive cross‑functional technical projects.
What we offer
33 days' annual leave (including UK public holidays)
Group Personal Pension
Life insurance
Private medical insurance
Medical expense claim scheme
Employee Assistance Program
Cycle to work scheme
Company sports club and social events
Additional time off for learning and development
