Role Description
- Responsible for building production-ready backend services for agentic workflow components aligned to solution architecture and platform standards.
- Implement solution designs by building Python services, worker processes, and reusable libraries following defined architecture, patterns, and standards.
- Develop agentic workflow components: tool connectors, orchestration steps, state management modules, retrieval components, and approval/escalation flows.
- Build reliable LLM interaction layers: tool/function calling, schema-validated structured outputs, guardrails, safe tool execution boundaries, and fallback behaviors.
- Implement robust backend patterns: async execution, job queues, retries/idempotency, compensating actions, and failure isolation for long-running workflows.
- Deliver production readiness: logging/tracing, metrics, decision logs, run replay support, performance profiling, and cost/latency controls.
- Write clean, maintainable, testable code with strong review discipline: unit/integration tests and regression testing for prompts/agents where applicable.
- Collaborate closely with the Agentic AI Architect and technical leads; support delivery across DEV/UAT/PROD including defect triage and operational support.
Location
- The role supports one of our top-tier banking clients in London (Canary Wharf) and requires a minimum of three days on-site presence.
- This is a permanent position based in the UK. We will only consider applicants who are eligible to work in the UK. We do NOT offer visa sponsorship for this role.
Experience Requirements & Qualifications
Core Experience
- 4+ years in Python backend development, including building production APIs/services and/or worker-based processing systems.
- Demonstrable experience in implementing Generative AI, AI/LLM-enabled features or systems (agentic workflows, RAG, tool calling, evaluation/monitoring) is preferred.
- Strong capability in backend fundamentals: service boundaries, API contracts, async execution, retries/idempotency, error handling, and performance optimization.
- Advanced Python engineering skills: clean architecture, modularity, testability, packaging, secure coding, and maintainability at team scale.
- Strong experience building API-first services (FastAPI or equivalent) and RESTful APIs, including auth patterns (OAuth2/JWT/API keys), versioning, and backward compatibility.
- Experience integrating and managing relational and vector databases.
- Strong schema/data contract practice using typed models and validation (e.g., Pydantic-style patterns), including strict structured outputs and schema evolution.
- Experience working with version control tools like GitHub (branching, PR reviews, release tagging, CI-friendly workflows).
- Strong experience with context grounding methods and context engineering when working with LLMs (RAG, evidence capture, context selection, prompt/context structuring).
- Experience using automation tools and integrating with external applications (API-based integrations, workflow triggers/actions, third-party systems).
- Experience building integration-heavy systems: consuming/producing APIs, handling enterprise data formats, and creating maintainable connectors.
- Working knowledge of distributed execution patterns: background jobs, scheduling, worker pools, and stateful workflows.
- Ability to work with ambiguity, break down requirements, and deliver reliably with strong ownership and communication.
Nice to Have
- Experience with agent orchestration frameworks (e.g., LangGraph-like patterns) and LLM observability/evaluation tools (Langfuse-like capabilities).
- Experience integrating enterprise-hosted LLMs (including Vertex AI / managed equivalents) and working with provider-agnostic abstraction layers (routing, fallback, cost-aware selection).
- Experience with job queues, distributed tracing, dashboards/alerts, and runbook-driven operational practices.
- Experience supporting regulated enterprise delivery: audit-friendly logging, change controls, secure configuration, and controlled deployments.
- Platform/DevOps awareness (preference): Docker basics; Kubernetes/OpenShift fundamentals; logging/monitoring patterns; secrets management and environment separation (DEV/UAT/PROD).
Main Tasks and Responsibilities
1) Build Python Services and Agentic Components
- Develop production-grade backend services and worker processes aligned to the defined solution architecture.
- Implement orchestration components: job queues, scheduling, state management/state machines, retries, idempotency, and compensating actions.
- Build and maintain tool connectors/integrations with enterprise systems (APIs, databases, files), following safe execution boundaries and permission controls.
- Contribute reusable libraries and shared components to accelerate delivery across multiple client solutions.
2) Implement Reliable LLM and RAG Capabilities
- Integrate LLM capabilities into services using tool/function calling, structured outputs, and strict schema validation.
- Develop and maintain RAG pipelines: ingestion, indexing, retrieval, grounding, and evidence capture/citations where required.
- Apply context engineering practices: selecting/structuring context, minimizing irrelevant context, maintaining traceability, and improving response determinism.
- Implement guardrails and safety controls: input validation/sanitization, output validation, refusal/fallback handling, and policy-aligned tool usage.
3) Testing, Quality, and Release Discipline
- Build and maintain test suites: unit, integration, and regression testing (including prompt/agent regression where applicable).
- Participate in code reviews and follow engineering standards for maintainability, security, and correctness.
- Use GitHub-based workflows effectively: PR hygiene, branching strategies, code owner reviews, and CI/CD integration.
- Support release processes with strong documentation, configuration discipline, and readiness checks.
4) Observability, Performance, and Operational Readiness
- Implement logging, tracing, metrics, and decision logs for services and agent runs; support run replay and incident investigation.
- Profile performance bottlenecks and optimize latency, throughput, and cost across critical paths.
- Contribute to dashboards, alerts, runbooks, and operational procedures to maintain stable production systems.
5) Security, Compliance, and Enterprise Delivery
- Implement secure coding practices, secrets handling, and least-privilege patterns in tool execution and integrations.
- Follow enterprise governance expectations: audit-friendly logs, change controls, environment separation, and controlled deployments.
- Collaborate closely with the Agentic AI Architect, infra teams, and COE to deliver compliant, production-ready solutions.
