Role Overview
We are seeking experienced Kafka Administrators / DevOps Engineers to join the Streaming CoE team responsible for delivering, operating, and optimizing enterprise-scale Kafka-based streaming platforms. The successful candidates will play a key role in building and supporting highly available, secure, and scalable Streaming-as-a-Service solutions that enable real-time data ingestion and event-driven architectures across critical business workloads.
This role requires strong expertise in Apache Kafka, Kubernetes, DevOps automation, cloud platforms, and observability tooling, along with hands-on experience managing large-scale streaming environments.
Key Responsibilities
Kafka Platform Administration
- Design, deploy, configure, and manage enterprise Kafka clusters.
- Administer Kafka components including:
  - Kafka Brokers
  - Kafka Connect
  - Schema Registry
  - Kafka Streams
  - ZooKeeper / KRaft
- Manage topics, partitions, replication, retention policies, and consumer groups.
- Perform Kafka cluster upgrades, migrations, scaling, and failover activities.
- Monitor cluster health and optimize performance, throughput, and reliability.
- Troubleshoot Kafka-related issues including:
  - Consumer lag
  - ISR issues
  - Leader election problems
  - Replication delays
  - Connectivity and performance bottlenecks
- Implement and maintain Kafka security controls, including SSL/TLS encryption, authentication, and ACL-based authorization.
DevOps & Automation
- Build and maintain CI/CD pipelines using:
  - Jenkins
  - GitHub Actions
  - Azure DevOps
- Automate platform deployments and configuration management.
- Develop Infrastructure-as-Code solutions using Terraform and Ansible.
- Support GitOps practices and automated deployment workflows.
Kubernetes & Platform Engineering
- Deploy and manage Kafka workloads on Kubernetes platforms.
- Create and maintain Helm charts.
- Support ArgoCD-based GitOps deployment models.
- Manage ConfigMaps, Secrets, certificates, and application configurations.
- Implement rolling upgrades, scaling strategies, and lifecycle management.
- Ensure platform stability, resilience, and operational excellence.
Cloud Infrastructure
- Support Kafka deployments on one or more cloud platforms:
  - AWS
  - Microsoft Azure
  - Google Cloud Platform (GCP)
- Work closely with cloud engineering teams to optimize infrastructure and platform performance.
- Support cloud-native deployment and operational practices.
Monitoring & Reliability Engineering
- Implement and maintain observability solutions using:
  - Prometheus
  - Grafana
  - Dynatrace
  - Confluent Control Center
- Create monitoring dashboards, alerts, and operational reports.
- Drive SRE best practices and platform reliability improvements.
- Participate in incident management, root cause analysis, and continuous improvement initiatives.
Required Skills & Experience
Kafka Administration (Must Have)
- Strong hands-on experience with Apache Kafka.
- Deep understanding of:
  - Brokers
  - Topics
  - Partitions
  - Replication
  - Consumer Groups
  - Kafka Connect
  - Schema Registry
  - Kafka Streams
  - ZooKeeper / KRaft
- Experience with Kafka performance tuning and capacity planning.
- Experience managing enterprise-scale Kafka clusters.
- Strong troubleshooting and operational support experience.
- Knowledge of Kafka security best practices.
…
