Overview
Smartedge's client is seeking a Data Engineer (Kafka and Hadoop expert, Python) for a contract role in Sheffield, UK (hybrid).
Job Summary:
- Design and build Kafka-based streaming applications (Kafka Streams/ksqlDB) in Scala/Python for transformation, enrichment, and routing.
- Implement end-to-end streaming pipelines (producers, stream processors, and consumers) with strong data quality, idempotency, and DLQ patterns.
- Model topics, schemas, and contracts (Avro/Protobuf/JSON) and maintain backward and forward compatibility.
- Develop batch/stream interoperability with Spark/Structured Streaming for aggregation, feature generation, and storage in Parquet/ORC.
- Integrate processed data into analytics/observability platforms (e.g., Splunk) for dashboards, alerting, and proactive insights.
- Build automated validation, replay, and backfill mechanisms to ensure reliability and SLA adherence.
- Apply observability to the pipelines themselves (metrics, traces, structured logs) and tune performance and cost.
- Collaborate with platform/infra teams handling Kafka administration (brokers, security, ops) while owning application-side streaming logic.
- Ensure security and compliance for application data paths (authn/z, encryption in transit/at rest, secret management).
- Document data flows, schemas, and runbooks for streaming services.
Responsibilities
- Lead Kafka application development using Kafka Streams/ksqlDB, selecting appropriate producer/consumer patterns, partitioning/serialization strategies, and exactly-once or at-least-once delivery semantics.
- Design, develop, and maintain batch/stream interoperability solutions with Spark, Structured Streaming, Parquet, and ORC for feature generation and data storage.
- Implement observability across pipelines, generating metrics, traces, and structured logs for dashboards, alerting, and proactive insights.
- Develop automated validation, replay, and backfill mechanisms to guarantee reliability and SLA compliance.
- Maintain data quality and reliability by enforcing idempotent processing, DLQs, replay/backfill strategies, lineage, and SLA‑aware designs.
- Ensure compliance with security standards, including AuthN/Z, TLS/SASL, encryption in transit and at rest, and secret management.
- Collaborate closely with Kafka platform and infrastructure teams, while retaining ownership of application‑side streaming logic.
- Document data flows, schemas, and runbooks for all streaming services.
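To make the idempotency and DLQ responsibilities above concrete, here is a minimal pure-Python sketch of the processing logic only; it uses no real Kafka client, and every name in it (`process_stream`, `seen_keys`, the in-memory DLQ list) is invented for illustration:

```python
# Illustrative sketch: idempotent record processing with a dead-letter
# queue (DLQ). In production the idempotency store would be durable
# (e.g., a state store or transactional sink) and the DLQ would be a
# dedicated Kafka topic consumed by a replay/backfill job.

def process_stream(records, handler):
    """Apply handler to each (key, value) record once per key; records
    that raise are routed to a DLQ instead of halting the pipeline."""
    seen_keys = set()        # stands in for a durable idempotency store
    output, dlq = [], []
    for key, value in records:
        if key in seen_keys:          # duplicate delivery: skip it
            continue
        try:
            output.append((key, handler(value)))
            seen_keys.add(key)        # mark only after success
        except Exception as exc:
            dlq.append((key, value, str(exc)))  # route to DLQ for replay
    return output, dlq
```

The key design point the sketch shows is that a record's key is marked as seen only after its handler succeeds, so a failed record stays replayable from the DLQ without blocking the rest of the partition.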
Qualifications
- Proficiency in Scala and/or Python for streaming application development.
- Experience with testing frameworks and CI/CD practices for stream processors.
- Strong knowledge of schema management (Avro, Protobuf, JSON), schema registry usage, and compatibility strategies.
- Expertise in stream and batch processing with Spark (including Structured Streaming), Parquet/ORC, partitioning/bucketing, and performance tuning.
- Solid background in data quality and reliability, including idempotency, DLQ patterns, replay/backfill, lineage tracking, and SLA‑aware designs.
- Hands‑on experience with observability tools—metrics, tracing, and structured logging—for stream applications.
- Familiarity with security and compliance requirements for streaming platforms (AuthN/Z, TLS/SASL, encryption, secret management).
- Strong communication and documentation skills to work effectively with platform and admin teams.
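The backward-compatibility strategy mentioned above can be illustrated with a toy check. This is a deliberately simplified sketch: real registries such as Confluent Schema Registry enforce much richer rules, and the `{field_name: has_default}` schema shape and `is_backward_compatible` helper here are assumptions made for illustration:

```python
# Toy backward-compatibility check. A schema is modeled as a dict of
# {field_name: has_default}. Backward compatible means a reader using
# the NEW schema can still read data written with the OLD schema, so
# any field added in the new schema must carry a default value.

def is_backward_compatible(old_schema, new_schema):
    added = set(new_schema) - set(old_schema)
    return all(new_schema[field] for field in added)

old = {"id": False, "amount": False}
new_ok = {"id": False, "amount": False, "currency": True}    # default supplied
new_bad = {"id": False, "amount": False, "currency": False}  # no default
```

A forward-compatibility check would mirror this from the writer's side: old readers must be able to ignore or default the fields the new writer adds.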
