We are hiring an experienced Data Engineer to join our team in Sheffield, United Kingdom. The ideal candidate will have strong experience in Scala and/or Python for streaming applications, as well as familiarity with testing frameworks and CI for stream processors.
Your responsibilities:
- Design and build Kafka-based streaming applications (Kafka Streams/ksqlDB) in Scala/Python for transformation, enrichment, and routing.
- Implement end-to-end streaming pipelines (producers, stream processors, and consumers) with strong data-quality guarantees, idempotent processing, and dead-letter queue (DLQ) patterns.
- Model topics, schemas, and contracts (Avro/Protobuf/JSON) and maintain backward/forward compatibility.
- Develop batch/stream interoperability: build Spark and Spark Structured Streaming jobs for aggregation, feature generation, and storage in Parquet/ORC.
- Integrate processed data into analytics/observability platforms (e.g., Splunk) for dashboards, alerting, and proactive insights.
- Build automated validation, replay, and backfill mechanisms to ensure reliability and SLA adherence.
- Apply observability to the pipelines themselves (metrics, traces, structured logs) and tune performance/cost.
- Collaborate with platform/infra teams who handle Kafka admin (brokers, security, ops) while owning application-side streaming logic.
- Ensure security and compliance for application data paths (authn/z, encryption in transit/at rest, secret management).
- Document data flows, schemas, and runbooks for streaming services.
Your profile:
Essential skills/knowledge/experience:
- Kafka application development: Kafka Streams/ksqlDB, producer/consumer patterns, partitioning/serialization, exactly-once/at-least-once semantics.
- Languages: Strong in Scala and/or Python for streaming apps; familiarity with testing frameworks and CI for stream processors.
- Schema management: Avro/Protobuf/JSON, schema registry usage, compatibility strategies.
- Stream/batch processing: Spark (including Structured Streaming), Parquet/ORC, partitioning/bucketing, performance tuning.
- Data quality and reliability: Idempotent processing, DLQs, replay/backfill, lineage, and SLA-aware designs.
- Observability: Metrics/tracing/logging for stream apps; integration with downstream dashboards/alerts.
- Security/compliance: AuthN/Z in clients, TLS/SASL usage, secret management in code/services.
- Collaboration: Work closely with Kafka platform/admin teams while focusing on application-layer streaming logic; strong communication and documentation.
