About the Role
We’re seeking a Senior DataOps Engineer II who can act as the hands‑on owner of Monolith’s data observability and operational surface: from the batch and streaming pipelines running on our platform to the lineage, quality checks, and runbooks that keep customer environments healthy.
In This Role, You Will
- Own Monolith’s Data Observability & Operations Surface
- Design and implement the end‑to‑end observability stack for data workloads (metrics, logs, traces, and data‑quality signals) across batch and streaming pipelines.
- Define and maintain operational SLOs/SLAs for critical data flows powering training, inference, and analytics, and ensure they are measurable and actionable.
- Build dashboards, alerts, and runbooks that allow engineers and on‑call responders to quickly detect, triage, and remediate data incidents.
- Standardise “golden paths” for how teams instrument pipelines, expose health signals, and respond to data‑related failures.
- Implement Data Lineage, Quality & Governance
- Deploy and maintain end‑to‑end data lineage for key domains, from client sources through transformations to features, models, and downstream analytics, so teams can debug, audit, and reason about change.
- Define and roll out data quality checks (schema, freshness, completeness, distribution, drift) and ensure failures integrate cleanly into alerting and incident workflows.
- Partner with Security, Compliance, and customer‑facing teams to encode data governance requirements (e.g., retention, residency, access controls) into our pipelines and tooling.
- Help shape metadata models and catalog conventions so that producers and consumers can reliably discover, understand, and use shared datasets.
- Enable DataOps Practices Across Teams
- Establish CI/CD patterns for data pipelines and related infrastructure, including testing strategies, promotion workflows, and change‑management guardrails.
- Drive adoption of infra‑as‑code for data infrastructure (e.g., pipeline orchestration, storage, observability components), reducing manual drift across environments.
- Define and continuously improve DataOps processes — incident response, post‑incident review, change review, on‑call rotations — with a focus on learning rather than blame.
- Evaluate and integrate best‑of‑breed DataOps and observability tooling where it accelerates our teams, balancing build vs. buy pragmatically.
- Partner Across Monolith, CoreWeave & Clients
- Work with Monolith platform, data, agent, and reliability teams to expose observability and lineage as shared services and patterns other engineers can build on.
- Collaborate with CoreWeave infrastructure and AI platform teams to leverage underlying storage, compute, networking, and observability in service of robust data flows.
- Serve as a technical escalation point for forward‑deployed and customer‑facing engineers when data issues cross service boundaries or require deeper architectural insight.
- Mentor data producers (product teams, integrations, forward‑deployed engineers) and data consumers (data scientists, analysts, client engineers) on resilient schemas, contracts, and operational practices.
Who You Are
- Experience & Level
- Typically 5–6+ years of experience in DataOps, Data Engineering, DevOps/SRE for data platforms, or similar roles, including end‑to‑end ownership of production data pipelines and their operations.
- Proven track record of operating at Senior IC scope: leading cross‑team initiatives, introducing new practices/tooling, and improving reliability at the platform level.
- DataOps, Pipelines & Tooling
- Strong hands‑on experience designing, deploying, and operating data pipelines in production (batch and/or streaming), including failure modes, retries, and backfills.
- Practical experience with data orchestration and ETL/ELT tooling (e.g., Airflow, Dagster, dbt, Temporal, or similar) and comfort evaluating and integrating new tools where appropriate.
- Solid SQL and/or Spark skills and experience with at least one major analytical database or warehouse; familiarity with time‑series / telemetry data is a plus.
- Observability, Lineage & Data Quality
- Extensive experience implementing data observability — metrics, logging, tracing, dashboards, and alerting — for data‑centric workloads.
- Hands‑on work with data quality frameworks and/or observability platforms to monitor freshness, completeness, schema changes, and anomalies.
- Experience deploying and using data lineage or metadata/catalog solutions, and applying them to debugging, compliance, and change‑impact analysis.
- Platform, Infrastructure & Automation
- Comfortable working in containerised, cloud‑native environments (Kubernetes plus at least one major cloud provider); experience with GPU‑ or compute‑intensive workloads is a bonus.
- Strong automation mindset: infra‑as‑code, CI/CD, and configuration management for data infrastructure and observability components.
- Proficient in Python for building tooling, pipeline glue, and platform integrations; additional languages are a plus.
- Collaboration, Mentorship & Communication
- Clear communicator who can explain complex data flows and failure modes to both deeply technical and non‑specialist audiences.
- Experience mentoring engineers and data practitioners on better data management, observability, and operational hygiene — through documentation, examples, reviews, and office hours.
- Comfortable working in a fast‑moving, high‑ambiguity environment where we balance rapid iteration with the safety and reliability demanded by enterprise engineering clients.
Preferred
- Experience in ML/AI platforms or MLOps environments where data pipelines power experimentation, training, and inference at scale.
- Background with test, simulation, or time‑series data (e.g., physical test benches, battery labs, automotive/aerospace R&D).
- Familiarity with feature stores, experiment tracking, or model registries and their interaction with upstream data pipelines.
- Prior work in multi‑tenant SaaS platforms, especially those with strong compliance, observability, and uptime requirements.
- Experience supporting or partnering closely with forward‑deployed / professional services teams in complex customer environments.
Benefits
- Family-level Medical Insurance
- Family-level Dental Insurance
- Generous Pension Contribution
- Life Assurance at 4x Salary
- Critical Illness Cover
- Employee Assistance Programme
- Tuition Reimbursement
- Work culture focused on innovative disruption
Equal Employment Opportunity
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.