CoreWeave is a cloud platform that empowers AI innovation. Founded in 2017, it provides infrastructure, tools, and expertise to improve performance.
What You’ll Do
- Develop, optimize, and maintain network observability platforms. Use Python and Go to create collectors, exporters, and dashboards that provide deep visibility into network health and performance.
- Collaborate with Network Engineering and Platform teams to ingest and unify logs, metrics, and events from various platforms (Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, SR Linux, etc.) into a single observability pipeline.
- Design and implement scalable telemetry solutions using protocols like gNMI, SNMP, and streaming analytics. Ensure advanced alerting and anomaly detection with Prometheus, Grafana, and Alertmanager.
- Work closely with network developers, site reliability engineers, and security teams to integrate observability solutions across the broader infrastructure. Participate in design discussions, RFCs, and architectural decisions.
- Join a rotating on‑call schedule to troubleshoot and resolve observability‑related issues. Provide timely support to operations teams, quickly isolating and fixing problems when they arise.
- Guide junior team members, share best practices, and foster a culture of continuous learning and improvement within the observability domain.
Minimum Qualifications
- Deep familiarity with Prometheus, Grafana, Alertmanager, gNMI, and SNMP. Experience writing or extending custom metric collectors/exporters.
- Experience as a Network Engineer, SRE, Software Developer, or Systems Administrator in large‑scale environments. Track record of building and operating robust telemetry and monitoring solutions.
- Passion for automating tasks and processes.
- Comfortable containerizing solutions and deploying container‑based workloads efficiently on Kubernetes.
- Proficient in Python, Go, and Bash, and familiar with configuration management and templating tools (Ansible, Jinja2).
- Strong knowledge of Linux systems and IP networking concepts, including routing, switching, and network troubleshooting.
- Practical experience with platforms such as Arista EOS, NVIDIA Cumulus Linux, Nokia SR OS, and SR Linux.
- Collaborative, humble, and open to learning from senior colleagues.
Preferred Qualifications
- Bachelor’s degree in Computer Science or related field.
- Experience applying machine learning for anomaly detection (TensorFlow, scikit‑learn).
- Network certifications (CCNA, CCNP, etc.).
- Experience with data pipelines, event correlation, or anomaly detection in large‑scale environments.
- Familiarity with OpenTelemetry, Jaeger, or Zipkin for distributed tracing.
Benefits
- Family‑level medical insurance.
- Family‑level dental insurance.
- Generous pension contribution.
- Life assurance at 4× salary.
- Critical illness cover.
- Employee assistance programme.
- Tuition reimbursement.
- Work culture focused on innovative disruption.
All candidates must undergo a basic criminal record check in compliance with GDPR. Employment offers are conditional upon satisfactory results.
Equal Opportunity
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.
Export Control Compliance
This position requires access to export‑controlled information. Applicants must be U.S. persons or otherwise eligible to access such information without an export authorization. CoreWeave may decline to pursue export licensing for legitimate business reasons.
