Job Responsibilities
- Own the reliability, scalability, latency, and performance of mission‑critical CockroachDB infrastructure.
- Promote operational efficiency by developing automated systems and tools, significantly reducing operational toil.
- Act as a technical mentor and subject‑matter expert within the team, and as a consulting partner to customer teams.
- Serve as a primary technical consultant, partnering with customer teams for project‑based and long‑term CockroachDB support.
- Design, document, and implement the new L1/L2 support channel and operational workflow, including ticket triage, escalation paths, and SLA adherence.
- Act as an escalation point for complex, time‑sensitive production support and troubleshooting within distributed database environments.
- Ensure data security, integrity, and quality by adhering to internal standards and industry best practices (TLS/certificate management, RBAC, secrets management, audit logging).
- Own Backup, Restore, and Disaster Recovery operations, including scheduled full/incremental backups, point‑in‑time restore, restore drills, RPO/RTO definition, and failover testing.
- Manage the full CockroachDB cluster lifecycle, including node add/remove/decommissioning, rolling upgrades, patch validation, capacity planning, and downgrade/rollback planning.
- Develop readable, reusable code to streamline provisioning, management, and monitoring of CockroachDB clusters.
- Collaborate with development teams to define and implement relevant observability metrics, enhancing application reliability.
- Direct incident response, root cause analysis, and post‑mortem processes for critical database outages.
- Create and maintain clear documentation, runbooks, and operational guides to reduce business risk and operational costs.
- Define the technical strategy and roadmap for CockroachDB adoption and management across customer engagements.
- Proactively manage technical debt and identify scaling bottlenecks, ensuring infrastructure remains robust for future growth.
- Achieve billable and effective utilization targets of 85% (subject to modification).
- Perform other duties as assigned by your Manager.
Qualifications
- 5+ years of experience as a Database Administrator (DBA) or Site Reliability Engineer (SRE).
- Advanced experience with CockroachDB and intermediate‑level experience with a secondary relational database (e.g., PostgreSQL or MySQL).
- Proven experience migrating legacy databases (e.g., PostgreSQL, MySQL) to a distributed SQL system like CockroachDB.
- Advanced SQL and optimization knowledge: query tuning, schema/index design, identifying hotspots, managing transaction contention and retry errors, using EXPLAIN ANALYZE/table statistics for diagnostics.
- Tool stack proficiency: Terraform, AWS (high skill in networking for multi‑region deployment), Vault, GitLab, and deep production experience with Kubernetes/Helm/Operator.
- Deep scripting and automation skills (Python, Go, Bash).
- Expertise in analyzing CockroachDB‑specific observability metrics, including hands‑on experience with Prometheus/Grafana/Alertmanager, Datadog or CloudWatch, DB Console, and SLO/SLI reporting.
- Excellent customer service and consulting focus with proven ability to manage client expectations and priorities.
- Exceptional logical and systematic analysis for problem‑solving and root cause analysis.
- Proactive individual who identifies process gaps and drives improvements across systems and structures.
- Excellent verbal and written communication skills for delivering technical and strategic information to diverse audiences.
Benefits
- Competitive total rewards package (including paid vacation, sick days, and a day off to volunteer for your favorite charity).
- Flexible remote work: work from home with stable internet; no daily travel required.
- Professional development: substantial training allowance, professional development days, and opportunity to become certified.
- Updated equipment: laptop with your choice of OS and an annual budget to personalize your home workspace.
- Health and wellness budget for gym memberships, massages, fitness, and more.
Hiring Disclaimer
- The successful applicant will need to fulfill the requirements necessary to complete a background check.
- Accommodations are available upon request for candidates taking part in any aspect of the selection process.
