Site Reliability Engineer | Remote

Company: Crossing Hurdles
Location: Remote
Job Description:

  • Design, implement, and maintain scalable infrastructure using Linux and Kubernetes.
  • Monitor system performance using Prometheus and address potential issues proactively.
  • Automate operational processes to improve system reliability and efficiency.
  • Respond to incidents, perform root cause analysis, and implement improvements.
  • Collaborate with development teams to ensure smooth deployments and high availability.
  • Create and maintain documentation, runbooks, and operational guidelines.
  • Promote best practices in reliability, security, and system performance.

Requirements:

  • Strong experience with Linux system administration and troubleshooting.
  • Strong expertise in Kubernetes cluster management and orchestration.
  • Strong experience using Prometheus for monitoring and alerting.
  • Proficiency in scripting languages such as Bash or Python.
  • Strong problem-solving and incident management skills.
  • Excellent written and verbal communication skills.
  • Ability to work independently in a remote, fast-paced environment.

Posted: May 9th, 2026