What you will do
- Monitor, troubleshoot, and support infrastructure and services deployed in a data center, ensuring smooth and uninterrupted operations.
- Provide Level 1 (L1) support for incidents involving Kubernetes infrastructure and DevOps workflows.
- Conduct first-level debugging and resolution of incidents involving infrastructure components, escalating to Level 2 (L2) or Level 3 (L3) when necessary.
- Collaborate closely with two junior engineers, providing mentorship, technical guidance, and code reviews to ensure proper handling of infrastructure issues.
- Maintain and monitor production environments, detecting and resolving common infrastructure issues.
- Work during EST hours, supporting infrastructure, applications, and system health across multi-tenant and data center environments.
- Participate in all phases of the Service Delivery Lifecycle (SDLC), from monitoring and issue detection through troubleshooting and continuous improvement.
- Drive improvement in operational efficiency by identifying bottlenecks and optimizing incident response processes.
Who you are
- 5+ years of experience in infrastructure monitoring and Level 1 support roles, with expertise in handling Kubernetes infrastructure and cloud deployments.
- Strong hands-on experience with Kubernetes, containers, and DevOps tooling (Docker, Helm, Terraform) in both cloud and on-premises environments.
- Proven ability to manage infrastructure in data center environments, with knowledge of incident management best practices.
- Proficient in basic debugging and troubleshooting of infrastructure issues, with the ability to identify root causes and take corrective actions.
- Experience with monitoring tools (e.g., Datadog, Prometheus, ELK) and familiarity with alerting mechanisms.
- Strong understanding of networking, server health, and application performance monitoring within large-scale distributed systems.
- Solid communication skills, with a demonstrated ability to collaborate with and lead a team of engineers in a fast-paced environment.
- Ability to work independently in a remote/hybrid team structure, with availability during EST working hours.