Site Reliability Engineer
About The Position
The Site Reliability Engineering (SRE) team at Aqua Security helps other Engineering teams to improve the reliability, security, resilience, and performance of the services they own.
SREs use their expertise in how Aqua’s infrastructure works to ensure other teams get maximum leverage from it.
SRE also owns several of the processes (and related tooling) that contribute to reliability, including how we run incidents (and how we learn from them), how we make infrastructure changes safely, and how we ensure adequate capacity for future needs.
Our SRE team is responsible for the design, creation, and maintenance of a complete multi-cloud deployment and monitoring framework, working with AWS EKS, GCP GKE, Azure AKS, and many other cloud services.
This is an amazing opportunity to work with a world-class technical team and to take responsibility for designing, creating, and provisioning infrastructure, and for deploying and maintaining applications, with a focus on infrastructure as code.
Requirements
- Strong SRE/DevOps skill set on Azure, GCP, or AWS (AWS preferred)
- Experience with HashiCorp products
- CI/CD (Jenkins)
- Monitoring platforms (Datadog preferred)
- Containerization with Docker and Kubernetes
- Strong working experience with Bash and Python or Go
- Solid Linux experience
- Database expertise a plus (PostgreSQL ideal)
- An eagerness to automate everything
- Manage availability and performance problems for clients
- Automate resolutions to prevent recurrences
- Ability to be on call on a rotational basis