Site Reliability Engineer (SRE)
About The Position
The Site Reliability Engineering (SRE) team at Aqua Security helps other Engineering teams improve the reliability, security, resilience, and performance of the services they own.
SREs use their expertise in how Aqua’s Infrastructure works to ensure other teams are getting maximum leverage from it. SRE also owns several of the processes (and related tooling) that contribute to reliability, including how we run incidents (and how we learn from them), how we make infrastructure changes safely, and how we ensure adequate capacity for future needs.
This is an opportunity to define, build, evangelize, and optimize our SRE practices!
About The Job
- Build and lead a diverse team of world-class SREs
- Define standard practices and build tooling around incidents, postmortems, changes, and capacity, and work with other Engineering teams to help them adopt these practices to improve their services
- Work with teams across Engineering to understand their systems and challenges, and identify how they can better leverage Aqua’s Infrastructure
- Build, prioritize, communicate, and drive the roadmap for your team
- Provide technical and architectural leadership for your team and others
- Grow and develop the individuals on your team
Requirements
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent work experience
- 5+ years of experience managing a team of engineers
- You have led an Infrastructure or SRE team in a production operations context
- You have experience solving infrastructure problems with software
- You can recruit, hire, and build a team
- You have a big-picture perspective on systems and tools
- You can collaborate with other Engineering teams to understand their systems and help to improve them
- You have strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices