Infostretch is a pure-play digital engineering services firm focused on helping companies accelerate their digital initiatives from strategy and planning through execution. We leverage deep technical expertise, Agile methodologies and data-driven intelligence to modernize systems of engagement and simplify human/tech interaction. We deliver custom solutions that meet customers’ technology needs wherever they are in their digital lifecycle. Backed by Goldman Sachs and Everstone Capital, Infostretch works with both large enterprises and emerging innovators, putting digital to work to enable new products and business models, engage with customers in new ways, and create sustainable competitive differentiation.
Job Description: Associate Project Manager – SRE
As an Associate Project Manager, Site Reliability Engineering, you will lead a team responsible for various aspects of cloud operations, including the design, implementation, and automation of large-scale distributed systems, as well as managing production support requests. You will provide technical leadership, prepare detailed implementation plans, and ensure SLAs are met.
- Lead a team of engineers across production engineering responsibilities that include Site Reliability Engineering (SRE), Engineering Productivity, and Quality, working closely with engineering counterparts across geographies, i.e., the USA and UK
- Own incident management for production systems, including on-call and alerting, for efficient recovery from production incidents. Engage with other Engineering leaders to implement processes, identify improvements, and drive consistent results
- Work with your US SRE and Engineering counterparts on ongoing training and response readiness efforts
- Define the support roster and manage on-call rotations
- Work with service teams to build and automate tooling and best practices to observe and manage the production service desk and consistently achieve defined SLAs
- Track and present KPIs to upper management
- Represent our company culture of transparency, trust, collaboration, and empowerment of the team and individual
- Handle light HR responsibilities as the first-line manager of individuals
- Strong emphasis on SRE as an engineering discipline with a focus on automation
- Superb interpersonal skills, capable of working with multi-functional technical and business teams and varying levels of management
- Proven project management skills, including excellent presentation skills
- Should be capable of writing detailed solution specifications, diagrams, best practices/standards documentation, operating procedures, test plans/test reports, etc.
- Experience supporting infrastructure and services in public cloud environments (Azure, AWS, GCP, etc.)
- Excellent problem solving, analytical, and decision-making skills
- Experience managing complex projects
- Experience with DevOps/SRE methodologies & tools
- Experience with public cloud cost management
- Experience in performance engineering and capacity planning
- Experience with software development and testing processes in an Agile environment
- Ability to work in a collaborative environment
- MCA or MS in Computer Science or equivalent experience