You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests, and all of its dependent systems, which together serve hundreds of thousands of users, are affected. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OL) and Communications Lead (CL). What should you do next?