You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests, and that all of its dependent systems, which together have hundreds of thousands of users, are affected. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OL) and Communications Lead (CL). What should you do next?