Exam Professional Cloud DevOps Engineer topic 1 question 59 discussion

Actual exam question from Google's Professional Cloud DevOps Engineer
Question #: 59
Topic #: 1

You encounter a large number of outages in the production systems you support. You receive alerts for all the outages that wake you up at night. The alerts are due to unhealthy systems that are automatically restarted within a minute. You want to set up a process that would prevent staff burnout while following Site Reliability Engineering practices. What should you do?

  • A. Eliminate unactionable alerts.
  • B. Create an incident report for each of the alerts.
  • C. Distribute the alerts to engineers in different time zones.
  • D. Redefine the related Service Level Objective so that the error budget is not exhausted.
Suggested Answer: A 🗳️
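
One way to read option A is as a paging rule: a system that restarts itself successfully does not wake anyone, while a failed restart or a burst of repeated failures still does. The following is a minimal sketch of that idea, not a Cloud Monitoring configuration. The `Outage` type, the `should_page` function, the one-hour window, and the threshold of three self-healed outages are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Outage:
    """A single unhealthy-system event (hypothetical record type)."""
    started_at: datetime
    auto_restart_succeeded: bool


def should_page(
    outages: List[Outage],
    now: datetime,
    window: timedelta = timedelta(hours=1),  # assumed look-back window
    max_self_healed: int = 3,                # assumed tolerance for self-healed outages
) -> bool:
    """Return True only when a human could usefully act on the situation."""
    # Only consider outages that started inside the look-back window.
    recent = [o for o in outages if now - o.started_at <= window]

    # A failed automatic restart needs a human, so page.
    if any(not o.auto_restart_succeeded for o in recent):
        return True

    # Many self-healed outages in a short time suggest a deeper problem.
    return len(recent) > max_self_healed


now = datetime(2024, 1, 1, 3, 0)
single_self_healed = [Outage(now - timedelta(minutes=5), True)]
print(should_page(single_self_healed, now))
```

Self-healed outages that do not page can still be logged or sent to a ticket queue, so the underlying cause can be investigated during working hours.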

Comments

AL12
Highly Voted 3 years ago
I reckon it's A. The problem seems to be fixed automatically by a restart of the service within a minute, so engineers don't really need to be woken up about these problems. If the service failed multiple times, or if the restart failed, then the engineer should be woken up.
upvoted 14 times
MF2C
3 years ago
A or C
upvoted 1 times
...
...
09bd94b
Most Recent 2 months, 2 weeks ago
Selected Answer: A
Agree with A. It does not make sense to wake up an engineer when you know that no remedial action is needed.
upvoted 1 times
...
JonathanSJ
1 year, 9 months ago
Selected Answer: A
I agree with A.
upvoted 2 times
...
Greg123123
1 year, 10 months ago
Selected Answer: A
It should be A rather than D. To follow SRE practice, we should eliminate unactionable alerts, which are pointless, in order to increase alert precision. While D also looks valid, the question never says that the application is being affected (e.g., has downtime) or that any action is needed. As a result, there is no need to redefine the SLI, and since no time was spent resolving the issue, no error budget is spent.
upvoted 2 times
...
ssmb
2 years ago
It is between A and C; B and D are not good answers. I lean more towards A because those alerts seem unactionable at the moment the alert is received, i.e., the machine has already restarted automatically. This would be the best immediate action as per the question. Of course, the source of the alerts should be looked at and fixed separately from addressing the issue in question.
upvoted 2 times
...
[Removed]
2 years, 4 months ago
I agree with A. Eliminate bad monitoring: unactionable alerts (i.e., spam). https://cloud.google.com/blog/products/management-tools/meeting-reliability-challenges-with-sre-principles
upvoted 1 times
...
zygomar
2 years, 8 months ago
Selected Answer: A
Agree with KyubiBlaze about having to remove unactionable items, a.k.a. spam: "good monitoring alerts on actionable problems" @ https://cloud.google.com/blog/products/management-tools/meeting-reliability-challenges-with-sre-principles
upvoted 4 times
...
Sekierer
2 years, 9 months ago
A is correct
upvoted 1 times
...
KyubiBlaze
2 years, 9 months ago
A - You have to remove "unactionable" alerts; these alerts are useless if you can't take any action. The reason is simple: C might follow SRE practice, but it distributes the problem rather than solving it. B and D are a definite no.
upvoted 3 times
...
gcpz
2 years, 10 months ago
Answer is C. It follows Google SRE practices and prevents staff burnout. https://sre.google/workbook/team-lifecycles/
upvoted 1 times
...
ESP_SAP
2 years, 11 months ago
The team may continue to work on non-reliability features if: The outage was caused by a company-wide networking problem. The outage was caused by a service maintained by another team, who have themselves frozen releases to address their reliability issues. The error budget was consumed by users out of scope for the SLO (e.g., load tests or penetration testers). Miscategorized errors consume budget even though no users were impacted. https://sre.google/workbook/error-budget-policy/
upvoted 3 times
ESP_SAP
2 years, 11 months ago
Correct Answer is (D):
upvoted 2 times
...
...
Manh
2 years, 12 months ago
Answer D
upvoted 1 times
...
NXD
3 years ago
C follows the SRE.
upvoted 3 times
Feliphus
10 months, 2 weeks ago
The statement says: "You encounter a large number of outages in the production systems you support", so eliminating the alerts doesn't seem to be a good idea. There may be another support team in another time zone. What happens if the server doesn't reboot or the services don't start properly? None of the options is really correct; the correct answer would be to resolve the reboot problem. I don't know whether A or C is better; I suppose some information has been lost from the statement or the answers. But in this situation I agree with @NXD and choose C.
upvoted 1 times
Feliphus
10 months, 2 weeks ago
Sorry, but I am changing my answer to A. I have noticed this question is repeated as Q133, but without the text: "You receive alerts for all the outages that wake you up at night".
upvoted 1 times
...
...
...
TNT87
3 years ago
Ans D https://www.atlassian.com/incident-management/kpis/error-budget
upvoted 4 times
TNT87
2 years, 10 months ago
Ans A. Point of correction.
upvoted 1 times
TNT87
2 years, 10 months ago
No! D is correct.
upvoted 1 times
...
...
...
neutrino9
3 years ago
Should be A
upvoted 3 times
...
job_search83
3 years ago
D redefine SLI
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other