
Exam Professional Cloud DevOps Engineer topic 1 question 148 discussion

Actual exam question from Google's Professional Cloud DevOps Engineer
Question #: 148
Topic #: 1

You are the Operations Lead for an ongoing incident with one of your services. The service usually runs at around 70% capacity. You notice that one node is returning 5xx errors for all requests. There has also been a noticeable increase in support cases from customers. You need to remove the offending node from the load balancer pool so that you can isolate and investigate the node. You want to follow Google-recommended practices to manage the incident and reduce the impact on users. What should you do?

  • A. 1. Communicate your intent to the incident team.
    2. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately.
    3. When any new nodes report healthy, drain traffic from the unhealthy node, and remove the unhealthy node from service.
  • B. 1. Communicate your intent to the incident team.
    2. Add a new node to the pool, and wait for the new node to report as healthy.
    3. When traffic is being served on the new node, drain traffic from the unhealthy node, and remove the old node from service.
  • C. 1. Drain traffic from the unhealthy node and remove the node from service.
    2. Monitor traffic to ensure that the error is resolved and that the other nodes in the pool are handling the traffic appropriately.
    3. Scale the pool as necessary to handle the new load.
    4. Communicate your actions to the incident team.
  • D. 1. Drain traffic from the unhealthy node and remove the old node from service.
    2. Add a new node to the pool, wait for the new node to report as healthy, and then serve traffic to the new node.
    3. Monitor traffic to ensure that the pool is healthy and is handling traffic appropriately.
    4. Communicate your actions to the incident team.
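Step 2 of option A, the load analysis, comes down to simple arithmetic if traffic is spread evenly. With N nodes at utilization u, taking one node out of service leaves each remaining node at roughly u × N / (N - 1). The question does not give the node count, so the following is a minimal sketch under stated assumptions: the function names, the 4-node pool, and the 80% target utilization are all hypothetical, and the load is assumed to be evenly balanced with total demand unchanged.

```python
import math


def utilization_after_removal(current_util: float, nodes: int, removed: int = 1) -> float:
    """Per-node utilization once `removed` nodes stop serving traffic.

    Assumption: traffic is evenly balanced and total demand stays the same.
    """
    remaining = nodes - removed
    if remaining <= 0:
        raise ValueError("no healthy nodes left to serve traffic")
    return current_util * nodes / remaining


def healthy_nodes_needed(current_util: float, nodes: int, target_util: float) -> int:
    """Smallest healthy-node count that keeps per-node utilization <= target_util."""
    # Total demand expressed in units of one node's capacity.
    total_load = current_util * nodes
    # Round before ceil so float noise such as 4.000000000001 doesn't add an extra node.
    return math.ceil(round(total_load / target_util, 9))


if __name__ == "__main__":
    # Hypothetical pool of 4 nodes at the question's ~70% utilization.
    print(f"{utilization_after_removal(0.7, 4):.2f}")  # 0.93
    # Hypothetical target of 80% per-node utilization.
    print(healthy_nodes_needed(0.7, 4, target_util=0.8))  # 4
```

In this hypothetical 4-node pool, removing the failing node would push the other three to about 93% utilization, while 4 healthy nodes are needed to stay at or below 80%. The pool would therefore need one new healthy node before the bad node is drained, which matches the ordering in option A: analyze, scale, wait for the new nodes to report healthy, then drain and remove.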
Suggested Answer: A

Comments

activist
Highly Voted 1 year ago
Answer A seems to be correct.
upvoted 6 times
heftjustice
Most Recent 8 months, 3 weeks ago
C. Reference: https://sre.google/sre-book/effective-troubleshooting/
upvoted 1 times
xhilmi
10 months, 4 weeks ago
Selected Answer: A
Choosing option A. First, communicating your intent to the incident team ensures transparency and collaboration. Performing a load analysis is crucial to determine whether the remaining nodes can handle the extra traffic offloaded from the unhealthy node, and scaling appropriately maintains overall capacity. Once the new nodes report healthy, draining traffic from the unhealthy node allows a gradual transition without disrupting the user experience. The unhealthy node is removed from service only after confirming that the other nodes can handle the load. This step-by-step approach, combining communication with load analysis, aligns with Google-recommended incident response practices and minimizes user impact during investigation and resolution.
upvoted 1 times
nqthien041292
11 months ago
Selected Answer: A
Vote A
upvoted 2 times
mshafa
11 months, 3 weeks ago
Selected Answer: D
Option A and option B do not add a new node to the pool to handle the increased load, which may leave the remaining nodes overburdened and unable to handle the traffic adequately. Option C starts with draining traffic from the unhealthy node, which is a good step, but it doesn't immediately add a new node to the pool to handle the load. It also lacks the step of explicitly communicating the actions to the incident team.
upvoted 2 times
pharao89
11 months, 3 weeks ago
The second step in answer A is about scaling, so A is correct: "2. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately." You can easily eliminate C and D because informing the incident team should be the first thing you do.
upvoted 3 times
lelele2023
1 year ago
Selected Answer: A
The service usually runs at 70% of capacity, so even if one node is out of order, you'd want to check whether the remaining compute resources are enough to absorb the load before arbitrarily adding new nodes.
upvoted 4 times
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other