Exam Professional Machine Learning Engineer topic 1 question 164 discussion

Actual exam question from Google's Professional Machine Learning Engineer
Question #: 164
Topic #: 1

You work for a small company that has deployed an ML model with autoscaling on Vertex AI to serve online predictions in a production environment. The current model receives about 20 prediction requests per hour with an average response time of one second. You have retrained the same model on a new batch of data, and now you are canary testing it, sending ~10% of production traffic to the new model. During this canary test, you notice that prediction requests for your new model are taking between 30 and 180 seconds to complete. What should you do?

  • A. Submit a request to raise your project quota to ensure that multiple prediction services can run concurrently.
  • B. Turn off auto-scaling for the online prediction service of your new model. Use manual scaling with one node always available.
  • C. Remove your new model from the production environment. Compare the new model and existing model codes to identify the cause of the performance bottleneck.
  • D. Remove your new model from the production environment. For a short trial period, send all incoming prediction requests to BigQuery. Request batch predictions from your new model, and then use the Data Labeling Service to validate your model’s performance before promoting it to production.
Suggested Answer: B 🗳️
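For reference, a minimal sketch of what the suggested answer could look like with the Vertex AI Python SDK: deploying the retrained model to the existing endpoint as a canary receiving ~10% of traffic, with the replica count pinned to one so a node is always warm. The project, endpoint, model IDs, and machine type below are placeholders, not values from the question.

```python
# Hypothetical sketch only; resource names and machine type are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)
new_model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/0987654321"
)

# Setting min_replica_count == max_replica_count effectively disables
# autoscaling: one node stays warm at all times, so low-traffic canary
# requests never wait for a node to start up.
new_model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="retrained-model-canary",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=1,
    traffic_percentage=10,  # send ~10% of production traffic to the canary
)
```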

Comments

sonicclasps
Highly Voted 1 year, 3 months ago
Selected Answer: B
Sounds to me like the new model has too few requests per hour and therefore scales down to 0, which means it has to create an instance every time it serves a request, and this takes time. By manually setting the number of nodes, the nodes will always be running, whether or not they are serving predictions.
upvoted 5 times
...
desertlotus1211
Most Recent 1 month, 4 weeks ago
Selected Answer: C
You're performing 20 predictions an hour, so scaling isn't the root issue. It's a code issue.
upvoted 2 times
...
vini123
2 months, 3 weeks ago
Selected Answer: B
Since the same model is being used and the only change is the data, it's likely that the latency issue is caused by how Vertex AI is scaling the prediction service.
upvoted 1 times
...
potomeek
3 months, 3 weeks ago
Selected Answer: C
Removing the new model from production to debug and address the root cause of the latency issue is the most efficient and logical course of action. This ensures minimal disruption to production services and lays the groundwork for a smooth rollout after fixing the bottleneck.
upvoted 1 times
...
YushiSato
8 months, 3 weeks ago
I don't see B as the right answer. The Vertex AI endpoint cannot scale to 0 for a newer version of the model: "When you configure a DeployedModel, you must set dedicatedResources.minReplicaCount to at least 1. In other words, you cannot configure the DeployedModel to scale to 0 prediction nodes when it is unused." https://cloud.google.com/vertex-ai/docs/general/deployment#scaling
upvoted 3 times
YushiSato
8 months, 3 weeks ago
I was convinced that the machines autoscaled by the Vertex AI endpoint are tied to the endpoint, not to the individual model deployed on it.
upvoted 1 times
...
...
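As a side note on the doc passage quoted above, the scaling configuration of each model deployed to an endpoint can be inspected with the Vertex AI Python SDK, which is one way to check what minReplicaCount the canary was actually deployed with. This is only an illustrative sketch; the endpoint resource name is a placeholder.

```python
# Illustrative sketch; the endpoint resource name is a placeholder.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# Each DeployedModel exposes its dedicated_resources, including the
# configured minimum and maximum replica counts.
for deployed_model in endpoint.list_models():
    resources = deployed_model.dedicated_resources
    print(
        deployed_model.display_name,
        "min replicas:", resources.min_replica_count,
        "max replicas:", resources.max_replica_count,
    )
```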
AnnaR
1 year ago
Selected Answer: B
B can be effective in controlling the resources available to the new model, ensuring that it is not delayed by autoscaling trying to scale up from 0. Not A: there is no indication in the description that quota limits cause the slowdown, and it does not address the issue of the new model performing poorly in canary testing. Not C: if you pull the new model from the prod environment, you could affect the end-user experience. Not D: same as C, plus you would rely on batch predictions, which does not align with the need for online, real-time predictions in the prod environment; the Data Labeling Service is more about assessing accuracy and less about resolving latency issues.
upvoted 2 times
...
pinimichele01
1 year ago
Selected Answer: B
You have retrained the same model on a new batch of data
upvoted 1 times
pinimichele01
1 year ago
The new model has too few requests per hour and therefore scales down to 0, which means it has to create an instance every time it serves a request, and this takes time. By manually setting the number of nodes, the nodes will always be running, whether or not they are serving predictions.
upvoted 4 times
...
...
VipinSingla
1 year, 1 month ago
Selected Answer: B
The bottleneck seems to be node startup: since there is a very low number of requests, having one node always available will help in this case.
upvoted 1 times
...
Aastha_Vashist
1 year, 1 month ago
Selected Answer: C
went with c
upvoted 1 times
rajshiv
5 months ago
I also think C. The model performance issue needs to be addressed.
upvoted 1 times
...
...
Carlose2108
1 year, 2 months ago
Selected Answer: C
I went C. Diagnosing the root cause.
upvoted 1 times
...
guilhermebutzke
1 year, 2 months ago
Selected Answer: C
I choose C. The significant increase in response time from 1 second to between 30 and 180 seconds indicates a performance issue with the new model. Before making any further changes or decisions, it's crucial to identify the root cause of this performance bottleneck. By comparing the code of the new model with the existing model, you can pinpoint any differences that might be causing the slowdown. A may not address the root cause and could incur unnecessary costs without fixing the performance issue. B doesn't address the underlying issue causing the significant increase in response time observed during canary testing. D would significantly increase latency and hinder real-time predictions, negatively impacting the user experience.
upvoted 2 times
vaibavi
1 year, 2 months ago
But the question says "You have retrained the same model on a new batch of data", so it's just the data that changed and there is no need to check the code.
upvoted 2 times
lunalongo
4 months, 3 weeks ago
B is still right because retraining often involves adjustments to hyperparameters or training processes; changes to data preprocessing steps (e.g., feature scaling, handling missing values) during retraining can change model code and affect model performance; and the retraining process itself might have introduced unknown bugs or inefficiencies into the model's deployment pipeline or the code that interacts with the model.
upvoted 1 times
...
...
...
b1a8fae
1 year, 3 months ago
Unsure on this one, but I would go with A. B: turning off auto-scaling is a good measure when dealing with steep spikes of request traffic, whereas here we are dealing with an average of 20 requests per hour: "The service may not be able to bring nodes online fast enough to keep up with large spikes of request traffic." https://cloud.google.com/blog/products/ai-machine-learning/scaling-machine-learning-predictions C: you retrain the SAME model on a different batch of data, so it is implied that the code is the same too. D: the actual quality of the model is not in question here, but rather the long prediction time per request. Even if the request traffic is very low, I can only consider option A: the selected quota cannot deal with the amount of concurrent prediction requests.
upvoted 1 times
...
kalle_balle
1 year, 3 months ago
Selected Answer: C
Options B and D are completely wrong. Option A, raising the quota, might be necessary in some situations but doesn't necessarily deal with the performance issue in the test. Option C seems like the most suitable option.
upvoted 1 times
edoo
1 year, 1 month ago
You only retrained the same model; your code hasn't changed, so you won't find anything with C. It's B.
upvoted 1 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other