Exam Professional Machine Learning Engineer topic 1 question 336 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 336
Topic #: 1

You have developed a custom ML model using Vertex AI and want to deploy it for online serving. You need to optimize the model's serving performance by ensuring that the model can handle high throughput while minimizing latency. You want to use the simplest solution. What should you do?

A. Deploy the model to a Vertex AI endpoint resource to automatically scale the serving backend based on the throughput. Configure the endpoint's autoscaling settings to minimize latency.
B. Implement a containerized serving solution using Cloud Run. Configure the concurrency settings to handle multiple requests simultaneously.
C. Apply simplification techniques such as model pruning and quantization to reduce the model's size and complexity. Retrain the model using Vertex AI to improve its performance, latency, memory, and throughput.
D. Enable request-response logging for the model hosted in Vertex AI. Use Looker Studio to analyze the logs, identify bottlenecks, and optimize the model accordingly.

Suggested Answer: A
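
For readers who want to see what option A looks like in practice, here is a minimal sketch using the Vertex AI SDK for Python (google-cloud-aiplatform). It is an illustration under stated assumptions, not an official solution: the project, location, and model ID arguments are placeholders, and the machine type, replica counts, and CPU target are example values that are not part of the question. The helper names build_deploy_kwargs and deploy_with_autoscaling are hypothetical; only Model.deploy and its min_replica_count, max_replica_count, and autoscaling_target_cpu_utilization parameters come from the SDK.

```python
from typing import Any, Dict


def build_deploy_kwargs(
    machine_type: str = "n1-standard-4",  # assumed example machine type
    min_replicas: int = 1,
    max_replicas: int = 5,
    target_cpu_utilization: int = 60,
) -> Dict[str, Any]:
    """Build autoscaling keyword arguments for aiplatform.Model.deploy()."""
    # This sketch assumes at least one replica is always kept warm.
    if min_replicas < 1:
        raise ValueError("min_replicas must be at least 1")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    if not 1 <= target_cpu_utilization <= 100:
        raise ValueError("target_cpu_utilization must be in 1..100")
    return {
        "machine_type": machine_type,
        # Replicas kept running even when idle; more replicas avoid cold scale-ups.
        "min_replica_count": min_replicas,
        # Upper bound the endpoint may scale out to under load.
        "max_replica_count": max_replicas,
        # A lower CPU target makes the endpoint add replicas earlier.
        "autoscaling_target_cpu_utilization": target_cpu_utilization,
    }


def deploy_with_autoscaling(project: str, location: str, model_id: str, **overrides):
    """Deploy an uploaded model to an endpoint (needs credentials and the SDK)."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model(model_name=model_id)
    # With no endpoint argument, deploy() creates a new endpoint and returns it.
    return model.deploy(**build_deploy_kwargs(**overrides))
```

Raising the minimum replica count or lowering the CPU target trades cost for latency headroom, which is the tuning that option A refers to as configuring the endpoint's autoscaling settings.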

by Duke_CT at June 20, 2025, 3:53 a.m.

Comments

Duke_CT

1 month, 4 weeks ago

Selected Answer: B

Answer is probably D based on the docs.

upvoted 1 time

...