Exam Professional Machine Learning Engineer topic 1 question 96 discussion

Actual exam question from Google's Professional Machine Learning Engineer exam
Question #: 96
Topic #: 1

You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

  • A. Increase the instance memory to 512 GB and increase the batch size.
  • B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
  • C. Enable early stopping in your Vertex AI Training job.
  • D. Use the tf.distribute.Strategy API and run a distributed training job.
Suggested Answer: B

Comments

smarques
Highly Voted 2 years, 5 months ago
Selected Answer: C
I would say C. The question asks about time, so the "early stopping" option looks fine because it will not impact the existing accuracy (it might even improve it). Reading the TF docs, tf.distribute.Strategy is used when you want to split training between GPUs, but the question says we have a single GPU. Open to discuss. :) (A short sketch of both ideas follows this thread.)
upvoted 7 times
djo06
1 year, 11 months ago
tf.distribute.OneDeviceStrategy uses parallel training on one GPU
upvoted 2 times
...
...
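For reference, here is a minimal sketch of the two ideas discussed in the thread above, assuming TensorFlow 2.x with a Keras model; build_model, train_ds, and val_ds are hypothetical placeholders, not part of the exam question. Note that OneDeviceStrategy only places variables and computation on a single device, and EarlyStopping shortens the number of epochs rather than the time per training step.

    import tensorflow as tf

    # OneDeviceStrategy pins all variables and computation to one device;
    # it does not parallelize training by itself.
    strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")

    with strategy.scope():
        model = build_model()  # hypothetical model factory
        model.compile(optimizer="adam", loss="binary_crossentropy")

    # EarlyStopping stops training once the validation metric stops improving,
    # reducing total epochs but not the cost of each training step.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True)

    model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])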
enghabeth
Highly Voted 2 years, 4 months ago
Selected Answer: B
Cost isn't a concern here, and we need something that doesn't impair the model's performance, so I think it's good to swap the GPU for a TPU.
upvoted 5 times
tavva_prudhvi
2 years, 2 months ago
Replacing the NVIDIA P100 GPU with a v3-32 TPU could potentially speed up the training process, but it may require modifying the custom training application to be compatible with TPUs (a sketch of that change follows this thread).
upvoted 1 times
...
...
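For reference, a hedged sketch of the TPU-specific setup change mentioned above, assuming TensorFlow 2.x running on a Cloud TPU worker; build_model and train_ds are hypothetical placeholders, not part of the exam question.

    import tensorflow as tf

    # Connecting to the TPU and building the model under TPUStrategy is the
    # main code change required when moving from a GPU to a TPU.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")  # "" auto-detects on a TPU worker
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

    with strategy.scope():
        model = build_model()  # hypothetical model factory
        model.compile(optimizer="adam", loss="binary_crossentropy")

    model.fit(train_ds, epochs=10)  # train_ds: a tf.data pipeline sized for the TPU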
Begum
Most Recent 1 month, 1 week ago
Selected Answer: C
tf.distribute.Strategy currently does not support TensorFlow's partitioned variables (where a single variable is split across multiple devices), which leaves moving to a TPU as the option for accelerating the training.
upvoted 1 times
...
phani49
6 months ago
Selected Answer: D
D. Use the tf.distribute.Strategy API and run a distributed training job. Why it's correct:
  • Distributed training splits the dataset and workload across multiple machines and GPUs/TPUs, dramatically reducing training time.
  • The tf.distribute.Strategy API supports both synchronous and asynchronous distributed training, allowing scaling across multiple GPUs or TPUs in Vertex AI.
  • It is specifically designed for handling large datasets and computationally intensive tasks.
  • It scales horizontally, effectively handling massive datasets like the 3-million-image X-ray dataset.
Example strategies (a minimal code sketch follows this comment):
  • MultiWorkerMirroredStrategy: for synchronous training on multiple machines with GPUs.
  • TPUStrategy: for training across multiple TPUs.
upvoted 4 times
...
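For reference, a minimal sketch of the MultiWorkerMirroredStrategy variant described above, assuming TensorFlow 2.x and that the training environment (for example, a multi-replica Vertex AI job) sets TF_CONFIG on each worker; build_model and make_dataset are hypothetical placeholders.

    import tensorflow as tf

    # Every worker runs this same script; TF_CONFIG tells each one about its
    # peers, and gradients are synchronized across all replicas.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()

    # Scale the batch size with the number of replicas training in sync.
    global_batch_size = 64 * strategy.num_replicas_in_sync

    with strategy.scope():
        model = build_model()  # hypothetical model factory
        model.compile(optimizer="adam", loss="binary_crossentropy")

    train_ds = make_dataset(global_batch_size)  # hypothetical tf.data input pipeline
    model.fit(train_ds, epochs=10)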
AB_C
6 months, 3 weeks ago
Selected Answer: D
D is the right answer.
upvoted 2 times
...
Th3N1c3Guy
9 months ago
Selected Answer: B
Since Compute Engine is being used, it seems like upgrading the accelerator makes sense.
upvoted 2 times
...
baimus
9 months, 1 week ago
Selected Answer: D
The difficulty of this question is its pure ambiguity. Two of the answers DO change the hardware, so this is obviously an option. The distribute strategy is clearly the right choice (D), assuming we are allowed more hardware to distribute over. People are saying "we cannot change the hardware, so it's B", but B is a change of hardware to a TPU anyway, which would require a code change, at which point D would be implemented anyway.
upvoted 3 times
...
MultiCloudIronMan
9 months, 2 weeks ago
Selected Answer: D
I have seen two or even three variants of this question, and there are strong debates on the answer. I want to suggest D because, yes, distributed training can work with your setup of 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. However, the efficiency and performance will depend on the specific framework and strategy you use. The important thing about this answer is that it does not affect quality. (A Vertex AI worker-pool sketch follows this comment.)
upvoted 2 times
...
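For reference, a hedged sketch of how the extra hardware discussed in the two comments above could be requested from Vertex AI Training using the google-cloud-aiplatform SDK; the project, region, bucket, container image, and machine choices are illustrative assumptions, not values from the question.

    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",             # assumed project
        location="us-central1",           # assumed region
        staging_bucket="gs://my-bucket",  # assumed staging bucket
    )

    # One chief plus three workers, each with four P100 GPUs, running a
    # MultiWorkerMirroredStrategy-style training container.
    gpu_pool = {
        "machine_spec": {
            "machine_type": "n1-standard-32",
            "accelerator_type": "NVIDIA_TESLA_P100",
            "accelerator_count": 4,
        },
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},  # assumed image
    }

    worker_pool_specs = [
        {**gpu_pool, "replica_count": 1},  # chief
        {**gpu_pool, "replica_count": 3},  # additional workers
    ]

    job = aiplatform.CustomJob(
        display_name="xray-distributed-training",
        worker_pool_specs=worker_pool_specs,
    )
    job.run()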
Jason_Cloud_at
9 months, 3 weeks ago
Selected Answer: B
The question says 3 million X-rays at 2 GB each, which adds up to roughly 6 PB of data. TPUs are designed exactly to accelerate ML tasks with massive parallelism, so I would go with B. I would directly omit A. C is more about preventing overfitting and is not directly aimed at reducing training time. D is a viable solution, but compared with B it is not as strong.
upvoted 2 times
...
dija123
12 months ago
Selected Answer: B
Agree with B
upvoted 2 times
...
inc_dev_ml_001
1 year, 1 month ago
Selected Answer: B
I would say B:
  • A. Increasing memory doesn't necessarily mean a speed-up of the process; it's not a batch-size problem.
  • B. It seems to be an image -> TensorFlow situation, so transforming images into tensors means that a TPU works better and maybe faster.
  • C. It's not an overfitting problem.
  • D. Same here; it's not a memory or input-size problem.
upvoted 3 times
...
pinimichele01
1 year, 1 month ago
https://www.tensorflow.org/guide/distributed_training#onedevicestrategy
upvoted 1 times
pinimichele01
1 year, 1 month ago
https://www.tensorflow.org/guide/distributed_training#onedevicestrategy -> D
upvoted 1 times
...
...
Werner123
1 year, 3 months ago
Selected Answer: D
In my eyes the only solution is distributed training. 3,000,000 x 2 GB = 6 petabytes worth of data. No single device will get you there.
upvoted 3 times
...
ludovikush
1 year, 3 months ago
Selected Answer: B
Agree with JamesDoes
upvoted 2 times
...
Mickey321
1 year, 7 months ago
Selected Answer: B
B, as the setup has only one GPU; hence, with D, distributed training would not be efficient.
upvoted 4 times
...
pico
1 year, 7 months ago
If the question didn't specify the framework used, and you want to choose an option that is more framework-agnostic, it's important to consider the available options. Given the context and the need for a framework-agnostic approach, you might consider a combination of options A and D. Increasing instance memory and batch size can still be beneficial, and if you're using a deep learning framework that supports distributed training (like TensorFlow or PyTorch), implementing distributed training (option D) can further accelerate the process.
upvoted 1 times
...
Krish6488
1 year, 7 months ago
Selected Answer: B
I would go with B, as a v3-32 TPU offers much more computational power than a single P100 GPU, and this upgrade should provide a substantial decrease in training time. Also, tf.distribute.Strategy is good for performing distributed training on multiple GPUs or TPUs, but the current setup has just one GPU, which makes it the second-best option, provided the architecture uses multiple GPUs. Increasing memory may allow a larger batch size but won't address the fundamental problem, which is an over-utilised GPU. Early stopping is good for avoiding overfitting once the model already performs at its best; it reduces overall training time but won't improve the training speed.
upvoted 5 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other