Exam Professional Cloud Architect topic 8 question 8 discussion

Actual exam question from Google's Professional Cloud Architect
Question #: 8
Topic #: 8

TerramEarth's 20 million vehicles are scattered around the world. Based on the vehicle's location, its telemetry data is stored in a Google Cloud Storage (GCS) regional bucket (US, Europe, or Asia). The CTO has asked you to run a report on the raw telemetry data to determine why vehicles are breaking down after 100K miles. You want to run this job on all the data.
What is the most cost-effective way to run this job?

  • A. Move all the data into 1 zone, then launch a Cloud Dataproc cluster to run the job
  • B. Move all the data into 1 region, then launch a Google Cloud Dataproc cluster to run the job
  • C. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a multi-region bucket and use a Dataproc cluster to finish the job
  • D. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster to finish the job
Suggested Answer: D
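
For concreteness, here is a minimal Python sketch of what option D's first step could look like: one preprocessing job submitted per region, so the raw data is filtered and compressed in place before anything crosses a continent boundary. The project ID, regions, cluster names, bucket names, and the compress_telemetry.py script are all hypothetical, not part of the question.

```python
# Sketch only, not a production pipeline. All names below are hypothetical.
from google.cloud import dataproc_v1  # pip install google-cloud-dataproc

PROJECT_ID = "terramearth-demo"  # hypothetical project
REGIONS = ["us-central1", "europe-west1", "asia-east1"]  # assumed one per continent

def submit_preprocess_job(region: str) -> str:
    """Submit a filter-and-compress PySpark job to the cluster in `region`."""
    # Dataproc's JobControllerClient must target the regional endpoint.
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": f"preprocess-{region}"},  # hypothetical cluster
        "pyspark_job": {
            # Hypothetical script: keeps only 100K+ mile vehicles, writes compressed output.
            "main_python_file_uri": "gs://te-scripts/compress_telemetry.py",
            "args": [f"gs://te-raw-{region}", f"gs://te-preprocessed-{region}"],
        },
    }
    submitted = client.submit_job(
        request={"project_id": PROJECT_ID, "region": region, "job": job}
    )
    return submitted.reference.job_id

# Step 1 of option D: run the preprocessing in place, one cluster per region,
# so only the compressed output ever incurs inter-continent egress.
for region in REGIONS:
    print(region, submit_preprocess_job(region))
```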

Comments

cetanx
Highly Voted 3 years, 9 months ago
I will look at it from a different perspective. A and B say "move all data," but the analysis is trying to reveal why vehicles break down after 100K miles, so there is no point transferring data for vehicles with less than 100K mileage; transferring all the data is just a waste of time and money. One thing is certain here: moving or copying data between continents costs money, so compressing the data before copying it to another region/continent makes sense. Preprocessing also makes sense, because we probably want to cut the data down to the relevant subset first (remember the 100K mileage). Now, the type of target bucket: multi-region or regional? Multi-region is good for high availability and low latency at somewhat higher cost, but the question doesn't require either of those features. Therefore I think the regional option is the way to go, given that lower cost is always better. So my answer would be D.
upvoted 66 times
DiegoQ
3 years, 7 months ago
I totally agree with you. I think what confuses people here is the "run a report on the raw data" part, but preprocessing doesn't necessarily mean transforming the raw data; it could just mean selecting the data you need (as you said, dropping vehicles with less than 100K mileage).
upvoted 2 times
mrhege
3 years ago
You will need data from non-broken machines too for labelling.
upvoted 1 times
stfnz
11 months, 2 weeks ago
Yes, but you will still be interested in the 100K+ mileage vehicles, whether broken or not.
upvoted 1 times
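Following cetanx's compress-before-copy reasoning, the consolidation step could look like the sketch below: only the compressed per-region outputs incur inter-continent egress on their way to a single regional bucket. Bucket and project names are hypothetical, and at real multi-TB scale the Storage Transfer Service would likely be a better fit than a client-side copy loop.

```python
# Sketch of step 2 of option D: consolidate the (much smaller) compressed
# outputs into one regional bucket for the final Dataproc job.
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client(project="terramearth-demo")       # hypothetical project
target = client.bucket("te-consolidated-us-central1")     # hypothetical regional bucket

for region in ["us-central1", "europe-west1", "asia-east1"]:
    source = client.bucket(f"te-preprocessed-{region}")   # hypothetical source buckets
    for blob in client.list_blobs(source):
        # Server-side copy; egress is paid only on the compressed data.
        source.copy_blob(blob, target, new_name=f"{region}/{blob.name}")
```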
JoeShmoe
Highly Voted 4 years, 5 months ago
D is the most cost-effective, and Dataproc is regional.
upvoted 32 times
nitinz
3 years, 1 month ago
It is D.
upvoted 1 times
Rafaa
3 years, 10 months ago
Hold on guys, you do not need to "preprocess" the data. This rules out C and D.
upvoted 2 times
guid1984
3 years, 2 months ago
Why not? It's raw data, so it can be preprocessed for optimization.
upvoted 2 times
passnow
4 years, 4 months ago
Dataproc can use global endpoints too.
upvoted 1 times
tartar
3 years, 8 months ago
D is ok
upvoted 11 times
passnow
4 years, 4 months ago
Honestly, if we read the question well and factor in cost, D would be a better option
upvoted 2 times
vindahake
4 years, 1 month ago
I think running additional compute in each region will be more expensive than paying the data transfer charges and processing everything centrally.
upvoted 4 times
msahdra
Most Recent 4 months, 4 weeks ago
Selected Answer: C
While regional preprocessing can be efficient, moving the data back to regional buckets after compression defeats the purpose of a multi-region bucket. It adds unnecessary data transfer costs and reduces the availability of the preprocessed data for global analysis.
upvoted 2 times
thewalker
5 months, 2 weeks ago
D, considering https://cloud.google.com/storage/docs/locations#considerations
upvoted 2 times
Jeena345
1 year, 2 months ago
Selected Answer: D
D should be fine
upvoted 1 times
omermahgoub
1 year, 4 months ago
Answer is C. To run the report on all of TerramEarth's raw telemetry data in the most cost-effective way, it would be best to launch a cluster in each region to preprocess and compress the raw data. This lets you process the data in place, minimizing the amount of data that needs to be transferred between regions. After the data has been preprocessed and compressed, you can move it into a multi-region bucket and use a Dataproc cluster to finish the job.
upvoted 2 times
omermahgoub
1 year, 4 months ago
D, moving the data into a regional bucket and using a Cloud Dataproc cluster to finish the job, would not be as cost-effective as moving the data into a multi-region bucket, as it would not take advantage of the lower costs of storing data in a multi-region bucket.
upvoted 1 times
megumin
1 year, 5 months ago
Selected Answer: D
ok for D
upvoted 1 times
Mahmoud_E
1 year, 6 months ago
Selected Answer: D
D seems better
upvoted 1 times
AMohanty
1 year, 8 months ago
What is the use of multi-regional Dataproc if your storage data is regional?
upvoted 2 times
AzureDP900
1 year, 10 months ago
D is fine. There is no need for multi-region as mentioned in C; D is right in my opinion.
upvoted 2 times
vincy2202
2 years, 4 months ago
Selected Answer: D
D is the correct answer. A regional bucket is required, since a multi-regional bucket would incur additional cost to transfer the data to a centralized location.
upvoted 2 times
vincy2202
2 years, 4 months ago
D seems to be the correct answer
upvoted 1 times
joe2211
2 years, 5 months ago
Selected Answer: D
vote D
upvoted 2 times
MaxNRG
2 years, 6 months ago
D: Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster. Egress rates are what matter most here. Transfer is free inside a region, so it makes sense to move all data into one region for processing (from all continents). Cross-region egress costs $0.01 per GB, and inter-continent egress $0.12 per GB. Consider just option B (moving all raw data into one region): the monthly volume alone would cost 900 TB/day (all 20M vehicles) × 30 days × $0.12/GB ≈ $3.24M, just for data transfer. So it definitely makes sense to preprocess/compress the data per region and then move it into one region for the final analysis; that could save 10-100× on egress costs. Another important aspect is processing time: running the preprocessing in parallel across all regions accelerates the overall analysis. Faster results mean faster in-field improvements. See also the notes on price optimization in GCP (Storage/Network): https://cloud.google.com/storage/docs/locations#considerations
upvoted 6 times
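
MaxNRG's figure checks out; here is a quick worked version of the arithmetic, using decimal units (1 TB = 1,000 GB) and the rates quoted in the comment. The 10× compression ratio at the end is purely illustrative, not a number from the thread.

```python
# Verify the quoted monthly inter-continent egress cost for option B.
daily_volume_tb = 900        # raw telemetry per day, all regions (from the comment)
egress_rate_per_gb = 0.12    # inter-continent egress, USD/GB (from the comment)
days = 30

monthly_egress_cost = daily_volume_tb * 1_000 * days * egress_rate_per_gb
print(f"${monthly_egress_cost:,.0f}")       # $3,240,000 -- matches the ~$3.24M figure

# With, say, 10x compression after per-region preprocessing (illustrative),
# the same month's transfer cost drops by an order of magnitude:
print(f"${monthly_egress_cost / 10:,.0f}")  # $324,000
```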
victory108
2 years, 9 months ago
D. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a region bucket and use a Cloud Dataproc cluster to finish the job
upvoted 1 times
MamthaSJ
2 years, 9 months ago
Answer is D
upvoted 3 times
Yogikant
2 years, 11 months ago
Answer D: moving data from one region to another incurs network egress cost, and compressing the data before moving it reduces that cost. Running Dataproc for preprocessing in each region incurs additional cost, but it also reduces the cost of running the final Dataproc job on the preprocessed data, offsetting the extra cost of the regional clusters.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other