Exam Professional Cloud Architect topic 8 question 8 discussion

Actual exam question from Google's Professional Cloud Architect
Question #: 8
Topic #: 8

TerramEarth's 20 million vehicles are scattered around the world. Based on the vehicle's location, its telemetry data is stored in a Google Cloud Storage (GCS) regional bucket (US, Europe, or Asia). The CTO has asked you to run a report on the raw telemetry data to determine why vehicles are breaking down after 100K miles. You want to run this job on all the data.
What is the most cost-effective way to run this job?

  • A. Move all the data into 1 zone, then launch a Cloud Dataproc cluster to run the job
  • B. Move all the data into 1 region, then launch a Google Cloud Dataproc cluster to run the job
  • C. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a multi-region bucket and use a Dataproc cluster to finish the job
  • D. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster to finish the job
Suggested Answer: D
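
For concreteness, here is a minimal Python sketch of what option D's first step could look like: one preprocessing job submitted per region, so the raw data is filtered and compressed in place before anything crosses a continent boundary. The project ID, regions, cluster names, bucket names, and the compress_telemetry.py script are all hypothetical, not part of the question.

```python
# Sketch only, not a production pipeline. All names below are hypothetical.
from google.cloud import dataproc_v1  # pip install google-cloud-dataproc

PROJECT_ID = "terramearth-demo"  # hypothetical project
REGIONS = ["us-central1", "europe-west1", "asia-east1"]  # assumed one per continent

def submit_preprocess_job(region: str) -> str:
    """Submit a filter-and-compress PySpark job to the cluster in `region`."""
    # Dataproc's JobControllerClient must target the regional endpoint.
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": f"preprocess-{region}"},  # hypothetical cluster
        "pyspark_job": {
            # Hypothetical script: keeps only 100K+ mile vehicles, writes compressed output.
            "main_python_file_uri": "gs://te-scripts/compress_telemetry.py",
            "args": [f"gs://te-raw-{region}", f"gs://te-preprocessed-{region}"],
        },
    }
    submitted = client.submit_job(
        request={"project_id": PROJECT_ID, "region": region, "job": job}
    )
    return submitted.reference.job_id

# Step 1 of option D: run the preprocessing in place, one cluster per region,
# so only the compressed output ever incurs inter-continent egress.
for region in REGIONS:
    print(region, submit_preprocess_job(region))
```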

Comments

cetanx
Highly Voted 3 years, 9 months ago
I will look at it from a different perspective. A and B say "move all data," but the analysis is trying to reveal why vehicles break down after 100K miles, so there is no point transferring data for vehicles with less than 100K mileage; transferring all the data is just a waste of time and money. One thing is certain here: moving or copying data between continents costs money, so compressing the data before copying it to another region/continent makes sense. Preprocessing also makes sense, because we probably want to cut the data down to the relevant subset first (remember the 100K mileage). Now, the type of target bucket: multi-region or regional? Multi-region is good for high availability and low latency at somewhat higher cost, but the question doesn't require either of those features. Therefore I think the regional option is the way to go, given that lower cost is always better. So my answer would be D.
upvoted 66 times
DiegoQ
3 years, 7 months ago
I totally agree with you. I think what confuses people here is the "run a report on the raw data" part, but preprocessing doesn't necessarily mean transforming the raw data; it could just mean selecting the data you need (as you said, dropping vehicles with less than 100K mileage).
upvoted 2 times
mrhege
3 years ago
You will need data from non-broken machines too for labelling.
upvoted 1 times
stfnz
11 months, 2 weeks ago
Yes, but you will still be interested in the 100K+ mileage vehicles, whether broken or not.
upvoted 1 times
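Following cetanx's compress-before-copy reasoning, the consolidation step could look like the sketch below: only the compressed per-region outputs incur inter-continent egress on their way to a single regional bucket. Bucket and project names are hypothetical, and at real multi-TB scale the Storage Transfer Service would likely be a better fit than a client-side copy loop.

```python
# Sketch of step 2 of option D: consolidate the (much smaller) compressed
# outputs into one regional bucket for the final Dataproc job.
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client(project="terramearth-demo")       # hypothetical project
target = client.bucket("te-consolidated-us-central1")     # hypothetical regional bucket

for region in ["us-central1", "europe-west1", "asia-east1"]:
    source = client.bucket(f"te-preprocessed-{region}")   # hypothetical source buckets
    for blob in client.list_blobs(source):
        # Server-side copy; egress is paid only on the compressed data.
        source.copy_blob(blob, target, new_name=f"{region}/{blob.name}")
```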
JoeShmoe
Highly Voted 4 years, 5 months ago
D is the most cost-effective, and Dataproc is regional.
upvoted 32 times
nitinz
3 years, 1 month ago
It is D.
upvoted 1 times
Rafaa
3 years, 10 months ago
Hold on guys, you do not need to "preprocess" the data. This rules out C and D.
upvoted 2 times
guid1984
3 years, 2 months ago
Why not? It's raw data, so it can be preprocessed for optimization.
upvoted 2 times
passnow
4 years, 4 months ago
Dataproc can use global endpoints too.
upvoted 1 times
tartar
3 years, 8 months ago
D is ok
upvoted 11 times
passnow
4 years, 4 months ago
Honestly, if we read the question well and factor in cost, D would be a better option
upvoted 2 times
vindahake
4 years, 1 month ago
I think running additional compute in each region will be more expensive than paying the data transfer charges and processing everything centrally.
upvoted 4 times
msahdra
Most Recent 4 months, 4 weeks ago
Selected Answer: C
While regional preprocessing can be efficient, moving the data back to regional buckets after compression defeats the purpose of a multi-region bucket. It adds unnecessary data transfer costs and reduces the availability of the preprocessed data for global analysis.
upvoted 2 times
thewalker
5 months, 2 weeks ago
D, considering https://cloud.google.com/storage/docs/locations#considerations
upvoted 2 times
Jeena345
1 year, 2 months ago
Selected Answer: D
D should be fine
upvoted 1 times
omermahgoub
1 year, 4 months ago
Answer is C. To run the report on all of TerramEarth's raw telemetry data in the most cost-effective way, it would be best to launch a cluster in each region to preprocess and compress the raw data. This lets you process the data in place, minimizing the amount of data that needs to be transferred between regions. After the data has been preprocessed and compressed, you can move it into a multi-region bucket and use a Dataproc cluster to finish the job.
upvoted 2 times
omermahgoub
1 year, 4 months ago
D, moving the data into a regional bucket and using a Cloud Dataproc cluster to finish the job, would not be as cost-effective as moving the data into a multi-region bucket, as it would not take advantage of the lower costs of storing data in a multi-region bucket.
upvoted 1 times
megumin
1 year, 5 months ago
Selected Answer: D
ok for D
upvoted 1 times
Mahmoud_E
1 year, 6 months ago
Selected Answer: D
D seems better
upvoted 1 times
AMohanty
1 year, 8 months ago
What is the use of multi-regional Dataproc if your storage data is regional?
upvoted 2 times
AzureDP900
1 year, 10 months ago
D is fine. There is no need for multi-region as mentioned in C; D is right in my opinion.
upvoted 2 times
vincy2202
2 years, 4 months ago
Selected Answer: D
D is the correct answer. A regional bucket is required, since a multi-regional bucket would incur additional cost to transfer the data to a centralized location.
upvoted 2 times
vincy2202
2 years, 4 months ago
D seems to be the correct answer
upvoted 1 times
joe2211
2 years, 5 months ago
Selected Answer: D
vote D
upvoted 2 times
MaxNRG
2 years, 6 months ago
D: Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster. Egress rates are what matter most here. Transfer is free inside a region, so it makes sense to move all data into one region for processing (from all continents). Cross-region egress costs $0.01 per GB, and inter-continent egress $0.12 per GB. Consider just option B (moving all raw data into one region): the monthly volume alone would cost 900 TB/day (all 20M vehicles) × 30 days × $0.12/GB ≈ $3.24M, just for data transfer. So it definitely makes sense to preprocess/compress the data per region and then move it into one region for the final analysis; that could save 10-100× on egress costs. Another important aspect is processing time: running the preprocessing in parallel across all regions accelerates the overall analysis. Faster results mean faster in-field improvements. See also the notes on price optimization in GCP (Storage/Network): https://cloud.google.com/storage/docs/locations#considerations
upvoted 6 times
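
MaxNRG's figure checks out; here is a quick worked version of the arithmetic, using decimal units (1 TB = 1,000 GB) and the rates quoted in the comment. The 10× compression ratio at the end is purely illustrative, not a number from the thread.

```python
# Verify the quoted monthly inter-continent egress cost for option B.
daily_volume_tb = 900        # raw telemetry per day, all regions (from the comment)
egress_rate_per_gb = 0.12    # inter-continent egress, USD/GB (from the comment)
days = 30

monthly_egress_cost = daily_volume_tb * 1_000 * days * egress_rate_per_gb
print(f"${monthly_egress_cost:,.0f}")       # $3,240,000 -- matches the ~$3.24M figure

# With, say, 10x compression after per-region preprocessing (illustrative),
# the same month's transfer cost drops by an order of magnitude:
print(f"${monthly_egress_cost / 10:,.0f}")  # $324,000
```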
victory108
2 years, 9 months ago
D. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a region bucket and use a Cloud Dataproc cluster to finish the job
upvoted 1 times
MamthaSJ
2 years, 9 months ago
Answer is D
upvoted 3 times
Yogikant
2 years, 11 months ago
Answer D: moving data from one region to another incurs network egress cost, and compressing the data before moving it reduces that cost. Running Dataproc for preprocessing in each region incurs additional cost, but it also reduces the cost of running the final Dataproc job on the preprocessed data, offsetting the extra cost of the regional clusters.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other