Exam Professional Cloud Architect topic 1 question 119 discussion

Actual exam question from Google's Professional Cloud Architect
Question #: 119
Topic #: 1

You need to migrate Hadoop jobs for your company's Data Science team without modifying the underlying infrastructure. You want to minimize costs and infrastructure management effort. What should you do?

  • A. Create a Dataproc cluster using standard worker instances.
  • B. Create a Dataproc cluster using preemptible worker instances.
  • C. Manually deploy a Hadoop cluster on Compute Engine using standard instances.
  • D. Manually deploy a Hadoop cluster on Compute Engine using preemptible instances.
Suggested Answer: B
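For context, the suggested answer can be sketched with the gcloud CLI (cluster name and region here are hypothetical, not from the question): Dataproc always runs its primary workers on standard VMs, and preemptible VMs are attached as secondary workers.

```shell
# Sketch, assuming a project is already configured: create a Dataproc
# cluster whose secondary workers are preemptible VMs. The two primary
# workers are standard VMs; only secondary workers can be preemptible.
gcloud dataproc clusters create hadoop-migration-cluster \
    --region=us-central1 \
    --num-workers=2 \
    --num-secondary-workers=2 \
    --secondary-worker-type=preemptible
```

Note that this mirrors the mixed approach several commenters describe: the cluster is never made of preemptible instances alone.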

Comments

TotoroChina
Highly Voted 3 years, 10 months ago
Should be B, you want to minimize costs. https://cloud.google.com/dataproc/docs/concepts/compute/secondary-vms#preemptible_and_non-preemptible_secondary_workers
upvoted 68 times
J19G
3 years, 6 months ago
Agree; the migration guide also recommends thinking about preemptible worker nodes: https://cloud.google.com/architecture/hadoop/hadoop-gcp-migration-jobs#using_preemptible_worker_nodes
upvoted 4 times
ale_brd_111
2 years, 5 months ago
I think it's A. The question does not say anything about minimizing costs; all the GCP exam questions that have cost minimization as a requirement literally state it in the question. Also, to minimize costs you need to build jobs that are fault-tolerant, since the worker instances are preemptible, and that requires some investment of development work. So if fault tolerance and cost minimization are not mentioned in the question, they are not required. The doc states: Only use preemptible nodes for jobs that are fault-tolerant or that are low enough priority that occasional job failure won't disrupt your business.
upvoted 2 times
Amrx
6 months, 1 week ago
Question literally says "You want to minimize costs and infrastructure management effort."
upvoted 2 times
grejao
2 years ago
OMG, you again? zetalexg says: It's disappointing that you waste your time writing on this topic instead of paying attention to the questions.
upvoted 6 times
XDevX
3 years, 10 months ago
Hi TotoroChina, I had the same thought when I first read the question. The problem I see is that in a real business you would try to mix preemptible instances and on-demand instances, whereas here you have to choose between only preemptible instances and only on-demand instances. Preemptible instances have some downsides, so we would need more details and ideally a mixed approach. That's why both answers might be correct, A and B. Do you see that differently? Thanks! Cheers, D.
upvoted 5 times
kopper2019
3 years, 9 months ago
But you need to reduce management overhead, so B; creating a cluster manually and maintaining GCE instances yourself is not the way to go.
upvoted 5 times
HenkH
2 years, 5 months ago
B requires creating new instances at least every 24 hours.
upvoted 2 times
Sukon_Desknot
2 years, 6 months ago
"Without modifying the underlying infrastructure" is the watchword. They most likely did not use preemptible instances on-premises.
upvoted 8 times
Yogi42
2 years, 3 months ago
A cost-savings consideration: using preemptible VMs does not always save costs, since preemptions can cause longer job execution with resulting higher job costs. This is mentioned in the link above, so I think the answer should be A.
upvoted 3 times
firecloud
Highly Voted 3 years, 9 months ago
It's A: the primary workers can only be standard, whereas secondary workers can be preemptible. In addition to using standard Compute Engine VMs as Dataproc workers (called "primary" workers), Dataproc clusters can use "secondary" workers. There are two types of secondary workers: preemptible and non-preemptible. All secondary workers in your cluster must be of the same type, either preemptible or non-preemptible. The default is preemptible.
upvoted 35 times
Manh
3 years, 7 months ago
agreed
upvoted 2 times
david_tay
Most Recent 2 months, 1 week ago
Selected Answer: B
It mentions cost saving, hence B is fine; Hadoop is usually used for batch processing, so a bit of downtime is tolerable.
upvoted 2 times
3a7557a
2 months, 3 weeks ago
Selected Answer: A
A. Create a Dataproc cluster using standard worker instances. You can use preemptible instances, but they should be less than 50% of the total worker instances; otherwise they could impact stability, with transient task failures as a downside. Follow best practice. General recommendations: Start conservatively: begin with a smaller percentage of preemptible workers (e.g., less than 30% of the total worker nodes), monitor your jobs closely to see how they perform, and adjust as needed. Don't exceed 50%: Google generally recommends keeping the number of preemptible workers below 50% of the total worker nodes in your cluster. This helps maintain stability and reduces the risk of significant job disruptions.
upvoted 1 times
plumbig11
3 months, 4 weeks ago
Selected Answer: B
Dataproc; since they want to save costs, preemptible is the best option, for sure.
upvoted 2 times
Lestrang
7 months ago
Probably A:
  • Primary workers must be standard.
  • Preemptible doesn't always save cost.
  • On-premises infrastructure doesn't have spot machines.
  • You cannot choose spot/preemptible when creating the cluster, only when provisioning secondary nodes, which are preemptible by default.
  • They do not store data; they only do data processing.
upvoted 1 times
pcamaster
7 months ago
Selected Answer: B
B: being a multiple-choice question, we need to focus on explicit keywords here. Management effort => managed service => Dataproc. Cost optimization => preemptible. For those who say "but that also requires fault tolerance": there is no explicit keyword in the question saying "we have critical jobs" or "our data science team has not taken fault tolerance into account," so we must not assume that's needed.
upvoted 2 times
afsarkhan
9 months, 3 weeks ago
Selected Answer: A
I will go with A. Reason: preemptible instances are unpredictable, and there is no mention of workload criticality. So my answer is A over B.
upvoted 1 times
Gino17m
1 year ago
Selected Answer: A
A - only secondary workers can be preemptible and "Using preemptible VMs does not always save costs since preemptions can cause longer job execution with resulting higher job costs" according to: https://cloud.google.com/dataproc/docs/concepts/compute/secondary-vms#preemptible_and_non-preemptible_secondary_workers
upvoted 1 times
dija123
1 year ago
Selected Answer: B
Agree with B
upvoted 2 times
Diwz
1 year, 1 month ago
Answer is B. The secondary worker type instance for default Dataproc cluster is preemptible VMs. https://cloud.google.com/dataproc/docs/concepts/compute/secondary-vms
upvoted 1 times
shashii82
1 year, 1 month ago
Dataproc: Dataproc is a fully managed Apache Spark and Hadoop service on Google Cloud Platform. It allows you to run clusters without the need to manually deploy and manage Hadoop clusters on Compute Engine.
Preemptible worker instances: preemptible instances are short-lived, cost-effective virtual machine instances that are suitable for fault-tolerant and batch-processing workloads. Since Hadoop jobs can often tolerate interruptions, using preemptible instances can significantly reduce costs.
Option B leverages the benefits of Dataproc for managing Hadoop clusters without the need for manual deployment and takes advantage of preemptible instances to minimize costs. This aligns well with the goal of minimizing both costs and infrastructure management effort.
upvoted 1 times
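The migration described above can be sketched with the gcloud CLI (cluster name, region, jar path, and arguments here are hypothetical): Dataproc accepts the same job jar and arguments the on-premises Hadoop cluster ran, which is what lets the jobs move without modifying the underlying infrastructure.

```shell
# Sketch, assuming a Dataproc cluster already exists: submit an existing
# Hadoop MapReduce jar to the cluster unchanged. Everything after the
# bare "--" is passed through as arguments to the job itself.
gcloud dataproc jobs submit hadoop \
    --cluster=hadoop-migration-cluster \
    --region=us-central1 \
    --jar=gs://my-bucket/jobs/wordcount.jar \
    -- gs://my-bucket/input gs://my-bucket/output
```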
VidhyaBupesh
1 year, 2 months ago
Using preemptible VMs does not always save costs since preemptions can cause longer job execution with resulting higher job costs
upvoted 1 times
Amrita2012
1 year, 2 months ago
Selected Answer: A
Standard Compute Engine VMs are used as Dataproc "primary" workers; preemptible VMs can only be used for secondary workers, hence A is the valid answer.
upvoted 1 times
Pime13
1 year, 2 months ago
Selected Answer: B
Minimize costs -> preemptible.
upvoted 3 times
d0094d6
1 year, 2 months ago
Selected Answer: B
You want to minimize costs and infrastructure management effort > B
upvoted 3 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other