Exam AWS Certified Solutions Architect - Professional SAP-C02 topic 1 question 263 discussion

A solutions architect needs to review the design of an Amazon EMR cluster that is using the EMR File System (EMRFS). The cluster performs tasks that are critical to business needs. The cluster is running Amazon EC2 On-Demand Instances at all times for all task, primary, and core nodes. The EMR tasks run each morning, starting at 1:00 AM, and take 6 hours to finish running. The amount of time to complete the processing is not a priority because the data is not referenced until late in the day.

The solutions architect must review the architecture and suggest a solution to minimize the compute costs.

Which solution should the solutions architect recommend to meet these requirements?

  • A. Launch all task, primary, and core nodes on Spot Instances in an instance fleet. Terminate the cluster, including all instances, when the processing is completed.
  • B. Launch the primary and core nodes on On-Demand Instances. Launch the task nodes on Spot Instances in an instance fleet. Terminate the cluster, including all instances, when the processing is completed. Purchase Compute Savings Plans to cover the On-Demand Instance usage.
  • C. Continue to launch all nodes on On-Demand Instances. Terminate the cluster, including all instances, when the processing is completed. Purchase Compute Savings Plans to cover the On-Demand Instance usage.
  • D. Launch the primary and core nodes on On-Demand Instances. Launch the task nodes on Spot Instances in an instance fleet. Terminate only the task node instances when the processing is completed. Purchase Compute Savings Plans to cover the On-Demand Instance usage.
Suggested Answer: C 🗳️
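
The node layout described in options B and D (primary and core nodes On-Demand, task nodes on Spot in an instance fleet) can be sketched as a plain Python structure. This is only an illustration: the dictionary keys follow the `InstanceFleets` shape of boto3's EMR `run_job_flow` request as I understand it, while the instance types, capacities, and the 30-minute Spot timeout are hypothetical values not taken from the question. Nothing here calls AWS.

```python
def build_instance_fleets(instance_type="m5.xlarge", core_units=2, task_units=4):
    """Build an InstanceFleets list: On-Demand primary/core, Spot task nodes.

    All sizes and the instance type are hypothetical placeholders.
    """
    def fleet(name, fleet_type, on_demand=0, spot=0):
        cfg = {
            "Name": name,
            "InstanceFleetType": fleet_type,  # EMR API still names the primary fleet "MASTER"
            "TargetOnDemandCapacity": on_demand,
            "TargetSpotCapacity": spot,
            "InstanceTypeConfigs": [
                {"InstanceType": instance_type, "WeightedCapacity": 1}
            ],
        }
        if spot:
            # Assumed policy: fall back to On-Demand if Spot capacity
            # is not provisioned within 30 minutes.
            cfg["LaunchSpecifications"] = {
                "SpotSpecification": {
                    "TimeoutDurationMinutes": 30,
                    "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                }
            }
        return cfg

    return [
        fleet("primary", "MASTER", on_demand=1),
        fleet("core", "CORE", on_demand=core_units),
        fleet("task", "TASK", spot=task_units),
    ]


fleets = build_instance_fleets()
for f in fleets:
    print(f["Name"], f["TargetOnDemandCapacity"], f["TargetSpotCapacity"])
```

Keeping core nodes On-Demand reflects the usual concern that core nodes hold HDFS data and losing them to a Spot interruption is more disruptive than losing a task node.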

Comments

aviathor
Highly Voted 8 months, 3 weeks ago
Selected Answer: D
The problem statement says: "The EMR tasks run each morning, starting at 1:00 AM, and take 6 hours to finish running. The amount of time to complete the processing is not a priority because *the data is not referenced until late in the day.*" So later in the day, clients will be using the cluster to read data. Therefore, my understanding is that the core and primary nodes need to be available, but the task nodes can be terminated once the tasks have finished their daily run.
upvoted 19 times
...
javitech83
Highly Voted 10 months, 3 weeks ago
Selected Answer: D
The correct answer is D. In B, it makes no sense to terminate the primary instance if we have already purchased a Savings Plan.
upvoted 11 times
...
seetpt
Most Recent 2 weeks, 1 day ago
Selected Answer: D
D for me
upvoted 1 times
...
43c89f4
2 weeks, 3 days ago
B: we should not terminate the cluster. D: once the tasks are done, we can terminate the task nodes. So my answer is D.
upvoted 1 times
...
TonytheTiger
1 month ago
Selected Answer: D
Option D: How To / Use Case https://aws.amazon.com/blogs/big-data/strategies-for-reducing-your-amazon-emr-costs/
upvoted 2 times
...
Keval12345
1 month, 1 week ago
Selected Answer: D
Terminating all instances makes sense, as these are not frequent jobs. They run once a day. https://www.cloudforecast.io/blog/aws-emr-cost-optimization-guide/
upvoted 2 times
...
pangchn
1 month, 1 week ago
Selected Answer: D
D. For those who chose B: a Compute Savings Plan is an hourly commitment for a consistent usage pattern. You will be charged even if you shut down the whole stack.
upvoted 2 times
...
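
The hourly-commitment argument above can be made concrete with a little arithmetic. The sketch below is a deliberate simplification: it assumes the Savings Plan commitment is sized to exactly cover the primary and core nodes, that no other usage in the account absorbs the commitment, and it uses a made-up On-Demand rate (100 cents per hour) and a made-up 30% discount. Real rates and discounts vary by instance family, Region, and term.

```python
HOURS_PER_DAY = 24


def daily_on_demand_cents(rate_cents_per_hour, hours_running):
    """Pay-as-you-go cost: only the hours the nodes actually run."""
    return rate_cents_per_hour * hours_running


def daily_savings_plan_cents(rate_cents_per_hour, discount_percent):
    """Commitment sized to fully cover the nodes, billed for every hour of the day."""
    committed_per_hour = rate_cents_per_hour * (100 - discount_percent) // 100
    return committed_per_hour * HOURS_PER_DAY


RATE = 100      # hypothetical On-Demand rate for primary + core nodes, cents/hour
DISCOUNT = 30   # hypothetical Savings Plan discount, percent

transient_no_plan = daily_on_demand_cents(RATE, 6)    # run 6 h, then terminate
always_on_no_plan = daily_on_demand_cents(RATE, 24)   # nodes kept up all day
plan_commitment = daily_savings_plan_cents(RATE, DISCOUNT)

print("6 h/day, no plan:     ", transient_no_plan)
print("24 h/day, no plan:    ", always_on_no_plan)
print("plan, billed 24 h/day:", plan_commitment)
```

Under these assumptions, the commitment only pays off when the covered nodes run for most of the day; for a cluster that exists 6 hours out of 24, the unused committed hours are still billed.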
yog927
1 month, 3 weeks ago
Selected Answer: B
We can terminate the cluster and then read the results from S3. See the EMR FAQ below: Q: How does Amazon EMR use Amazon EC2 and Amazon S3? You can upload your input data and a data processing application into Amazon S3. Amazon EMR then launches a number of Amazon EC2 instances that you specified. The service begins the cluster execution while pulling the input data from Amazon S3 using S3 URI scheme into the launched Amazon EC2 instances. Once the cluster is finished, Amazon EMR transfers the output data to Amazon S3, where you can then retrieve it or use as input in another cluster. https://aws.amazon.com/emr/faqs/
upvoted 1 times
...
Dgix
1 month, 4 weeks ago
Selected Answer: B
We _can_ terminate the entire cluster, as EMRFS is specified – which stores the computational results in S3. Therefore, the cluster is not required after processing.
upvoted 1 times
...
career360guru
2 months, 1 week ago
Selected Answer: D
Option D because processed data is used later in the day.
upvoted 2 times
...
a54b16f
2 months, 2 weeks ago
Selected Answer: D
The difference between D and B is whether to terminate the whole EMR cluster, that is, whether we still need the EMR cluster after the 6-hour processing run. The answer is yes: "the data is not referenced until late in the day", and EMRFS can't be accessed without an EMR cluster. You may argue that you can access the underlying S3 directly, but you would lose the benefits of EMR/EMRFS, which provide security controls and, most importantly, the performance and system throughput needed for big data.
upvoted 3 times
...
sat2008
2 months, 3 weeks ago
Selected Answer: B
Once the Amazon EMR cluster finishes processing the data in S3, why do you need it? Is the processed data stored on the cluster's EC2 instances? There is a specific setting for this: the auto-termination policy terminates the cluster after a specified amount of idle time. You will not need the cluster until the next run.
upvoted 1 times
...
sat2008
2 months, 3 weeks ago
Selected Answer: B
Once the data processing is complete, is there any need for the EMR cluster? You can use the auto-termination policy, which terminates the cluster after a specified amount of idle time. The processed data is in S3 for later queries, so my thinking is that there is no need for the EMR cluster until the next run.
upvoted 2 times
...
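
The auto-termination policy mentioned in the comments above might be expressed as follows. This is a minimal sketch, not a full cluster definition: it only builds the `AutoTerminationPolicy` argument in the shape boto3's EMR `run_job_flow` accepts, as far as I know, and the 60 to 604800 second bounds are the documented limits as I recall them. The helper name and the 15-minute idle time are hypothetical.

```python
MIN_IDLE_SECONDS = 60        # assumed lower bound (1 minute)
MAX_IDLE_SECONDS = 604800    # assumed upper bound (7 days)


def auto_termination_kwargs(idle_minutes):
    """Return the keyword argument that asks EMR to terminate an idle cluster.

    Hypothetical helper; it only builds a dict and does not call AWS.
    """
    idle_seconds = idle_minutes * 60
    if not MIN_IDLE_SECONDS <= idle_seconds <= MAX_IDLE_SECONDS:
        raise ValueError(f"idle timeout out of range: {idle_seconds} s")
    return {"AutoTerminationPolicy": {"IdleTimeout": idle_seconds}}


# Example: terminate the cluster after 15 idle minutes once the nightly run ends.
kwargs = auto_termination_kwargs(15)
print(kwargs)
```

The resulting dict would be merged into the rest of the cluster request (name, release label, instance fleets, roles), which is omitted here.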
chelbsik
3 months, 2 weeks ago
Selected Answer: D
D. It makes no sense to kill the whole cluster when someone will access it later the same day.
upvoted 4 times
...
ele
3 months, 3 weeks ago
Selected Answer: B
B is the right answer. A transient cluster best suits this use case. Data processed by EMR is stored in S3 and referenced there. There is no need to keep any nodes up. Besides, the EMR File System (EMRFS) is best suited for transient clusters, as the data persists irrespective of the lifetime of the cluster.
upvoted 1 times
...
duriselvan
5 months, 1 week ago
Answer: D.
Cost optimization: Using Spot Instances for task nodes significantly reduces costs compared to On-Demand Instances. Spot Instances can offer substantial discounts, especially when running workloads with flexible start and stop times.
Minimal impact: By terminating only the task nodes after processing, the primary and core nodes remain available for future job submissions without requiring a complete cluster restart. This minimizes downtime and maximizes resource utilization.
Availability and stability: On-Demand Instances for primary and core nodes ensure high availability and stability for critical tasks. This eliminates the risk of interruptions due to Spot Instance price fluctuations or availability constraints.
Savings Plans: Purchasing Compute Savings Plans for On-Demand Instances can provide further cost savings by offering discounts based on a committed level of usage.
upvoted 2 times
...
ayadmawla
5 months, 1 week ago
Selected Answer: B
D is appealing and makes sense due to the indicated critical nature of the cluster. B, however, is associated with EMRFS (S3), which is typically used with transient EMR clusters (see: https://bluexp.netapp.com/blog/optimizing-aws-emr-best-practices). Since the objective is to save money, terminating the cluster and cloning its configuration to launch a new one each day, which only takes a few minutes, would be an appropriate option. Just my two pennies' worth :)
upvoted 4 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted