Exam AWS Certified Data Analytics - Specialty topic 1 question 38 discussion

A financial company uses Apache Hive on Amazon EMR for ad-hoc queries. Users are complaining of sluggish performance.
A data analyst notes the following:
✑ Approximately 90% of queries are submitted 1 hour after the market opens.
✑ Hadoop Distributed File System (HDFS) utilization never exceeds 10%.

Which solution would help address the performance issues?

  • A. Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch CapacityRemainingGB metric.
  • B. Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch YARNMemoryAvailablePercentage metric.
  • C. Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch CapacityRemainingGB metric.
  • D. Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch YARNMemoryAvailablePercentage metric.
Suggested Answer: D

Comments

Shraddha
Highly Voted 3 years, 6 months ago
Answer: D. A and B are wrong: instance fleets do not support automatic scaling policies. C is wrong: HDFS utilization never exceeds 10%, so a policy based on CapacityRemainingGB would never trigger scaling.
upvoted 23 times
lakediver
3 years, 5 months ago
The following are two commonly used metrics for automatic scaling. YARNMemoryAvailablePercentage: the percentage of remaining memory that's available for YARN. ContainerPendingRatio: the ratio of pending containers to allocated containers. You can use this metric to scale a cluster based on container-allocation behavior for varied loads, which is useful for performance tuning.
upvoted 3 times
...
lakediver
3 years, 5 months ago
Agree. For further reference, see https://aws.amazon.com/premiumsupport/knowledge-center/auto-scaling-in-amazon-emr/
upvoted 2 times
...
...
ariane_tateishi
Highly Voted 3 years, 6 months ago
D should be the right answer, based on the following links. The first link shows that the right metric for this requirement is YARNMemoryAvailablePercentage, because HDFS utilization never exceeds 10%. The second link explains that if you want to use automatic scaling, you should use instance groups. https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html
upvoted 7 times
...
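A minimal Python sketch of the reasoning in the comments above: it compares how two alarm conditions would behave at the morning peak. The snapshot numbers and both thresholds are hypothetical values chosen for illustration; only the metric names and the ratio MemoryAvailableMB / MemoryTotalMB come from the discussion.

```python
def yarn_memory_available_percentage(memory_available_mb: float,
                                     memory_total_mb: float) -> float:
    """Percentage of YARN memory still free (MemoryAvailableMB / MemoryTotalMB)."""
    return 100.0 * memory_available_mb / memory_total_mb


def capacity_remaining_gb(hdfs_total_gb: float, hdfs_used_gb: float) -> float:
    """Remaining HDFS capacity in GB."""
    return hdfs_total_gb - hdfs_used_gb


# Hypothetical cluster snapshot one hour after the market opens:
# YARN memory is almost exhausted, while HDFS is only 9% full.
memory_available_mb = 4096
memory_total_mb = 65536
hdfs_total_gb = 1000
hdfs_used_gb = 90

# Hypothetical alarm thresholds (assumptions, not values from the question).
YARN_SCALE_OUT_BELOW_PERCENT = 15.0
HDFS_SCALE_OUT_BELOW_GB = 200.0

yarn_pct = yarn_memory_available_percentage(memory_available_mb, memory_total_mb)
remaining_gb = capacity_remaining_gb(hdfs_total_gb, hdfs_used_gb)

# A YARN-memory alarm fires under query load; an HDFS-capacity alarm does not,
# because storage is never the bottleneck in this scenario.
yarn_alarm_fires = yarn_pct < YARN_SCALE_OUT_BELOW_PERCENT
hdfs_alarm_fires = remaining_gb < HDFS_SCALE_OUT_BELOW_GB

print(f"YARNMemoryAvailablePercentage = {yarn_pct:.2f}% -> scale out: {yarn_alarm_fires}")
print(f"CapacityRemainingGB = {remaining_gb:.0f} GB -> scale out: {hdfs_alarm_fires}")
```

With these assumed numbers, only the YARN memory condition would ask for more nodes, which matches the argument against options A and C.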
monkeydba
Most Recent 1 year, 6 months ago
"Managed scaling is available for clusters composed of either instance groups or instance fleets." https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-scaling.html#:~:text=Managed%20scaling%20is%20available%20for%20clusters%20composed%20of%20either%20instance%20groups%20or%20instance%20fleets.
upvoted 1 times
...
pk349
2 years ago
D: I passed the test
upvoted 1 times
...
srirnag
2 years, 3 months ago
YARNMemoryAvailablePercentage is for compute-intensive workloads (it tracks available YARN memory); CapacityRemainingGB is for storage-intensive workloads. Instance fleets are ruled out. Hence, D.
upvoted 1 times
...
cloudlearnerhere
2 years, 6 months ago
Selected Answer: D
The correct answer is D, because instance group configurations for core and task nodes can be scaled according to the YARNMemoryAvailablePercentage metric. Options A and B are incorrect because an instance fleet doesn't have an automatic scaling policy; only an instance group has this feature. Option C is incorrect because the CapacityRemainingGB metric only measures the remaining HDFS disk capacity, and HDFS utilization never exceeds 10%. The cluster will not scale in or scale out if you choose this metric.
upvoted 4 times
cloudlearnerhere
2 years, 6 months ago
CloudWatch metrics that you can use for automatic scaling in Amazon EMR: the following are two commonly used metrics. YARNMemoryAvailablePercentage: the percentage of remaining memory that's available for YARN. ContainerPendingRatio: the ratio of pending containers to allocated containers. You can use this metric to scale a cluster based on container-allocation behavior for varied loads, which is useful for performance tuning.
For the given use case, the correct solution should support automatic scaling. You can set up automatic scaling in Amazon EMR for an instance group, adding and removing instances automatically based on the value of an Amazon CloudWatch metric that you specify.
The metric YARNMemoryAvailablePercentage represents the percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage.
upvoted 2 times
...
...
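The automatic scaling policy described in this thread could be sketched roughly as follows. This only builds the policy document as a plain Python dictionary, in the shape that boto3's EMR `put_auto_scaling_policy` call accepts as `AutoScalingPolicy`; it does not contact AWS. The helper name `build_yarn_memory_policy`, the capacity limits, thresholds, periods, and cooldowns are all illustrative assumptions, not values from the question.

```python
import json


def build_yarn_memory_policy(min_capacity: int, max_capacity: int,
                             scale_out_below: float = 15.0,
                             scale_in_above: float = 75.0) -> dict:
    """Build an instance-group scaling policy keyed on YARNMemoryAvailablePercentage.

    Thresholds and timings are hypothetical defaults chosen for illustration.
    """
    def rule(name: str, operator: str, threshold: float, adjustment: int) -> dict:
        return {
            "Name": name,
            "Action": {
                "SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": adjustment,  # +N adds nodes, -N removes nodes
                    "CoolDown": 300,
                }
            },
            "Trigger": {
                "CloudWatchAlarmDefinition": {
                    "ComparisonOperator": operator,
                    "EvaluationPeriods": 1,
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Namespace": "AWS/ElasticMapReduce",
                    "Period": 300,
                    "Statistic": "AVERAGE",
                    "Threshold": threshold,
                    "Unit": "PERCENT",
                    # Placeholder that EMR resolves to the cluster ID.
                    "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
                }
            },
        }

    return {
        "Constraints": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
        "Rules": [
            # Scale out when little YARN memory is left (heavy query load).
            rule("ScaleOutOnLowYarnMemory", "LESS_THAN", scale_out_below, 2),
            # Scale in when most YARN memory is idle again.
            rule("ScaleInOnHighYarnMemory", "GREATER_THAN", scale_in_above, -1),
        ],
    }


policy = build_yarn_memory_policy(min_capacity=2, max_capacity=10)
print(json.dumps(policy, indent=2))
```

To apply such a policy, the dictionary would be passed together with a cluster ID and an instance group ID, which is consistent with the point made above that these policies attach to instance groups.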
Arka_01
2 years, 8 months ago
Selected Answer: D
Instance fleets cannot use automatic scaling policies. CapacityRemainingGB is not the right metric to use, because "(HDFS) utilization never exceeds 10%". So the answer is D.
upvoted 1 times
...
rocky48
2 years, 10 months ago
Selected Answer: D
upvoted 1 times
...
Ramshizzle
2 years, 11 months ago
The answer should be D, as others have said. However, I think it would be even better to use instance fleets with EMR managed scaling, but this is not an option here.
upvoted 1 times
...
Bik000
3 years ago
Selected Answer: D
Answer is D
upvoted 1 times
...
jrheen
3 years ago
Answer-D
upvoted 1 times
...
ShilaP
3 years, 2 months ago
D is the right answer.
upvoted 1 times
...
aws2019
3 years, 6 months ago
Option D is the right choice.
upvoted 1 times
...
Billhardy
3 years, 6 months ago
Ans D
upvoted 1 times
...
Naresh_Dulam
3 years, 7 months ago
The answer is D over B, because instance fleets support "managed" scaling, and managed scaling can't use a CloudWatch metric like YARNMemoryAvailablePercentage. Managed scaling scales depending on the load on the cluster.
upvoted 4 times
...
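For contrast, a managed scaling policy of the kind mentioned above might look like the following sketch. It builds the dictionary shape that boto3's EMR `put_managed_scaling_policy` call accepts as `ManagedScalingPolicy`, without calling AWS. The helper name `build_managed_scaling_policy` and the capacity limits are hypothetical.

```python
def build_managed_scaling_policy(min_units: int, max_units: int,
                                 unit_type: str = "InstanceFleetUnits") -> dict:
    """Build a managed scaling policy: only capacity limits, no metric or rules."""
    if min_units > max_units:
        raise ValueError("min_units must not exceed max_units")
    return {
        "ComputeLimits": {
            "UnitType": unit_type,
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


# Hypothetical limits for illustration.
managed_policy = build_managed_scaling_policy(min_units=2, max_units=20)
print(managed_policy)
```

The policy contains no metric name or alarm definition; you set only the bounds and EMR decides when to resize, which is why an option that combines instance fleets with a policy on a specific CloudWatch metric does not fit.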
lostsoul07
3 years, 7 months ago
D is the right answer
upvoted 1 times
...
BillyC
3 years, 7 months ago
D is correct for me.
upvoted 3 times
...