Exam AWS Certified Data Analytics - Specialty topic 1 question 3 discussion

A real estate company has a mission-critical application using Apache HBase in Amazon EMR. Amazon EMR is configured with a single master node. The company has over 5 TB of data stored on a Hadoop Distributed File System (HDFS). The company wants a cost-effective solution to make its HBase data highly available.
Which architectural pattern meets the company's requirements?

  • A. Use Spot Instances for core and task nodes and a Reserved Instance for the EMR master node. Configure the EMR cluster with multiple master nodes. Schedule automated snapshots using Amazon EventBridge.
  • B. Store the data on an EMR File System (EMRFS) instead of HDFS. Enable EMRFS consistent view. Create an EMR HBase cluster with multiple master nodes. Point the HBase root directory to an Amazon S3 bucket.
  • C. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Run two separate EMR clusters in two different Availability Zones. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.
  • D. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Create a primary EMR HBase cluster with multiple master nodes. Create a secondary EMR HBase read-replica cluster in a separate Availability Zone. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.
Suggested Answer: D
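
As a rough illustration of option D, the two clusters could be launched with nearly identical EMR configuration lists that differ only in the read-replica flag. This is a minimal sketch, not a full launch script: the helper `hbase_on_s3_config` and the bucket path are hypothetical, and the property names (`hbase.rootdir`, `hbase.emr.storageMode`, `hbase.emr.readreplica.enabled`) are the ones used in the Amazon EMR Release Guide for HBase on Amazon S3, so check them against the release you run. Multiple master nodes and Availability Zone placement are set elsewhere at cluster launch and are not shown here.

```python
import json


def hbase_on_s3_config(root_dir, read_replica=False):
    """Build an EMR `Configurations` list for HBase backed by Amazon S3.

    Hypothetical helper for illustration; it only assembles dictionaries
    and does not call any AWS API.
    """
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # Marks the cluster as a read-only replica of the primary cluster.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"
    return [
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": root_dir}},
        {"Classification": "hbase", "Properties": hbase_props},
    ]


# Assumed bucket and prefix; both clusters point at the same root directory.
ROOT_DIR = "s3://example-bucket/hbase-root"

primary = hbase_on_s3_config(ROOT_DIR)
replica = hbase_on_s3_config(ROOT_DIR, read_replica=True)

if __name__ == "__main__":
    print(json.dumps({"primary": primary, "replica": replica}, indent=2))
```

Each list would be passed as the cluster's configuration when launching the primary cluster and the replica cluster in a second Availability Zone.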

Comments

cloudlearnerhere
Highly Voted 2 years, 6 months ago
Selected Answer: D
D is correct because, with Amazon EMR version 5.7.0 or later, you can set up a read-replica cluster, which allows you to maintain read-only copies of data in Amazon S3. You can read from the read-replica cluster alongside the primary cluster, and you can keep reading from it if the primary cluster becomes unavailable.
A is incorrect because using Spot EC2 instances for both core and task nodes could cause downtime. Although this solution is the most cost-effective, it doesn't provide the highest availability for Amazon EMR.
B is incorrect. While an EMR cluster with multiple master nodes can survive the failure of a primary master node, it is not tolerant of Availability Zone failures.
C is wrong because two primary clusters cannot be linked to the same root directory at the same time. Only one active cluster at a time can use the same HBase root directory in Amazon S3. The best way to implement this is to launch a primary EMR cluster and a secondary (read-replica) EMR cluster, since using two primary clusters is not supported.
upvoted 36 times
henom
2 years, 4 months ago
The answer is D. The Udemy course by Bonso uses the same logic.
upvoted 5 times
...
...
dushmantha
Highly Voted 2 years, 10 months ago
Selected Answer: B
If we strictly want high availability, then the answer should be D. But to be cost-effective, the company only needs to move from the current HDFS to S3 to make the data more available than before. A read replica is the next step if we want availability through master node crashes, etc., and it comes with additional cost. So I suggest answer B.
upvoted 14 times
...
[Removed]
Most Recent 8 months, 1 week ago
Option D provides a robust and cost-effective solution that meets the company's requirements for making its HBase data highly available while leveraging Amazon EMR's capabilities effectively.
upvoted 10 times
...
NarenKA
1 year, 2 months ago
Selected Answer: D
Option D provides a robust and cost-effective solution that meets the company's requirements for making its HBase data highly available while leveraging Amazon EMR's capabilities effectively.
upvoted 3 times
...
kondi2309
1 year, 2 months ago
Selected Answer: D
the answer here is D.
upvoted 3 times
...
joselopezjm
1 year, 3 months ago
Selected Answer: D
D, because HA requires deploying across two different AZs.
upvoted 2 times
...
gofavad926
1 year, 7 months ago
Selected Answer: D
As cloudlearnerhere explains: "D is correct as Amazon EMR version 5.7.0 or later, you can set up a read-replica cluster, which allows you to maintain read-only copies of data in Amazon S3. In the event that the primary cluster becomes unavailable, you can access the data from the read-replica cluster to perform read operations simultaneously."
upvoted 3 times
...
NikkyDicky
1 year, 9 months ago
Selected Answer: B
B for cost. HBase data availability is satisfied by EMRFS.
upvoted 1 times
...
pk349
2 years ago
D: I passed the test
upvoted 3 times
Shaggy_98
1 year, 4 months ago
Did you clear it using these dumps?
upvoted 1 times
...
...
anjuvinayan
2 years ago
An EMR cluster runs in a single Availability Zone, which means we need to set up a cluster in a different AZ for high availability. Two primary clusters is not an option, so the answer is D.
upvoted 2 times
...
kozer
2 years ago
This recent AWS documentation, https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-plan-consistent-view.html, indicates that consistent view is not supported and has not been needed since 2020. So yes, D seems to be the accurate or best answer, but these questions are outdated, and given how fast features change in AWS, this question would certainly be worded differently today.
upvoted 2 times
...
bjmailbox
2 years, 1 month ago
It has to be option B, because the question asks for the HBase data to be highly available, which is already satisfied by EMRFS. It doesn't directly mention cluster availability anywhere. Also, considering the costs, option D can be eliminated in favor of B.
upvoted 1 times
...
rit25
2 years, 1 month ago
Just to tell you why not B. Enabling EMRFS consistent view and pointing the HBase root directory to an Amazon S3 bucket are two different concepts, but they are related in this scenario. EMRFS (EMR File System) is a file system interface that allows EMR clusters to access data stored in Amazon S3 in the same way as data stored on HDFS. By enabling EMRFS consistent view, EMR ensures that all nodes in the cluster see a consistent view of data stored in S3, which is important for applications like HBase that require strong consistency. On the other hand, pointing the HBase root directory to an S3 bucket means that HBase tables and metadata are stored in S3, rather than on HDFS. This allows HBase to take advantage of the durability and scalability of S3, while still providing low-latency access to data. So, in option B, the company is using both EMRFS and S3. EMRFS is used to provide a consistent view of data stored in S3, while HBase is configured to store its tables and metadata in S3.
upvoted 1 times
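
To make the distinction in the comment above concrete, here is a small sketch showing that consistent view and the HBase root directory live in separate EMR configuration classifications. The helper names and bucket path are hypothetical; `emrfs-site` with `fs.s3.consistent` is the setting the EMR documentation has used for consistent view, and it may be unnecessary or unsupported on newer EMR releases, so treat it as an assumption to verify.

```python
def consistent_view_config(enabled=True):
    """EMRFS consistent view setting (assumed property: fs.s3.consistent)."""
    return {
        "Classification": "emrfs-site",
        "Properties": {"fs.s3.consistent": "true" if enabled else "false"},
    }


def classification_names(configurations):
    """Return the classification name of each entry, in order."""
    return [entry["Classification"] for entry in configurations]


# Hypothetical bucket path. The HBase entries are independent of emrfs-site.
configs = [
    consistent_view_config(),
    {"Classification": "hbase-site",
     "Properties": {"hbase.rootdir": "s3://example-bucket/hbase-root"}},
    {"Classification": "hbase",
     "Properties": {"hbase.emr.storageMode": "s3"}},
]

if __name__ == "__main__":
    print(classification_names(configs))
```

Dropping the `emrfs-site` entry leaves the HBase-on-S3 settings unchanged, which is the sense in which the two are related but separate.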
...
AwsNewPeople
2 years, 2 months ago
D. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Create a primary EMR HBase cluster with multiple master nodes. Create a secondary EMR HBase read-replica cluster in a separate Availability Zone. Point both clusters to the same HBase root directory in the same Amazon S3 bucket. This solution provides a cost-effective way to make HBase data highly available by creating a primary EMR HBase cluster with multiple master nodes and a secondary EMR HBase read-replica cluster in a separate Availability Zone. By storing data on EMRFS and enabling EMRFS consistent view, both clusters can access the same data stored on an Amazon S3 bucket. This eliminates the need to store data redundantly and reduces costs. The use of multiple master nodes improves HBase availability and reliability. If the primary cluster fails, the secondary read-replica cluster can continue to serve read traffic.
upvoted 3 times
...
Matheus_Sampaio
2 years, 3 months ago
Selected Answer: D
D based on Bonso Udemy Course
upvoted 3 times
...
Vskvar
2 years, 5 months ago
Selected Answer: B
Data highly available. D is not cost effective.
upvoted 1 times
...
Kako
2 years, 6 months ago
Selected Answer: D
upvoted 2 times
...