Exam AWS Certified Big Data - Specialty topic 2 question 5 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty
Question #: 5
Topic #: 2

An organization is currently using an Amazon EMR long-running cluster with the latest Amazon EMR release for analytic jobs and is storing data as external tables on Amazon S3.
The company needs to launch multiple transient EMR clusters to access the same tables concurrently, but the metadata about the Amazon S3 external tables is defined and stored on the long-running cluster.
Which solution will expose the Hive metastore with the LEAST operational effort?

  • A. Export Hive metastore information to Amazon DynamoDB and configure the Amazon EMR hive-site classification to point to the Amazon DynamoDB table.
  • B. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon EMR hive-site classification to point to the Amazon RDS database.
  • C. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive metastore information to Derby.
  • D. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.
Suggested Answer: B

Comments

Kuang
Highly Voted 3 years, 6 months ago
It is D. The key is "multiple transient EMR clusters to access the same tables concurrently". For an external RDS metastore, writing concurrently is not recommended (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-external.html). Glue, by contrast, allows 1 to 10 concurrent accesses (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html).
upvoted 8 times
...
ariane_tateishi
Most Recent 3 years, 6 months ago
D should be the right answer, because Amazon recommends using this configuration when you require a persistent metastore or a metastore shared by different clusters, services, applications, or AWS accounts. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html
upvoted 1 times
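
For readers who want to see what option D looks like in practice, here is a minimal sketch of the `hive-site` classification that points Hive on EMR at the AWS Glue Data Catalog. The factory class name is the one given in the EMR Release Guide page linked above; the helper name `glue_hive_site` is hypothetical and only used for illustration.

```python
import json

# Client factory class documented by AWS for using the Glue Data Catalog
# as the Hive metastore on Amazon EMR (release 5.8.0 or later).
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_hive_site():
    """Build the EMR configuration list that points Hive at Glue.

    Hypothetical helper: it only assembles the configuration structure
    and does not call any AWS API.
    """
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_FACTORY,
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file and reviewed.
    print(json.dumps(glue_hive_site(), indent=2))
```

The resulting list is in the shape that `aws emr create-cluster --configurations` and the boto3 `run_job_flow(Configurations=...)` parameter accept, so every transient cluster launched with it would read the same catalog.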
...
srirampc
3 years, 6 months ago
The answer is D. Glue crawlers can update tables when the schema changes, thus requiring the LEAST operational effort.
upvoted 1 times
...
YashBindlish
3 years, 6 months ago
Answer is D https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html
upvoted 2 times
...
san2020
3 years, 7 months ago
My selection is B.
upvoted 2 times
...
richardxyz
3 years, 7 months ago
D is correct; the question is about "which solution will expose the Hive metastore, ... defined on the long-running cluster"
upvoted 1 times
...
yuriy_ber
3 years, 7 months ago
Well, the long-running cluster is already running the Hive catalog in a MySQL DB on the master node, so I think moving the existing database to RDS and switching over is easier than using a Glue crawler. So B.
upvoted 4 times
ME2000
3 years, 7 months ago
Here we go...
Configuring an External Metastore for Hive: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-metastore-external-hive.html
Using the AWS Glue Data Catalog as the Metastore for Hive: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html (Important point: If another cluster needs to access the table, it fails unless it has adequate permissions to the cluster that created the table. Furthermore, because HDFS storage is transient, if the cluster terminates, the table data is lost, and the table must be recreated.)
Using an External MySQL Database or Amazon Aurora: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-external.html (Your Hive cluster runs using the metastore located in Amazon RDS. Launch all additional Hive clusters that share this metastore by specifying the metastore location.)
upvoted 1 times
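
For comparison, option B (an external MySQL metastore on Amazon RDS) is also configured through the `hive-site` classification. The sketch below assembles the JDBC properties described in the external-metastore page linked above. The host name, user name, and password are placeholders, and the helper `external_mysql_hive_site` is hypothetical.

```python
import json


def external_mysql_hive_site(host, user, password, port=3306, database="hive"):
    """Build the EMR configuration list for an external MySQL metastore.

    Hypothetical helper: all arguments are placeholders supplied by the
    caller, and nothing here connects to a database or to AWS.
    """
    jdbc_url = (
        f"jdbc:mysql://{host}:{port}/{database}?createDatabaseIfNotExist=true"
    )
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "javax.jdo.option.ConnectionURL": jdbc_url,
                "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
                "javax.jdo.option.ConnectionUserName": user,
                "javax.jdo.option.ConnectionPassword": password,
            },
        }
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cfg = external_mysql_hive_site(
        "hive-metastore.example.internal", "hive_user", "change-me"
    )
    print(json.dumps(cfg, indent=2))
```

Compared with the Glue variant, this route also means provisioning and maintaining the RDS instance and migrating the existing metastore into it, which is the extra operational effort most commenters weigh against option B.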
sam3787
3 years, 7 months ago
Thanks. The metadata for the S3 external tables is still defined on the long-running cluster, so shouldn't that point to using Glue (option D)?
upvoted 1 times
...
cybe001
3 years, 7 months ago
D is the least operational effort.
upvoted 1 times
...
bigdatalearner
3 years, 7 months ago
B and D are both correct, but it looks like Glue is easier to configure, so it would be D.
upvoted 1 times
...
mattyb123
3 years, 8 months ago
Isn't it D? LEAST operational effort.
upvoted 1 times
mattyb123
3 years, 7 months ago
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html. Using Amazon EMR version 5.8.0 or later, you can configure Hive to use the AWS Glue Data Catalog as its metastore. We recommend this configuration when you require a persistent metastore or a metastore shared by different clusters, services, applications, or AWS accounts.
upvoted 2 times
mattyb123
3 years, 7 months ago
The only reason it could be B is that Glue needs to be set up as the metadata store before launching the EMR cluster. Does anyone else have thoughts on this?
upvoted 1 times
mattyb123
3 years, 7 months ago
Thoughts anyone?
upvoted 1 times
jlpl
3 years, 7 months ago
It makes sense to select "D", but again, I have not tried setting up AWS Glue hands-on yet.
upvoted 1 times
mattyb123
3 years, 7 months ago
Looks like D is correct. Storing external data on S3. https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html. When you define an Amazon S3 data store to crawl, you can choose whether to crawl a path in your account or another account. The output of the crawler is one or more metadata tables defined in the AWS Glue Data Catalog. A table is created for one or more files found in your data store. If all the Amazon S3 files in a folder have the same schema, the crawler creates one table. Also, if the Amazon S3 object is partitioned, only one metadata table is created.
upvoted 1 times