Exam AWS Certified Big Data - Specialty topic 2 question 5 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty

Question #: 5
Topic #: 2

An organization is currently using an Amazon EMR long-running cluster with the latest Amazon EMR release for analytic jobs and is storing data as external tables on Amazon S3.
The company needs to launch multiple transient EMR clusters to access the same tables concurrently, but the metadata about the Amazon S3 external tables is defined and stored on the long-running cluster.
Which solution will expose the Hive metastore with the LEAST operational effort?

A. Export Hive metastore information to Amazon DynamoDB and configure the Amazon EMR hive-site classification to point to the Amazon DynamoDB table.
B. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon EMR hive-site classification to point to the Amazon RDS database.
C. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive metastore information to Derby.
D. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.

Suggested Answer: B

by mattyb123 at Aug. 26, 2019, 5:11 a.m.

Comments

Kuang

Highly Voted 3 years, 6 months ago

It is D. The key is "multiple transient EMR clusters to access the same tables concurrently". For an external RDS metastore, concurrent writes are not recommended (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-external.html). For Glue, it supports 1 to 10 concurrent accesses (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html).

upvoted 8 times

ariane_tateishi

Most Recent 3 years, 6 months ago

D should be the right answer, because Amazon recommends this configuration when you require a persistent metastore or a metastore shared by different clusters, services, applications, or AWS accounts. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html

upvoted 1 times

srirampc

3 years, 6 months ago

Answer is D. Glue crawlers can update tables when the schema changes, thus requiring the LEAST operational effort.

upvoted 1 times

YashBindlish

3 years, 6 months ago

Answer is D https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html

upvoted 2 times

san2020

3 years, 7 months ago

My selection is B.

upvoted 2 times

richardxyz

3 years, 7 months ago

D is correct; the question is about "which solution will expose the Hive metastore, ... defined on the long-running cluster"

upvoted 1 times

yuriy_ber

3 years, 7 months ago

Well, the long-running cluster is already running the Hive catalog in a MySQL DB on the master node, so I think moving the existing database to RDS and switching over is easier than using a Glue crawler, so B.

upvoted 4 times

ME2000

3 years, 7 months ago

Here we go...
- Configuring an External Metastore for Hive: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-metastore-external-hive.html
- Using the AWS Glue Data Catalog as the Metastore for Hive: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html (Important point: If another cluster needs to access the table, it fails unless it has adequate permissions to the cluster that created the table. Furthermore, because HDFS storage is transient, if the cluster terminates, the table data is lost, and the table must be recreated.)
- Using an External MySQL Database or Amazon Aurora: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-external.html (Your Hive cluster runs using the metastore located in Amazon RDS. Launch all additional Hive clusters that share this metastore by specifying the metastore location.)
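For anyone comparing the options, here is a rough sketch of what "specifying the metastore location" for the external MySQL/RDS route (option B) could look like as an EMR hive-site classification. The property keys are the standard Hive JDO connection settings; the function name, host, user, and password below are made-up placeholders, not values from the question.

```python
import json


def external_metastore_config(jdbc_url, user, password):
    """Build a hive-site classification that points Hive at an external MySQL metastore.

    All argument values are supplied by the caller; nothing here contacts AWS.
    """
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "javax.jdo.option.ConnectionURL": jdbc_url,
                "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
                "javax.jdo.option.ConnectionUserName": user,
                "javax.jdo.option.ConnectionPassword": password,
            },
        }
    ]


# Hypothetical endpoint and credentials, for illustration only.
rds_config = external_metastore_config(
    "jdbc:mysql://example-host:3306/hive?createDatabaseIfNotExist=true",
    "hive_user",
    "example-password",
)
print(json.dumps(rds_config, indent=2))
```

Every transient cluster would need this same configuration at launch, and you still have to run and maintain the RDS instance yourself, which is the operational cost people are weighing against Glue.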

upvoted 1 times

sam3787

3 years, 7 months ago

Thanks. The metadata for the S3 external tables is still defined on the long-running cluster. So shouldn't that point to using Glue (option D)?

upvoted 1 times

cybe001

3 years, 7 months ago

D is the least operational effort.

upvoted 1 times

bigdatalearner

3 years, 7 months ago

B and D are both correct, but it looks like Glue is easier to configure, so it would be D.

upvoted 1 times

mattyb123

3 years, 8 months ago

Isn't it D? LEAST operational effort.

upvoted 1 times

mattyb123

3 years, 7 months ago

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html Using Amazon EMR version 5.8.0 or later, you can configure Hive to use the AWS Glue Data Catalog as its metastore. We recommend this configuration when you require a persistent metastore or a metastore shared by different clusters, services, applications, or AWS accounts.
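To make the Glue route (option D) concrete, here is a minimal sketch of the hive-site classification that tells Hive on EMR to use the Glue Data Catalog. The factory class name is the one given in the EMR Release Guide page linked above; the function and constant names are just illustrative.

```python
import json

# Client factory class that routes Hive metastore calls to the AWS Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_metastore_config():
    """Build a hive-site classification that uses the Glue Data Catalog as the metastore."""
    return [
        {
            "Classification": "hive-site",
            "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
        }
    ]


glue_config = glue_metastore_config()
print(json.dumps(glue_config, indent=2))
```

There is no database endpoint or credential to manage here, only one property, which is a fair illustration of why most people in this thread read D as the lower-effort choice.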

upvoted 2 times

mattyb123

3 years, 7 months ago

The only reason it could be B is that Glue needs to be set up as the metadata store before launching the EMR cluster. Does anyone else have thoughts on this?

upvoted 1 times

mattyb123

3 years, 7 months ago

Thoughts anyone?

upvoted 1 times

jlpl

3 years, 7 months ago

It makes sense to select "D", but again, I have not tried setting up AWS Glue hands-on yet.

upvoted 1 times

mattyb123

3 years, 7 months ago

Looks like D is correct, since the external data is stored on S3. See https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html: When you define an Amazon S3 data store to crawl, you can choose whether to crawl a path in your account or another account. The output of the crawler is one or more metadata tables defined in the AWS Glue Data Catalog. A table is created for one or more files found in your data store. If all the Amazon S3 files in a folder have the same schema, the crawler creates one table. Also, if the Amazon S3 object is partitioned, only one metadata table is created.
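As a rough illustration of the crawler setup described above, this sketch only assembles the request parameters for a crawler over an S3 path; it does not call AWS. The keys (Name, Role, DatabaseName, Targets) follow the Glue CreateCrawler request shape, and the resulting dict could be passed to `boto3.client("glue").create_crawler(**request)`. The crawler name, role ARN, database, and bucket path are all hypothetical.

```python
import json


def build_crawler_request(name, role_arn, database, s3_path):
    """Assemble CreateCrawler parameters for a single S3 target (no AWS call is made)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


# Placeholder values, for illustration only.
crawler_request = build_crawler_request(
    "example-external-tables-crawler",
    "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    "example_db",
    "s3://example-bucket/tables/",
)
print(json.dumps(crawler_request, indent=2))
```

Note that a crawler infers tables from the S3 data; it does not copy the existing Hive definitions from the long-running cluster, so treat this as one possible way to populate the catalog rather than a migration of the current metastore.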

upvoted 1 times
