Exam AWS Certified Data Engineer - Associate DEA-C01 All Questions

Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 54 discussion

A company is planning to migrate on-premises Apache Hadoop clusters to Amazon EMR. The company also needs to migrate a data catalog into a persistent storage solution.
The company currently stores the data catalog in an on-premises Apache Hive metastore on the Hadoop clusters. The company requires a serverless solution to migrate the data catalog.
Which solution will meet these requirements MOST cost-effectively?

  • A. Use AWS Database Migration Service (AWS DMS) to migrate the Hive metastore into Amazon S3. Configure AWS Glue Data Catalog to scan Amazon S3 to produce the data catalog.
  • B. Configure a Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use AWS Glue Data Catalog to store the company's data catalog as an external data catalog.
  • C. Configure an external Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use Amazon Aurora MySQL to store the company's data catalog.
  • D. Configure a new Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use the new metastore as the company's data catalog.
Suggested Answer: B

Comments

Asmunk
5 months, 3 weeks ago
Selected Answer: B
A and D can be discarded because of the added steps. This link documents this exact use case: https://aws.amazon.com/blogs/big-data/migrate-and-deploy-your-apache-hive-metastore-on-amazon-emr/ C is also discarded because of the "serverless" keyword: although Aurora can be serverless, that is not specified in the choice.
upvoted 1 time
...
Christina666
1 year ago
Selected Answer: B
Serverless and Cost-Efficient: AWS Glue Data Catalog offers a serverless metadata repository, reducing operational overhead and making it cost-effective. Using it as an external data catalog means you don't have to manage additional database infrastructure.
Seamless Migration: Migrating your existing Hive metastore to Amazon EMR ensures compatibility with your current Hadoop setup. EMR is designed to run Hadoop workloads, facilitating this process.
Flexibility: An external data catalog in AWS Glue offers flexibility and separation of concerns. Your metastore remains managed by EMR for your Hadoop workloads, while Glue provides a centralized catalog for broader AWS data sources.
upvoted 2 times
...
nyaopoko
1 year ago
B is the answer! By using AWS Glue Data Catalog as an external data catalog and migrating the existing Hive metastore into Amazon EMR, the company gets a serverless, persistent, and cost-effective solution for storing and managing its data catalog.
upvoted 1 time
...
arvehisa
1 year ago
Selected Answer: B
B. https://aws.amazon.com/jp/blogs/big-data/migrate-and-deploy-your-apache-hive-metastore-on-amazon-emr/
upvoted 2 times
...
lucas_rfsb
1 year, 1 month ago
Selected Answer: A
I will go with A. Besides DMS being the typical choice for migrations, it is the only option that explicitly addresses how the migration itself will be performed. The other options would require a script or an AWS Glue ETL job, if you will. But this migration logic was never put
upvoted 2 times
...
LeoSantos121212121212121
1 year, 1 month ago
I will go with A
upvoted 2 times
...
jpmadan
1 year, 1 month ago
Selected Answer: B
Serverless catalog in AWS == Glue
upvoted 1 times
...
damaldon
1 year, 1 month ago
B. Set up an AWS Glue ETL job which extracts metadata from your Hive metastore (MySQL) and loads it into your AWS Glue Data Catalog. This method requires an AWS Glue connection to the Hive metastore as a JDBC source. An ETL script is provided to extract metadata from the Hive metastore and write it to AWS Glue Data Catalog. https://github.com/aws-samples/aws-glue-samples/blob/master/utilities/Hive_metastore_migration/README.md
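To make the metadata mapping described in this comment concrete, here is a minimal Python sketch. It is not the AWS sample script itself (which runs as a Glue ETL Spark job against the metastore database); it only illustrates how one Hive table definition might be reshaped into a Glue `TableInput`. It assumes the Hive metadata has already been read into a plain dict. The dict layout, the function names `hive_table_to_glue_input` and `register_table`, and the example host, paths, and bucket name are all hypothetical. The `TableInput` keys and the boto3 `create_table` call belong to the Glue API.

```python
# Hypothetical Hive table description, e.g. assembled from metastore queries.
EXAMPLE_TABLE = {
    "name": "orders",
    "columns": [("order_id", "bigint"), ("amount", "double")],
    "partition_keys": [("order_date", "string")],
    "location": "hdfs://namenode:8020/warehouse/sales.db/orders",
    "input_format": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    "output_format": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    "serde": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
}


def hive_table_to_glue_input(hive_table, location_map=None):
    """Reshape a Hive table description (assumed dict layout) into a Glue TableInput."""
    location = hive_table["location"]
    # Optionally rewrite an on-premises prefix (e.g. hdfs://...) to its S3 equivalent.
    for old_prefix, new_prefix in (location_map or {}).items():
        if location.startswith(old_prefix):
            location = new_prefix + location[len(old_prefix):]
            break
    return {
        "Name": hive_table["name"].lower(),
        "TableType": hive_table.get("table_type", "EXTERNAL_TABLE"),
        "PartitionKeys": [
            {"Name": n, "Type": t} for n, t in hive_table.get("partition_keys", [])
        ],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in hive_table["columns"]],
            "Location": location,
            "InputFormat": hive_table["input_format"],
            "OutputFormat": hive_table["output_format"],
            "SerdeInfo": {"SerializationLibrary": hive_table["serde"]},
        },
    }


def register_table(database, table_input):
    """Create the table in the Glue Data Catalog (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the mapping above can run without AWS access

    boto3.client("glue").create_table(DatabaseName=database, TableInput=table_input)
```

A call such as `hive_table_to_glue_input(EXAMPLE_TABLE, {"hdfs://namenode:8020/warehouse": "s3://example-bucket/warehouse"})` produces the dict that `register_table` would submit. Partitions, statistics, and table parameters are left out of this sketch; the linked sample handles the full migration.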
upvoted 1 time
...
rralucard_
1 year, 3 months ago
Selected Answer: B
https://aws.amazon.com/blogs/big-data/migrate-and-deploy-your-apache-hive-metastore-on-amazon-emr/ Option B is likely the most suitable. Migrating the Hive metastore into Amazon EMR and using AWS Glue Data Catalog as an external catalog provides a balance between leveraging the scalable and managed services of AWS (like EMR and Glue Data Catalog) and ensuring a smooth transition from the on-premises setup. This approach leverages the serverless nature of AWS Glue Data Catalog, minimizing operational overhead and potentially reducing costs compared to managing database servers.
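For readers wondering what "using AWS Glue Data Catalog as an external catalog" looks like on the EMR side, the sketch below builds the cluster configuration objects that point Hive, and optionally Spark SQL, at the Glue Data Catalog. The `hive-site` and `spark-hive-site` classifications and the client factory class are the ones documented for EMR; the helper name `glue_catalog_configurations` is hypothetical, and the output is only the `Configurations` fragment, not a full cluster definition.

```python
import json

# Hive client factory that makes EMR use the Glue Data Catalog as its metastore.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations(include_spark=True):
    """Return EMR configuration objects that route metastore calls to Glue."""
    props = {"hive.metastore.client.factory.class": GLUE_FACTORY}
    configs = [{"Classification": "hive-site", "Properties": dict(props)}]
    if include_spark:
        # Spark SQL reads its metastore settings from spark-hive-site.
        configs.append(
            {"Classification": "spark-hive-site", "Properties": dict(props)}
        )
    return configs


if __name__ == "__main__":
    # Print JSON that could be saved and supplied as the cluster's configurations.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

With this in place the cluster itself runs no metastore database, which is the serverless property the comments above rely on.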
upvoted 3 times
...