Exam AWS Certified Data Analytics - Specialty topic 1 question 66 discussion

A healthcare company uses AWS data and analytics tools to collect, ingest, and store electronic health record (EHR) data about its patients. The raw EHR data is stored in Amazon S3 in JSON format partitioned by hour, day, and year and is updated every hour. The company wants to maintain the data catalog and metadata in an AWS Glue Data Catalog to be able to access the data using Amazon Athena or Amazon Redshift Spectrum for analytics.
When defining tables in the Data Catalog, the company has the following requirements:
✑ Choose the catalog table name and do not rely on the catalog table naming algorithm.
✑ Keep the table updated with new partitions loaded in the respective S3 bucket prefixes.
Which solution meets these requirements with minimal effort?

  • A. Run an AWS Glue crawler that connects to one or more data stores, determines the data structures, and writes tables in the Data Catalog.
  • B. Use the AWS Glue console to manually create a table in the Data Catalog and schedule an AWS Lambda function to update the table partitions hourly.
  • C. Use the AWS Glue API CreateTable operation to create a table in the Data Catalog. Create an AWS Glue crawler and specify the table as the source.
  • D. Create an Apache Hive catalog in Amazon EMR with the table schema definition in Amazon S3, and update the table partition with a scheduled job. Migrate the Hive catalog to the Data Catalog.
Suggested Answer: C
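
For illustration, below is a minimal boto3 sketch of the CreateTable half of option C. The database, table, column, and bucket names are placeholder assumptions and are not given in the question.

```python
import boto3

glue = boto3.client("glue")

# Manually create the catalog table so the table name is chosen by us,
# not by the crawler's naming algorithm. All names below are placeholders.
glue.create_table(
    DatabaseName="ehr_db",
    TableInput={
        "Name": "ehr_records",          # explicitly chosen table name
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": [
            {"Name": "year", "Type": "string"},
            {"Name": "day", "Type": "string"},
            {"Name": "hour", "Type": "string"},
        ],
        "StorageDescriptor": {
            "Columns": [{"Name": "patient_id", "Type": "string"}],  # example column only
            "Location": "s3://example-ehr-bucket/raw/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    },
)
```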

Comments

Marc34
Highly Voted 3 years, 9 months ago
C. https://docs.aws.amazon.com/glue/latest/dg/tables-described.html, in the section "Updating Manually Created Data Catalog Tables Using Crawlers": "The following are other reasons why you might want to manually create catalog tables and specify catalog tables as the crawler source: You want to choose the catalog table name and not rely on the catalog table naming algorithm."
upvoted 31 times
Phoenyx89
3 years, 9 months ago
I agree it is C. B is wrong because it takes more effort than C by using Lambda; the Glue crawler instead is fully automated to update the table and find new partitions.
upvoted 1 times
...
awssp12345
3 years, 9 months ago
I changed my answer to C.
upvoted 1 times
...
rsn
2 years, 4 months ago
However, there is no info on scheduling the crawler in C. Any thoughts?
upvoted 1 times
...
...
NarenKA
Most Recent 1 year, 4 months ago
Selected Answer: A
I think Option A is correct. AWS Glue crawlers automatically scan data in S3, recognize the format and schema, and create metadata tables in the AWS Glue Data Catalog. This eliminates manual schema definition and table creation, and it picks up new partitions with minimal effort. We can specify the database in which the tables are created and control the naming of the tables through the crawler's configuration settings rather than relying on an automated naming algorithm. As new data is added to S3 in hourly, daily, and yearly partitions, running the crawler at regular intervals ensures that new partitions are discovered and added to the respective Data Catalog tables automatically. Option C uses the AWS Glue API to create a table, which is more manual than allowing a crawler to discover and manage tables, and it does not automatically address the ongoing discovery of new partitions.
upvoted 1 times
...
pk349
2 years, 2 months ago
C: I passed the test
upvoted 1 times
...
cloudlearnerhere
2 years, 8 months ago
Selected Answer: C
Correct answer is C: the AWS Glue API CreateTable operation can be used to create a table in the Data Catalog, and an AWS Glue crawler can then be created with that table specified as its source. This meets the requirements of choosing the catalog table name and keeping the table updated with new partitions. B is incorrect: although creating a new catalog table is right, using a Lambda function to update the table partitions entails a lot of development work, and the scenario explicitly asks for the solution with minimal effort. A is incorrect because you must create the catalog table yourself if you do not want to rely on the catalog table naming algorithm provided by AWS Glue. D is incorrect because that solution entails a lot of effort; a better and easier solution is to just create a new table in the AWS Glue Data Catalog and set up an AWS Glue crawler.
upvoted 3 times
...
bp339
2 years, 8 months ago
Selected Answer: C
The following are other reasons why you might want to manually create catalog tables and specify catalog tables as the crawler source: You want to choose the catalog table name and not rely on the catalog table naming algorithm. You want to prevent new tables from being created in the case where files with a format that could disrupt partition detection are mistakenly saved in the data source path.
upvoted 2 times
...
rocky48
2 years, 11 months ago
Selected Answer: C
Answer is C
upvoted 1 times
...
Bik000
3 years, 1 month ago
Selected Answer: C
My Answer is C
upvoted 1 times
...
rb39
3 years, 3 months ago
Selected Answer: C
You need to use the API to be able to provide a custom name
upvoted 1 times
...
lakediver
3 years, 6 months ago
Selected Answer: C
https://docs.aws.amazon.com/glue/latest/dg/tables-described.html#update-manual-tables
upvoted 4 times
...
aws2019
3 years, 7 months ago
Answer should be C.
upvoted 1 times
...
sayed
3 years, 8 months ago
C as per the link provided, not B. Read "Updating Manually Created Data Catalog Tables Using Crawlers": https://docs.aws.amazon.com/glue/latest/dg/tables-described.html
upvoted 1 times
...
lostsoul07
3 years, 8 months ago
C is the right answer
upvoted 2 times
...
tleflond
3 years, 8 months ago
I think there might be a typo in C; the table needs to be defined as the target and not the source.
upvoted 1 times
zevzek
3 years, 8 months ago
It also mentions this: "The following are other reasons why you might want to manually create catalog tables and specify catalog tables as the crawler source: You want to choose the catalog table name and not rely on the catalog table naming algorithm."
upvoted 1 times
...
zevzek
3 years, 8 months ago
No typo, I think: https://docs.aws.amazon.com/glue/latest/dg/tables-described.html, under "Updating Manually Created Data Catalog Tables Using Crawlers": "To do this, when you define a crawler, instead of specifying one or more data stores as the source of a crawl, you specify one or more existing Data Catalog tables. The crawler then crawls the data stores specified by the catalog tables. In this case, no new tables are created; instead, your manually created tables are updated." (See the crawler sketch after this thread.)
upvoted 2 times
...
...
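Picking up on the scheduling question and the catalog-target mechanism quoted in the thread above, here is a rough boto3 sketch of the crawler half of option C. It assumes the placeholder ehr_db/ehr_records table from the earlier sketch, an assumed IAM role name, and an hourly cron schedule chosen to match the hourly S3 updates; none of these values come from the original question.

```python
import boto3

glue = boto3.client("glue")

# Point the crawler at the existing catalog table (a catalog target) instead
# of an S3 data store, so it only updates that manually created table's
# schema and partitions and never creates new tables with generated names.
glue.create_crawler(
    Name="ehr-partition-crawler",                 # placeholder crawler name
    Role="GlueCrawlerServiceRole",                # assumed IAM role
    Targets={
        "CatalogTargets": [
            {"DatabaseName": "ehr_db", "Tables": ["ehr_records"]}
        ]
    },
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",                  # don't drop the manually created table
    },
    Schedule="cron(15 * * * ? *)",                # hourly run to pick up new partitions
)

glue.start_crawler(Name="ehr-partition-crawler")  # or simply wait for the schedule
```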
LMax
3 years, 8 months ago
C for me.
upvoted 2 times
...
sanjaym
3 years, 8 months ago
Answer should be C.
upvoted 2 times
...
syu31svc
3 years, 8 months ago
Answer is C
upvoted 2 times
...
Paitan
3 years, 8 months ago
Answer is C.
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other