Exam AWS Certified Data Analytics - Specialty topic 1 question 66 discussion

A healthcare company uses AWS data and analytics tools to collect, ingest, and store electronic health record (EHR) data about its patients. The raw EHR data is stored in Amazon S3 in JSON format partitioned by hour, day, and year and is updated every hour. The company wants to maintain the data catalog and metadata in an AWS Glue Data Catalog to be able to access the data using Amazon Athena or Amazon Redshift Spectrum for analytics.
When defining tables in the Data Catalog, the company has the following requirements:
✑ Choose the catalog table name and do not rely on the catalog table naming algorithm.
✑ Keep the table updated with new partitions loaded in the respective S3 bucket prefixes.
Which solution meets these requirements with minimal effort?

  • A. Run an AWS Glue crawler that connects to one or more data stores, determines the data structures, and writes tables in the Data Catalog.
  • B. Use the AWS Glue console to manually create a table in the Data Catalog and schedule an AWS Lambda function to update the table partitions hourly.
  • C. Use the AWS Glue API CreateTable operation to create a table in the Data Catalog. Create an AWS Glue crawler and specify the table as the source.
  • D. Create an Apache Hive catalog in Amazon EMR with the table schema definition in Amazon S3, and update the table partition with a scheduled job. Migrate the Hive catalog to the Data Catalog.
Suggested Answer: C
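
For illustration, below is a minimal boto3 sketch of the CreateTable half of option C. The database, table, column, and bucket names are placeholder assumptions and are not given in the question.

```python
import boto3

glue = boto3.client("glue")

# Manually create the catalog table so the table name is chosen by us,
# not by the crawler's naming algorithm. All names below are placeholders.
glue.create_table(
    DatabaseName="ehr_db",
    TableInput={
        "Name": "ehr_records",          # explicitly chosen table name
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": [
            {"Name": "year", "Type": "string"},
            {"Name": "day", "Type": "string"},
            {"Name": "hour", "Type": "string"},
        ],
        "StorageDescriptor": {
            "Columns": [{"Name": "patient_id", "Type": "string"}],  # example column only
            "Location": "s3://example-ehr-bucket/raw/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    },
)
```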

Comments

Marc34
Highly Voted 3 years, 9 months ago
C. https://docs.aws.amazon.com/glue/latest/dg/tables-described.html, in the section "Updating Manually Created Data Catalog Tables Using Crawlers": "The following are other reasons why you might want to manually create catalog tables and specify catalog tables as the crawler source: You want to choose the catalog table name and not rely on the catalog table naming algorithm."
upvoted 31 times
Phoenyx89
3 years, 9 months ago
I agree it is C. B is wrong because it takes more effort than C by using Lambda; the Glue crawler instead is fully automated to update the table and find new partitions.
upvoted 1 times
...
awssp12345
3 years, 9 months ago
I changed my answer to C.
upvoted 1 times
...
rsn
2 years, 4 months ago
However, there is no info on scheduling the crawler in C. Any thoughts?
upvoted 1 times
...
...
NarenKA
Most Recent 1 year, 4 months ago
Selected Answer: A
I think Option A is correct. AWS Glue crawlers automatically scan data in S3, recognize the format and schema, and create metadata tables in the AWS Glue Data Catalog. This eliminates manual schema definition and table creation, and it picks up new partitions with minimal effort. We can specify the database in which the tables are created and control the naming of the tables through the crawler's configuration settings rather than relying on an automated naming algorithm. As new data is added to S3 in hourly, daily, and yearly partitions, running the crawler at regular intervals ensures that new partitions are discovered and added to the respective Data Catalog tables automatically. Option C uses the AWS Glue API to create a table, which is more manual than allowing a crawler to discover and manage tables, and it does not automatically address the ongoing discovery of new partitions.
upvoted 1 times
...
pk349
2 years, 2 months ago
C: I passed the test
upvoted 1 times
...
cloudlearnerhere
2 years, 8 months ago
Selected Answer: C
Correct answer is C: the AWS Glue API CreateTable operation can be used to create a table in the Data Catalog, and an AWS Glue crawler can then be created with that table specified as its source. This meets the requirements of choosing the catalog table name and keeping the table updated with new partitions. B is incorrect: although creating a new catalog table is right, using a Lambda function to update the table partitions entails a lot of development work, and the scenario explicitly asks for the solution with minimal effort. A is incorrect because you must create the catalog table yourself if you do not want to rely on the catalog table naming algorithm provided by AWS Glue. D is incorrect because that solution entails a lot of effort; a better and easier solution is to just create a new table in the AWS Glue Data Catalog and set up an AWS Glue crawler.
upvoted 3 times
...
bp339
2 years, 8 months ago
Selected Answer: C
The following are other reasons why you might want to manually create catalog tables and specify catalog tables as the crawler source: You want to choose the catalog table name and not rely on the catalog table naming algorithm. You want to prevent new tables from being created in the case where files with a format that could disrupt partition detection are mistakenly saved in the data source path.
upvoted 2 times
...
rocky48
2 years, 11 months ago
Selected Answer: C
Answer is C
upvoted 1 times
...
Bik000
3 years, 1 month ago
Selected Answer: C
My Answer is C
upvoted 1 times
...
rb39
3 years, 3 months ago
Selected Answer: C
You need to use the API to be able to provide a custom name
upvoted 1 times
...
lakediver
3 years, 6 months ago
Selected Answer: C
https://docs.aws.amazon.com/glue/latest/dg/tables-described.html#update-manual-tables
upvoted 4 times
...
aws2019
3 years, 7 months ago
Answer should be C.
upvoted 1 times
...
sayed
3 years, 8 months ago
C as per the link provided, not B. Read "Updating Manually Created Data Catalog Tables Using Crawlers": https://docs.aws.amazon.com/glue/latest/dg/tables-described.html
upvoted 1 times
...
lostsoul07
3 years, 8 months ago
C is the right answer
upvoted 2 times
...
tleflond
3 years, 8 months ago
I think there might be a typo in C; the table needs to be defined as the target and not the source.
upvoted 1 times
zevzek
3 years, 8 months ago
It also mentions this: "The following are other reasons why you might want to manually create catalog tables and specify catalog tables as the crawler source: You want to choose the catalog table name and not rely on the catalog table naming algorithm."
upvoted 1 times
...
zevzek
3 years, 8 months ago
No typo, I think: https://docs.aws.amazon.com/glue/latest/dg/tables-described.html, under "Updating Manually Created Data Catalog Tables Using Crawlers": "To do this, when you define a crawler, instead of specifying one or more data stores as the source of a crawl, you specify one or more existing Data Catalog tables. The crawler then crawls the data stores specified by the catalog tables. In this case, no new tables are created; instead, your manually created tables are updated." (See the crawler sketch after this thread.)
upvoted 2 times
...
...
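Picking up on the scheduling question and the catalog-target mechanism quoted in the thread above, here is a rough boto3 sketch of the crawler half of option C. It assumes the placeholder ehr_db/ehr_records table from the earlier sketch, an assumed IAM role name, and an hourly cron schedule chosen to match the hourly S3 updates; none of these values come from the original question.

```python
import boto3

glue = boto3.client("glue")

# Point the crawler at the existing catalog table (a catalog target) instead
# of an S3 data store, so it only updates that manually created table's
# schema and partitions and never creates new tables with generated names.
glue.create_crawler(
    Name="ehr-partition-crawler",                 # placeholder crawler name
    Role="GlueCrawlerServiceRole",                # assumed IAM role
    Targets={
        "CatalogTargets": [
            {"DatabaseName": "ehr_db", "Tables": ["ehr_records"]}
        ]
    },
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",                  # don't drop the manually created table
    },
    Schedule="cron(15 * * * ? *)",                # hourly run to pick up new partitions
)

glue.start_crawler(Name="ehr-partition-crawler")  # or simply wait for the schedule
```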
LMax
3 years, 8 months ago
C for me.
upvoted 2 times
...
sanjaym
3 years, 8 months ago
Answer should be C.
upvoted 2 times
...
syu31svc
3 years, 8 months ago
Answer is C
upvoted 2 times
...
Paitan
3 years, 8 months ago
Answer is C.
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other