Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 92 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 92
Topic #: 1

A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data.

The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data.

Which solution will meet this requirement with the LEAST coding effort?

A. Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.
B. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
C. Enable job metrics for the ETL jobs to help keep track of processed objects in Amazon CloudWatch.
D. Configure the ETL jobs to delete processed objects from Amazon S3 after each run.

Suggested Answer: B

by tgv at June 15, 2024, 9:54 a.m.

Comments

tgv

Highly Voted 1 year ago

Selected Answer: B

AWS Glue job bookmarks are designed to handle incremental data processing by automatically tracking which data previous runs have already processed.
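
As a rough mental model of that state tracking (not Glue's actual implementation, which is more involved), a bookmark for an S3 source can be pictured as a high-water mark on each object's last-modified time. The `S3Object`, `Bookmark`, and `run_job` names and the file keys below are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class S3Object:
    key: str
    last_modified: datetime


@dataclass
class Bookmark:
    # Hypothetical stand-in for the state persisted between job runs.
    high_water_mark: datetime = datetime.min.replace(tzinfo=timezone.utc)


def run_job(objects, bookmark):
    """Return keys of objects newer than the bookmark, then advance it."""
    new = [o for o in objects if o.last_modified > bookmark.high_water_mark]
    if new:
        bookmark.high_water_mark = max(o.last_modified for o in new)
    return [o.key for o in new]


# Two daily runs over a bucket that gains one file per day.
day1 = [S3Object("data/2024-06-14.csv", datetime(2024, 6, 14, tzinfo=timezone.utc))]
day2 = day1 + [S3Object("data/2024-06-15.csv", datetime(2024, 6, 15, tzinfo=timezone.utc))]

bookmark = Bookmark()
first = run_job(day1, bookmark)   # processes the 06-14 file
second = run_job(day2, bookmark)  # processes only the 06-15 file
```

The second run skips the file that the first run already handled, which is the behavior the question asks for.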

upvoted 8 times

andrologin

Most Recent 11 months, 2 weeks ago

Selected Answer: B

AWS Glue job bookmarks record where data processing last stopped, which lets the next run pick up only the new data.

upvoted 1 time

HunkyBunky

12 months ago

Selected Answer: B

B - job bookmarks are the key.

upvoted 1 time

bakarys

12 months ago

Selected Answer: B

The solution that meets this requirement with the least coding effort is Option B: enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.

AWS Glue job bookmarks help ETL jobs keep track of data that was already processed in previous runs. With job bookmarks enabled, the ETL jobs skip the processed data and read only the new, incremental data. The feature is designed specifically for this use case and requires minimal coding effort.

Options A, C, and D would require additional coding and operational effort. Option A would require creating a new ETL job and managing a DynamoDB table. Option C would involve setting up job metrics and CloudWatch, which report on job runs but don’t change which data a job reads. Option D would delete data from S3 after processing, which might not be desirable if the original data needs to be retained.

Therefore, Option B is the most suitable solution.
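
For reference, here is a minimal sketch of what enabling bookmarks looks like. Bookmarks are switched on through the `--job-bookmark-option` job argument, and the Glue script passes a `transformation_ctx` on its read and calls `job.commit()` at the end. The helper name `bookmark_job_arguments`, the bucket path, and the `read_s3` context name are assumptions for illustration. The script part is shown as comments because it only runs inside the Glue runtime.

```python
def bookmark_job_arguments(enabled=True):
    """Build the job argument that turns Glue job bookmarks on or off.

    Hypothetical helper; the dict would be merged into the job's
    default arguments (console, CLI, or SDK).
    """
    option = "job-bookmark-enable" if enabled else "job-bookmark-disable"
    return {"--job-bookmark-option": option}


# Inside the Glue script (requires the awsglue runtime, so commented out):
#
#   job.init(args["JOB_NAME"], args)
#   frame = glueContext.create_dynamic_frame.from_options(
#       connection_type="s3",
#       connection_options={"paths": ["s3://example-bucket/input/"]},  # hypothetical path
#       format="csv",
#       transformation_ctx="read_s3",  # key under which bookmark state is stored
#   )
#   ... validate, transform, write to RDS for MySQL ...
#   job.commit()  # persists the bookmark state for the next run

job_args = bookmark_job_arguments()
```

Since the jobs in the question already read with a DynamicFrame, the change is mostly configuration, assuming the existing reads already pass (or can be given) a `transformation_ctx`.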

upvoted 3 times