Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 230 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 230
Topic #: 1


A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historical data includes 44,000 files.
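The numbers in the scenario can be sanity-checked with quick arithmetic: one file per hour for five years gives roughly the stated file count, and the whole dataset is only on the order of 10 GB. This is a minimal sketch that assumes 365-day years and decimal kilobytes (1 KB = 1,000 bytes); the helper names are hypothetical.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def expected_hourly_files(years):
    """Files produced by a pipeline that writes exactly one file per hour."""
    return HOURS_PER_DAY * DAYS_PER_YEAR * years


def total_bytes(file_count, file_size_kb):
    """Total data volume, treating 1 KB as 1,000 bytes (assumption)."""
    return file_count * file_size_kb * 1_000


print(expected_hourly_files(5))  # 43800, close to the stated 44,000 files
for size_kb in (200, 300):
    print(f"{size_kb} KB files: {total_bytes(44_000, size_kb) / 1e9:.1f} GB total")
```

A total of roughly 8.8 to 13.2 GB is small, so the difficulty lies in the number of files rather than the volume of data.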

The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historical files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline.

The company needs to improve the performance of the second pipeline.

Which solution will meet this requirement MOST cost-effectively?

A. Use a larger worker type.
B. Increase the number of workers in the AWS Glue ETL jobs.
C. Use the AWS Glue DynamicFrame grouping option.
D. Enable AWS Glue auto scaling.


Suggested Answer: C

by rdiaz at July 4, 2025, 4:52 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments


rdiaz

1 month, 2 weeks ago

Selected Answer: C

AWS Glue DynamicFrame grouping lets each Spark task read a group of small files rather than one file per task, so the job creates far fewer tasks.
• When processing tens of thousands of small files (44,000 in this case), grouping improves performance by reducing per-task scheduling overhead and the memory the Spark driver needs to track individual files.
• This solution does not require a larger worker type, more workers, or auto scaling, so it is the most cost-effective approach.
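Grouping is configured on the S3 source when the DynamicFrame is created. The sketch below shows the documented `groupFiles` and `groupSize` connection options next to a simplified packing model, `count_read_tasks`, which is a hypothetical greedy algorithm for illustration, not Glue's actual implementation. The bucket path, the 250,000-byte average file size, and the 1 MiB group size are assumptions. AWS Glue documentation states that grouping is enabled automatically only when an S3 dataset has more than 50,000 files, so at 44,000 files it has to be turned on explicitly.

```python
# Documented S3 connection options that enable grouping. In a real Glue job
# this dict would be passed as connection_options to
# glueContext.create_dynamic_frame.from_options(connection_type="s3", ...).
GROUPING_OPTIONS = {
    "paths": ["s3://example-sales-bucket/hourly/"],  # hypothetical path
    "recurse": True,
    "groupFiles": "inPartition",
    "groupSize": "1048576",  # target bytes per group (1 MiB), given as a string
}


def count_read_tasks(file_sizes, group_size=None):
    """Estimate read tasks under a simplified (assumed) grouping model.

    With group_size=None, every file becomes its own task (no grouping).
    Otherwise files are packed greedily, in order, into groups whose total
    size does not exceed group_size; an oversized file forms its own group.
    """
    if group_size is None:
        return len(file_sizes)
    tasks = 0
    current = 0
    for size in file_sizes:
        # Close the current group if adding this file would exceed the target.
        if current and current + size > group_size:
            tasks += 1
            current = 0
        current += size
    if current:
        tasks += 1  # flush the last, partially filled group
    return tasks


# Assumption: 44,000 files of 250,000 bytes each (midpoint of 200-300 KB).
files = [250_000] * 44_000
print(count_read_tasks(files))  # 44000: one task per file
print(count_read_tasks(files, int(GROUPING_OPTIONS["groupSize"])))  # 11000
```

Under these assumptions the task count drops from 44,000 to 11,000 on the same workers. In a real job, `groupSize` would be tuned to the data rather than fixed at 1 MiB.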

upvoted 1 time
