Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 140 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 140
Topic #: 1

A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs.

The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time.

Which solution will MOST reduce the data processing time?

A. Use AWS Lambda to group the raw input files into larger files. Write the larger files back to Amazon S3. Use AWS Glue to process the files. Load the files into the Amazon Redshift tables.
B. Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.
C. Use the Amazon Redshift COPY command to move the raw input files from Amazon S3 directly into the Amazon Redshift tables. Process the files in Amazon Redshift.
D. Use Amazon EMR instead of AWS Glue to group the raw input files. Process the files in Amazon EMR. Load the files into the Amazon Redshift tables.

Suggested Answer: B

by matt200 at Aug. 14, 2024, 2:09 p.m.

Comments

Mitchdu

5 days, 19 hours ago

Selected Answer: A

Option A: Lambda to group files → AWS Glue
- Pre-processing step: Lambda combines small files into larger ones
- Benefits: Reduces number of files Glue needs to process
- Efficiency: Fewer S3 API calls, better parallelization in Glue
- Result: Significant reduction in processing overhead

Option B: AWS Glue dynamic frame file-grouping
- Built-in feature: Glue can group small files during processing
- Benefits: Reduces overhead within Glue job execution
- Limitation: Still needs to read all individual files initially
- Result: Some improvement but less than pre-grouping
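For anyone wondering what the pre-grouping in Option A might look like, here is a minimal sketch of just the batching logic, assuming each small file holds one JSON document. The function name `batch_json_documents` and the byte threshold are made up for illustration. The S3 reads and writes a real Lambda would need are left out, and each individual object would still have to be fetched once to build the batches.

```python
import json
from typing import Iterable, Iterator, List


def batch_json_documents(docs: Iterable[str], target_bytes: int) -> Iterator[bytes]:
    """Yield newline-delimited JSON payloads of at most roughly target_bytes each."""
    lines: List[bytes] = []
    size = 0
    for doc in docs:
        # Re-serialize so every record occupies exactly one line.
        record = json.loads(doc)
        line = json.dumps(record, separators=(",", ":")).encode("utf-8") + b"\n"
        # Close the current batch if this record would push it past the target.
        if lines and size + len(line) > target_bytes:
            yield b"".join(lines)
            lines, size = [], 0
        lines.append(line)
        size += len(line)
    # Emit whatever is left over as the final, possibly smaller, batch.
    if lines:
        yield b"".join(lines)


# Hypothetical stand-ins for the contents of ten tiny S3 objects.
sample_docs = ['{"id": %d}' % i for i in range(10)]
batches = list(batch_json_documents(sample_docs, target_bytes=30))
```

Each yielded payload would then be written back to S3 as one larger object for Glue to read. With millions of inputs this work would likely have to be spread across many invocations, which is part of the extra moving parts Option A adds.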

upvoted 1 time

...

bac9792

1 month ago

Selected Answer: A

While AWS Glue's groupFiles parameter can help, it doesn't eliminate the overhead of reading numerous small files. Preprocessing files into larger ones before they reach AWS Glue is more effective.
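For reference, a rough sketch of how the grouping options mentioned above are usually supplied when reading from S3 into a dynamic frame. The bucket path, the helper name `grouped_s3_options`, and the 128 MB group size are assumptions for illustration only. The Glue call itself is shown as a comment because it only runs inside a Glue job.

```python
def grouped_s3_options(path: str, group_size_bytes: int) -> dict:
    """Build S3 connection options that ask Glue to group small input files."""
    return {
        "paths": [path],
        "recurse": True,
        # "inPartition" enables grouping of files within an S3 partition.
        "groupFiles": "inPartition",
        # Target group size in bytes; the documented examples pass it as a string.
        "groupSize": str(group_size_bytes),
    }


# Hypothetical bucket and prefix; 128 MB is an arbitrary example target.
options = grouped_s3_options("s3://example-bucket/raw/", 128 * 1024 * 1024)

# Inside a Glue job, the options would be passed along these lines:
# dyf = glue_context.create_dynamic_frame.from_options(
#     connection_type="s3",
#     connection_options=options,
#     format="json",
# )
```

With grouping enabled, a single Spark task reads many of the 1 KB files rather than one task per file, and no separate pre-processing stage is needed. That is the trade-off being debated between A and B.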

upvoted 1 time

...

minhhnh

5 months, 2 weeks ago

Selected Answer: B

The key requirement is to reduce processing time for millions of small JSON files stored in Amazon S3. The solution needs to address the inefficiencies caused by the large number of small files while leveraging the existing AWS Glue and Amazon Redshift setup.

upvoted 2 times

...

aragon_saa

10 months, 1 week ago

Selected Answer: B

Answer is B

upvoted 1 time

...

matt200

10 months, 1 week ago

Selected Answer: B

Option B: Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.

upvoted 1 time

...