Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 140 discussion

A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs.

The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time.

Which solution will MOST reduce the data processing time?

  • A. Use AWS Lambda to group the raw input files into larger files. Write the larger files back to Amazon S3. Use AWS Glue to process the files. Load the files into the Amazon Redshift tables.
  • B. Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.
  • C. Use the Amazon Redshift COPY command to move the raw input files from Amazon S3 directly into the Amazon Redshift tables. Process the files in Amazon Redshift.
  • D. Use Amazon EMR instead of AWS Glue to group the raw input files. Process the files in Amazon EMR. Load the files into the Amazon Redshift tables.
Suggested Answer: B
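For reference, the file-grouping option named in B is set through the connection options passed when a dynamic frame is created. The following is a minimal sketch, not the exam's reference solution: the helper names `grouped_s3_options` and `read_grouped_json`, the bucket path, and the 128 MB group size are assumptions for illustration. `groupFiles` and `groupSize` are the option keys AWS Glue documents for S3 sources, and `glue_context` is expected to be a `GlueContext` created inside a Glue job.

```python
def grouped_s3_options(paths, group_size_bytes=128 * 1024 * 1024):
    """Build S3 connection options that ask Glue to group small files."""
    return {
        "paths": list(paths),
        "recurse": True,
        # "inPartition" enables grouping of files within an S3 partition.
        "groupFiles": "inPartition",
        # Target size of each group in bytes, passed as a string.
        "groupSize": str(group_size_bytes),
    }


def read_grouped_json(glue_context, paths):
    """Read JSON files into a DynamicFrame with file grouping enabled.

    glue_context is assumed to be an awsglue.context.GlueContext,
    which is only available inside a running Glue job.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=grouped_s3_options(paths),
        format="json",
    )


# Hypothetical bucket path, used only to show the resulting options.
options = grouped_s3_options(["s3://example-bucket/raw/"])
print(options["groupFiles"], options["groupSize"])
```

With grouping enabled, each Glue task reads a batch of small files rather than one file at a time, and no separate pre-processing stage is needed. This is the argument for B.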

Comments

Mitchdu
5 days, 19 hours ago
Selected Answer: A
Option A: Lambda to group files → AWS Glue
  • Pre-processing step: Lambda combines small files into larger ones
  • Benefits: Reduces the number of files Glue needs to process
  • Efficiency: Fewer S3 API calls, better parallelization in Glue
  • Result: Significant reduction in processing overhead

Option B: AWS Glue dynamic frame file-grouping
  • Built-in feature: Glue can group small files during processing
  • Benefits: Reduces overhead within Glue job execution
  • Limitation: Still needs to read all individual files initially
  • Result: Some improvement, but less than pre-grouping
upvoted 1 time
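The pre-grouping step described for option A can be sketched in plain Python. This is an illustration under stated assumptions, not a complete Lambda function: `pack_json_documents` is a hypothetical helper, the output format (newline-delimited JSON) and the target size are choices made here, and the S3 reads and writes that a real function would perform are left out.

```python
import json


def pack_json_documents(documents, target_bytes):
    """Pack small JSON documents into newline-delimited batches.

    Yields bytes objects of at most target_bytes each, except that a
    single document larger than target_bytes gets a batch of its own.
    """
    batch, batch_size = [], 0
    for doc in documents:
        line = json.dumps(doc, separators=(",", ":")).encode("utf-8") + b"\n"
        # Close the current batch before it would exceed the target size.
        if batch and batch_size + len(line) > target_bytes:
            yield b"".join(batch)
            batch, batch_size = [], 0
        batch.append(line)
        batch_size += len(line)
    if batch:
        yield b"".join(batch)


# Hypothetical records standing in for the small JSON files in S3.
results = [{"facility": i % 3, "result": "pass"} for i in range(10)]
batches = list(pack_json_documents(results, target_bytes=100))
print(len(batches), "batches from", len(results), "documents")
```

Every input object must still be read once to build the batches, so this approach moves the small-file overhead into an extra stage rather than removing it.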
bac9792
1 month ago
Selected Answer: A
While AWS Glue's groupFiles parameter can help, it doesn't eliminate the overhead of reading numerous small files. Preprocessing files into larger ones before they reach AWS Glue is more effective.
upvoted 1 time
minhhnh
5 months, 2 weeks ago
Selected Answer: B
The key requirement is to reduce processing time for millions of small JSON files stored in Amazon S3. The solution needs to address the inefficiencies caused by the large number of small files while leveraging the existing AWS Glue and Amazon Redshift setup.
upvoted 2 times
aragon_saa
10 months, 1 week ago
Selected Answer: B
Answer is B
upvoted 1 time
matt200
10 months, 1 week ago
Selected Answer: B
Option B: Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.
upvoted 1 time
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other