Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 73 discussion

A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size.
A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file.
Which solution will meet this requirement with the LEAST operational effort?

  • A. Create and run an Apache Spark job in an AWS Glue notebook. Configure the job to read the S3 file and calculate the number of distinct customers.
  • B. Create an AWS Glue crawler to create an AWS Glue Data Catalog of the S3 file. Run SQL queries from Amazon Athena to calculate the number of distinct customers.
  • C. Create and run an Apache Spark job in Amazon EMR Serverless to calculate the number of distinct customers.
  • D. Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.
Suggested Answer: D
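
Whichever option is chosen, the underlying computation is the same: concatenate the first-name and last-name columns, then count the distinct results. A minimal Python sketch of that logic follows. The column names `first_name` and `last_name`, the single-space separator, and the sample rows are assumptions for illustration and do not come from the question. It is not how any of the AWS services implement the operation.

```python
from typing import Iterable, Mapping


def count_distinct_customers(
    rows: Iterable[Mapping[str, str]],
    first_col: str = "first_name",  # hypothetical column name
    last_col: str = "last_name",    # hypothetical column name
) -> int:
    """Count distinct customers by concatenated first and last name."""
    seen = set()
    for row in rows:
        # Concatenate the two name columns, as the question describes.
        # The single-space separator is an assumption.
        full_name = f"{row[first_col].strip()} {row[last_col].strip()}"
        seen.add(full_name)
    return len(seen)


# Illustrative sample data, not taken from the question.
sample_rows = [
    {"first_name": "Ana", "last_name": "Silva"},
    {"first_name": "Ana", "last_name": "Silva"},
    {"first_name": "Ana", "last_name": "Souza"},
    {"first_name": "Li", "last_name": "Wei"},
]
distinct_count = count_distinct_customers(sample_rows)
print(distinct_count)
```

The comparison here is an exact, case-sensitive string match, so two different customers who share a full name are counted once. The question accepts that limitation by defining a customer through the concatenated name.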

Comments

rralucard_
Highly Voted 1 year, 4 months ago
Selected Answer: D
AWS Glue DataBrew is a visual data preparation tool that lets data engineers and data analysts clean and normalize data without writing code. With DataBrew, a data engineer could create a recipe that concatenates the customer first and last names and then applies the COUNT_DISTINCT function. This requires no complex code and can be done entirely through the DataBrew user interface, which means lower operational effort.
upvoted 9 times
...
Juan_pc
Most Recent 1 month, 3 weeks ago
Selected Answer: A
According to the official DataBrew documentation, DataBrew does not natively support files in .xls format (it does support .xlsx), so the correct option is A.
upvoted 1 time
...
pypelyncar
1 year ago
Selected Answer: D
DataBrew supports various transformations, including the COUNT_DISTINCT function, which is ideal for calculating the number of unique values in a column (combined first and last names in this case).
upvoted 2 times
...
Ousseyni
1 year, 2 months ago
Selected Answer: D
Go with D.
upvoted 2 times
...
lucas_rfsb
1 year, 2 months ago
Selected Answer: D
Since it requires less operational effort, I would go with D.
upvoted 2 times
...