Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 238 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 238
Topic #: 1

A company uses Amazon S3 and AWS Glue Data Catalog to manage a data lake that contains contact information for customers. The company uses PySpark and AWS Glue jobs with a DynamicFrame to run a workflow that processes data within the data lake.

A data engineer notices that the workflow is generating errors as a result of how customer postal codes are stored in the data lake. Some postal codes include unnecessary numbers or invalid characters.

The data engineer needs a solution to address the errors and correct the postal codes in the data lake.
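To make "correct the postal codes" concrete, here is a minimal plain-Python sketch. It assumes US-style five-digit ZIP codes, because the question does not say which postal format the data lake uses. `normalize_postal_code` is a hypothetical helper, not part of AWS Glue or PySpark. It strips every non-digit character, keeps the first five digits, and returns `None` when too few digits remain to repair the value.

```python
import re
from typing import Optional

# Assumption: the canonical postal code is a five-digit US ZIP code.
_NON_DIGITS = re.compile(r"[^0-9]")


def normalize_postal_code(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return a five-digit ZIP code, or None if unrepairable."""
    if raw is None:
        return None
    # Drop invalid characters such as spaces, '#', or '-'.
    digits = _NON_DIGITS.sub("", raw)
    if len(digits) < 5:
        # Too few digits left to form a valid ZIP code.
        return None
    # Keep the first five digits; trailing extras (e.g. a ZIP+4 suffix) are dropped.
    return digits[:5]


if __name__ == "__main__":
    for sample in ["98101", "98101-1234", " 02134#", "9810", None]:
        print(repr(sample), "->", repr(normalize_postal_code(sample)))
```

The function deliberately returns a string. A postal code such as `02134` that is read as a number would lose its leading zero.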

A. Create a schema definition for PySpark that matches the format the processing workflow requires for postal codes. Pass the schema to the DynamicFrame during processing.
B. Use AWS Glue workflow properties to allow job state sharing. Configure the AWS Glue jobs to read values from the postal code column by using the properties from a previously successful run of the jobs.
C. Configure the column.push_down_predicate setting and the catalogPartitionPredicate settings for the postal code column in the DynamicFrame.
D. Set the DynamicFrame additional_options parameter ‘useS3ListImplementation’ to True.

Suggested Answer: A

by rdiaz at July 4, 2025, 7:03 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

rdiaz

4 weeks, 1 day ago

Selected Answer: A

The core issue is inconsistent or invalid postal code formats in the data, which cause errors during processing.
- Option A addresses this by enforcing a defined schema, which ensures that postal codes are interpreted correctly (for example, as strings in a specific format).
- When you pass a schema to a DynamicFrame, PySpark casts the data to the declared types. This helps to:
  - Filter out or transform invalid entries.
  - Standardize the data format before further processing.
- This is a standard and effective approach to handling data format inconsistencies in ETL workflows.

upvoted 1 time
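The schema-enforcement idea in the comment can be illustrated without AWS dependencies. The following is a plain-Python stand-in, not the AWS Glue DynamicFrame or PySpark API. `FieldSpec`, `SCHEMA`, `conforms`, and `apply_schema` are hypothetical names, and the five-digit pattern assumes US ZIP codes. In an actual Glue job, the declaration would more likely be a PySpark `StructType` with the postal code column typed as `StringType`.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for one field of a PySpark schema."""
    py_type: type
    pattern: str = ""  # optional regex the whole value must match


# Assumption: postal codes are five-digit US ZIP codes stored as strings,
# so values such as "02134" keep their leading zero.
SCHEMA: Dict[str, FieldSpec] = {
    "customer_id": FieldSpec(str),
    "postal_code": FieldSpec(str, r"[0-9]{5}"),
}


def conforms(value: object, spec: FieldSpec) -> bool:
    """True if value has the declared type and matches the optional pattern."""
    if not isinstance(value, spec.py_type):
        return False
    return not spec.pattern or re.fullmatch(spec.pattern, value) is not None


def apply_schema(
    records: List[dict], schema: Dict[str, FieldSpec]
) -> Tuple[List[dict], List[dict]]:
    """Split records into (valid, rejected) according to the declared schema."""
    valid: List[dict] = []
    rejected: List[dict] = []
    for record in records:
        if all(conforms(record.get(name), spec) for name, spec in schema.items()):
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected


if __name__ == "__main__":
    rows = [
        {"customer_id": "c1", "postal_code": "02134"},
        {"customer_id": "c2", "postal_code": "98101#"},  # invalid character
        {"customer_id": "c3", "postal_code": 98101},     # wrong type (int)
    ]
    ok, bad = apply_schema(rows, SCHEMA)
    print("valid:", ok)
    print("rejected:", bad)
```

This sketch only separates conforming rows from nonconforming ones. It does not repair values such as `98101#`, so correcting them in the data lake would still need an explicit transformation step.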

...