Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 17 discussion

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.
Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)

  • A. Configure AWS Glue triggers to run the ETL jobs every hour.
  • B. Use AWS Glue DataBrew to clean and prepare the data for analytics.
  • C. Use AWS Lambda functions to schedule and run the ETL jobs every hour.
  • D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
  • E. Use the Redshift Data API to load transformed data into Amazon Redshift.
Suggested Answer: AD

Comments

rralucard_
Highly Voted 1 year, 3 months ago
Selected Answer: AD
AWS Glue triggers provide a simple and integrated way to schedule ETL jobs. By configuring these triggers to run hourly, the data engineer can ensure that the data processing and updates occur as required without the need for external scheduling tools or custom scripts. This approach is directly integrated with AWS Glue, reducing the complexity and operational overhead.

AWS Glue supports connections to various data sources, including Amazon RDS and MongoDB. By using AWS Glue connections, the data engineer can easily configure and manage the connectivity between these data sources and Amazon Redshift. This method leverages AWS Glue's built-in capabilities for data source integration, thus minimizing operational complexity and ensuring a seamless data flow from the sources to the destination (Amazon Redshift).
upvoted 7 times
...
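[Editor's note] The scheduled-trigger approach described above can be sketched with the Glue `create_trigger` API via boto3. This is a minimal illustration, not the poster's code; the trigger and job names are hypothetical placeholders, and the actual API call is left commented out so the snippet can be inspected without AWS credentials.

```python
# Sketch: an hourly SCHEDULED trigger for an existing Glue ETL job.
# "rds-mongo-to-redshift" is a hypothetical job name (an assumption).
trigger_params = {
    "Name": "hourly-etl-trigger",
    "Type": "SCHEDULED",
    # Glue cron syntax: fire at minute 0 of every hour
    "Schedule": "cron(0 * * * ? *)",
    "Actions": [{"JobName": "rds-mongo-to-redshift"}],
    "StartOnCreation": True,
}

# With credentials configured, the trigger would be created like this:
# import boto3
# glue = boto3.client("glue")
# glue.create_trigger(**trigger_params)
```

Because the schedule lives inside Glue itself, there is no Lambda function or external scheduler to maintain, which is the "least operational overhead" argument for option A.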
pypelyncar
Highly Voted 10 months, 3 weeks ago
Selected Answer: AD
A. Configure AWS Glue triggers to run the ETL jobs every hour.
  • Reduced code complexity: Glue triggers eliminate the need to write custom code for scheduling ETL jobs. This simplifies the pipeline and reduces maintenance overhead.
  • Scalability and integration: Glue triggers work seamlessly with Glue ETL jobs, ensuring efficient scheduling and execution within the Glue ecosystem.
D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
  • Pre-built connectors: Glue connections offer pre-built connectors for various data sources like RDS and Redshift. This eliminates the need for manual configuration and simplifies data source access within the ETL jobs.
  • Centralized management: Glue connections are centrally managed within the Glue service, streamlining connection management and reducing operational overhead.
upvoted 6 times
...
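[Editor's note] The "pre-built connectors" point above can be illustrated with the Glue `create_connection` API. This is a hedged sketch, not the poster's configuration: the connection name, JDBC URL, credentials, subnet, and security group are all hypothetical placeholders, and the boto3 call is commented out so the snippet runs without AWS access.

```python
# Sketch: a JDBC connection for the RDS source. A MongoDB source would use
# ConnectionType "MONGODB" with a mongodb:// URL. All identifiers below are
# placeholders (assumptions), not real resources.
connection_input = {
    "Name": "rds-source-connection",
    "ConnectionType": "JDBC",
    "ConnectionProperties": {
        "JDBC_CONNECTION_URL": "jdbc:mysql://my-rds-endpoint:3306/salesdb",
        "USERNAME": "etl_user",
        "PASSWORD": "replace-me",  # prefer AWS Secrets Manager in practice
    },
    # VPC placement so the Glue job can reach the database privately
    "PhysicalConnectionRequirements": {
        "SubnetId": "subnet-0123456789abcdef0",
        "SecurityGroupIdList": ["sg-0123456789abcdef0"],
    },
}

# With credentials configured:
# import boto3
# boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

Once defined, the connection is referenced by name from the Glue job, so connection details are managed in one place rather than hard-coded in each script.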
saransh_001
Most Recent 2 months, 2 weeks ago
Selected Answer: AD
A. AWS Glue provides a built-in mechanism to trigger ETL jobs at scheduled intervals, such as every hour. Using Glue triggers minimizes the need for additional custom code or services, reducing operational overhead. D. AWS Glue connections simplify the process of establishing secure and reliable connections to various data sources (Amazon RDS, MongoDB) and the destination (Amazon Redshift). This approach reduces the need for manually configuring connection settings and makes the ETL pipeline easier to maintain.
upvoted 2 times
...
San_Juan
7 months, 1 week ago
Selected Answer: AC
A. Because the question says the jobs are built in Glue and must run every hour.
C. Because you can schedule and run the jobs every hour with Lambda functions.
B. Discarded, because the question says the data engineer is using Glue; DataBrew is for cleaning data without code, but it seems the data engineer is writing code to transform the data.
D. Discarded, because the connections are not directly related to the question, which asks about running Glue jobs every hour; the connections don't seem relevant.
E. Discarded, because the data sources are RDS and MongoDB, not Redshift, so you cannot use the Redshift Data API to get the data and transform it.
upvoted 1 times
...
sachin
8 months, 3 weeks ago
AE. D is not valid, as it should read "Use AWS Glue connections to establish connectivity between the data sources (including Amazon Redshift) and the Glue job."
upvoted 1 times
samadal
8 months, 2 weeks ago
An AWS Glue connection is a setting that allows an AWS Glue job to access a data source, letting you connect to databases such as RDS, MongoDB, etc. The opinion above holds that such a connection is not used to load data directly into Redshift and that Glue jobs must use the COPY command instead, which is not quite right. Since Glue jobs can process data and load it directly into Redshift, it is a stretch to consider option D unconditionally wrong.
upvoted 1 times
...
...
DevoteamAnalytix
12 months ago
Selected Answer: AD
I was not sure about A, but in the AWS console => Glue => Triggers => Add Trigger I found the trigger type "Schedule - Fire the trigger on a timer."
upvoted 3 times
...
lucas_rfsb
1 year, 1 month ago
Selected Answer: CD
I found this question confusing. In which step would the transformation itself be implemented? I could be wrong, but Glue triggers only run the job; they don't carry the transformation logic itself. So I would go with C and D.
upvoted 1 times
...
milofficial
1 year, 1 month ago
Selected Answer: AD
Not a clear question - B would kinda make sense - but AD seems to be more correct
upvoted 3 times
...
GiorgioGss
1 year, 1 month ago
Selected Answer: AD
A is obvious, and for D see https://docs.aws.amazon.com/glue/latest/dg/console-connections.html
upvoted 4 times
...
TonyStark0122
1 year, 3 months ago
A. Configure AWS Glue triggers to run the ETL jobs every hour.
D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
Explanation:
Option A: Configuring AWS Glue triggers allows the ETL jobs to be scheduled and run automatically every hour without manual intervention. This reduces operational overhead by automating the data processing pipeline.
Option D: Using AWS Glue connections simplifies connectivity between the data sources (Amazon RDS and MongoDB) and Amazon Redshift. Glue connections abstract away the details of connection configuration, making the data pipeline easier to manage and maintain.
upvoted 3 times
...
milofficial
1 year, 3 months ago
Selected Answer: AB
Lambda triggers for Glue jobs make me dizzy
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other