Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 17 discussion

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.
Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)

  • A. Configure AWS Glue triggers to run the ETL jobs every hour.
  • B. Use AWS Glue DataBrew to clean and prepare the data for analytics.
  • C. Use AWS Lambda functions to schedule and run the ETL jobs every hour.
  • D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
  • E. Use the Redshift Data API to load transformed data into Amazon Redshift.
Suggested Answer: AD

Comments

rralucard_
Highly Voted 1 year, 3 months ago
Selected Answer: AD
AWS Glue triggers provide a simple and integrated way to schedule ETL jobs. By configuring these triggers to run hourly, the data engineer can ensure that the data processing and updates occur as required without the need for external scheduling tools or custom scripts. This approach is directly integrated with AWS Glue, reducing the complexity and operational overhead.

AWS Glue supports connections to various data sources, including Amazon RDS and MongoDB. By using AWS Glue connections, the data engineer can easily configure and manage the connectivity between these data sources and Amazon Redshift. This method leverages AWS Glue's built-in capabilities for data source integration, thus minimizing operational complexity and ensuring a seamless data flow from the sources to the destination (Amazon Redshift).
upvoted 7 times
...
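[Editor's note] The scheduled-trigger approach described above can be sketched with the Glue `create_trigger` API via boto3. This is a minimal illustration, not the poster's code; the trigger and job names are hypothetical placeholders, and the actual API call is left commented out so the snippet can be inspected without AWS credentials.

```python
# Sketch: an hourly SCHEDULED trigger for an existing Glue ETL job.
# "rds-mongo-to-redshift" is a hypothetical job name (an assumption).
trigger_params = {
    "Name": "hourly-etl-trigger",
    "Type": "SCHEDULED",
    # Glue cron syntax: fire at minute 0 of every hour
    "Schedule": "cron(0 * * * ? *)",
    "Actions": [{"JobName": "rds-mongo-to-redshift"}],
    "StartOnCreation": True,
}

# With credentials configured, the trigger would be created like this:
# import boto3
# glue = boto3.client("glue")
# glue.create_trigger(**trigger_params)
```

Because the schedule lives inside Glue itself, there is no Lambda function or external scheduler to maintain, which is the "least operational overhead" argument for option A.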
pypelyncar
Highly Voted 10 months, 3 weeks ago
Selected Answer: AD
A. Configure AWS Glue triggers to run the ETL jobs every hour.
  • Reduced code complexity: Glue triggers eliminate the need to write custom code for scheduling ETL jobs. This simplifies the pipeline and reduces maintenance overhead.
  • Scalability and integration: Glue triggers work seamlessly with Glue ETL jobs, ensuring efficient scheduling and execution within the Glue ecosystem.
D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
  • Pre-built connectors: Glue connections offer pre-built connectors for various data sources like RDS and Redshift. This eliminates the need for manual configuration and simplifies data source access within the ETL jobs.
  • Centralized management: Glue connections are centrally managed within the Glue service, streamlining connection management and reducing operational overhead.
upvoted 6 times
...
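[Editor's note] The "pre-built connectors" point above can be illustrated with the Glue `create_connection` API. This is a hedged sketch, not the poster's configuration: the connection name, JDBC URL, credentials, subnet, and security group are all hypothetical placeholders, and the boto3 call is commented out so the snippet runs without AWS access.

```python
# Sketch: a JDBC connection for the RDS source. A MongoDB source would use
# ConnectionType "MONGODB" with a mongodb:// URL. All identifiers below are
# placeholders (assumptions), not real resources.
connection_input = {
    "Name": "rds-source-connection",
    "ConnectionType": "JDBC",
    "ConnectionProperties": {
        "JDBC_CONNECTION_URL": "jdbc:mysql://my-rds-endpoint:3306/salesdb",
        "USERNAME": "etl_user",
        "PASSWORD": "replace-me",  # prefer AWS Secrets Manager in practice
    },
    # VPC placement so the Glue job can reach the database privately
    "PhysicalConnectionRequirements": {
        "SubnetId": "subnet-0123456789abcdef0",
        "SecurityGroupIdList": ["sg-0123456789abcdef0"],
    },
}

# With credentials configured:
# import boto3
# boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

Once defined, the connection is referenced by name from the Glue job, so connection details are managed in one place rather than hard-coded in each script.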
saransh_001
Most Recent 2 months, 2 weeks ago
Selected Answer: AD
A. AWS Glue provides a built-in mechanism to trigger ETL jobs at scheduled intervals, such as every hour. Using Glue triggers minimizes the need for additional custom code or services, reducing operational overhead. D. AWS Glue connections simplify the process of establishing secure and reliable connections to various data sources (Amazon RDS, MongoDB) and the destination (Amazon Redshift). This approach reduces the need for manually configuring connection settings and makes the ETL pipeline easier to maintain.
upvoted 2 times
...
San_Juan
7 months, 1 week ago
Selected Answer: AC
A. Because the question says the jobs are built in Glue and must run every hour.
C. Because you can schedule and run the jobs every hour with Lambda functions.
B. Discarded, because the question says the data engineer is using Glue; DataBrew is for cleaning data without code, but it seems the data engineer is writing code to transform the data.
D. Discarded, because the connections are not directly related to the question, which asks about running Glue jobs every hour; the connections don't seem relevant.
E. Discarded, because the data sources are RDS and MongoDB, not Redshift, so you cannot use the Redshift Data API to get the data and transform it.
upvoted 1 times
...
sachin
8 months, 3 weeks ago
AE. D is not valid, as it should read "Use AWS Glue connections to establish connectivity between the data sources (including Amazon Redshift) and the Glue job."
upvoted 1 times
samadal
8 months, 2 weeks ago
An AWS Glue connection is a setting that allows an AWS Glue job to access a data source, letting you connect to databases such as RDS, MongoDB, etc. The opinion above holds that such a connection is not used to load data directly into Redshift and that Glue jobs must use the COPY command instead, which is not quite right. Since Glue jobs can process data and load it directly into Redshift, it is a stretch to consider option D unconditionally wrong.
upvoted 1 times
...
...
DevoteamAnalytix
12 months ago
Selected Answer: AD
I was not sure about A, but in the AWS console => Glue => Triggers => Add Trigger I found the trigger type "Schedule - Fire the trigger on a timer."
upvoted 3 times
...
lucas_rfsb
1 year, 1 month ago
Selected Answer: CD
I found this question confusing. In which step would the transformation itself be implemented? I could be wrong, but Glue triggers only run the job; they don't carry the transformation logic itself. So I would go with C and D.
upvoted 1 times
...
milofficial
1 year, 1 month ago
Selected Answer: AD
Not a clear question - B would kinda make sense - but AD seems to be more correct
upvoted 3 times
...
GiorgioGss
1 year, 1 month ago
Selected Answer: AD
A is obvious, and for D see https://docs.aws.amazon.com/glue/latest/dg/console-connections.html
upvoted 4 times
...
TonyStark0122
1 year, 3 months ago
A. Configure AWS Glue triggers to run the ETL jobs every hour.
D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
Explanation:
Option A: Configuring AWS Glue triggers allows the ETL jobs to be scheduled and run automatically every hour without manual intervention. This reduces operational overhead by automating the data processing pipeline.
Option D: Using AWS Glue connections simplifies connectivity between the data sources (Amazon RDS and MongoDB) and Amazon Redshift. Glue connections abstract away the details of connection configuration, making the data pipeline easier to manage and maintain.
upvoted 3 times
...
milofficial
1 year, 3 months ago
Selected Answer: AB
Lambda triggers for Glue jobs make me dizzy
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other