Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 25 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 25
Topic #: 1

A data engineer needs to join data from multiple sources to perform a one-time analysis job. The data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.
Which solution will meet this requirement MOST cost-effectively?

A. Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join the data and perform the analysis.
B. Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run Amazon Athena queries directly on the S3 files.
C. Use Amazon Athena Federated Query to join the data from all data sources.
D. Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3 directly from Redshift.

Suggested Answer: C 🗳️

by [deleted] at Jan. 21, 2024, 2:53 a.m.
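
For readers who have not seen one, here is a rough sketch of what a single federated join across the four sources might look like. It only builds the SQL text in Python and does not run it. Every catalog, database, table, and column name below is hypothetical; the real names depend on how each Athena data source connector was registered in the account.

```python
# Hypothetical fully qualified table names ("catalog"."database"."table").
# None of these come from the question; they are placeholders for illustration.
SOURCES = {
    "customers": '"rds_catalog"."sales"."customers"',    # Amazon RDS via a connector
    "orders": '"ddb_catalog"."default"."orders"',        # Amazon DynamoDB via a connector
    "revenue": '"redshift_catalog"."public"."revenue"',  # Amazon Redshift via a connector
    "clicks": '"awsdatacatalog"."weblogs"."clicks"',     # Amazon S3 via the Glue Data Catalog
}


def build_federated_join(sources: dict[str, str]) -> str:
    """Return one SQL statement that joins all four sources on customer_id."""
    return (
        "SELECT c.customer_id, o.order_id, r.amount, k.page\n"
        f"FROM {sources['customers']} AS c\n"
        f"JOIN {sources['orders']} AS o ON o.customer_id = c.customer_id\n"
        f"JOIN {sources['revenue']} AS r ON r.customer_id = c.customer_id\n"
        f"JOIN {sources['clicks']} AS k ON k.customer_id = c.customer_id"
    )


if __name__ == "__main__":
    print(build_federated_join(SOURCES))
```

The point of option C is that this is one query, with no copy step into S3 beforehand.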

Comments

lucas_rfsb

Highly Voted 9 months ago

Selected Answer: C

I would go for C because Federated Query is designed for this purpose. Besides, we don't need to add or duplicate resources in S3. That said, because Athena is more optimized for S3, I can see this being considered a tricky question: there can be more trade-offs to consider, such as data governance, which in my opinion is easier if the data is centralized in S3.

upvoted 7 times

pypelyncar

Most Recent 6 months, 3 weeks ago

Selected Answer: C

Serverless Processing: Athena is a serverless query service, meaning you only pay for the queries you run. This eliminates the need to provision and manage compute resources like in EMR clusters, making it ideal for one-time jobs.

Federated Query Capability: Athena Federated Query allows you to directly query data from various sources like DynamoDB, RDS, Redshift, and S3 without physically moving the data. This eliminates data movement costs and simplifies the analysis process.

Reduced Cost for Large Datasets: Compared to copying data to S3, which can be expensive for large datasets, Athena Federated Query avoids unnecessary data movement, reducing overall costs.

upvoted 4 times
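
To put a rough number on "you only pay for the queries you run", here is a minimal back-of-the-envelope sketch. It assumes Athena's per-query charge is based on data scanned, with a default of 5 USD per TB and a 10 MB minimum per query; both defaults are assumptions taken from memory of the public pricing page and should be checked for your region. It also ignores the Lambda cost of the connectors that federated queries use.

```python
TB = 1024 ** 4
MB = 1024 ** 2


def athena_scan_cost_usd(
    bytes_scanned: int,
    usd_per_tb: float = 5.0,          # assumed price, varies by region
    min_billable_bytes: int = 10 * MB,  # assumed per-query minimum
) -> float:
    """Estimate the scan charge for one Athena query (connector Lambda cost excluded)."""
    billable = max(bytes_scanned, min_billable_bytes)
    return billable / TB * usd_per_tb


if __name__ == "__main__":
    # Example: a one-time job that scans 200 GB in total across all sources.
    print(f"{athena_scan_cost_usd(200 * 1024 ** 3):.4f} USD")
```

For a one-time job the estimate is a single charge like this, versus paying for a provisioned cluster for as long as it is running.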

certplan

9 months, 2 weeks ago

Amazon Athena Federated Query allows you to query data from multiple federated data sources including relational databases, NoSQL databases, and object stores directly from Athena. While this might seem like an efficient way to join data from different sources without the need for copying data into Amazon S3, it's essential to consider the cost implications. AWS documentation on Amazon Athena Federated Query [1] explains that while Federated Query enables you to query data from external data sources without data movement, it does not eliminate data transfer costs. Depending on the data sources involved (such as Amazon RDS, DynamoDB, etc.), there might be data transfer costs associated with querying data directly from these sources.

[1] Amazon Athena Federated Query Documentation: https://docs.aws.amazon.com/athena/latest/ug/federated-data-sources.html

upvoted 2 times

certplan

9 months, 2 weeks ago

1. Data Storage Costs: Storing data in Amazon S3 is generally cheaper than other AWS storage options such as Amazon Redshift or Amazon RDS.
2. Compute Costs: Amazon Athena is a serverless query service that allows you to query data directly from S3 without the need for provisioning or managing infrastructure. You only pay for the queries you run, which can be more cost-effective compared to provisioning an EMR cluster (option A) or using Redshift Spectrum (option D), both of which involve compute resources that you might not fully utilize.
3. Data Transfer Costs: Option B involves copying the data once into S3, and then there are no additional data transfer costs for querying the data using Athena. In contrast, options A and D would involve data transfer costs as data is moved between different services.

Amazon Athena Pricing: https://aws.amazon.com/athena/pricing/
Amazon S3 Pricing: https://aws.amazon.com/s3/pricing/

upvoted 1 times

certplan

9 months, 2 weeks ago

Point: "perform a one-time analysis job".

Option C (Amazon Athena Federated Query) might seem appealing, but it's generally more suited for querying data from external sources without copying the data into S3. However, since the data is already within AWS services, copying it to S3 and using Athena directly would likely be more cost-effective.

upvoted 1 times

[Removed]

11 months, 2 weeks ago

Selected Answer: C

You can query these sources by using Federated Queries, which is a native feature of Athena. The other options may increase costs and operational overhead, as they use more than one service to achieve the same result. See the list of available connectors: https://docs.aws.amazon.com/athena/latest/ug/connectors-available.html

upvoted 4 times

GiorgioGss

9 months, 3 weeks ago

Agree. C

upvoted 2 times
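
As a sketch of how such a query could be submitted programmatically, the snippet below builds the request for the Athena `StartQueryExecution` API (the `start_query_execution` method in boto3). It only assembles the arguments and does not call AWS. The S3 results location and the SQL text are hypothetical placeholders, and `primary` is assumed as the workgroup because it is the default one.

```python
def build_query_request(
    sql: str,
    output_location: str,
    workgroup: str = "primary",  # assumed: the default Athena workgroup
) -> dict:
    """Assemble keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        # Athena writes query results to this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = build_query_request(
        sql='SELECT COUNT(*) FROM "ddb_catalog"."default"."orders"',  # hypothetical table
        output_location="s3://example-athena-results/one-time-job/",  # hypothetical bucket
    )
    print(request)
    # With boto3 installed and credentials configured, this could be sent as:
    #   import boto3
    #   boto3.client("athena").start_query_execution(**request)
```

Nothing has to be provisioned beyond the connectors themselves, which fits the "one-time" part of the question.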
