Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 25 discussion

A data engineer needs to join data from multiple sources to perform a one-time analysis job. The data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.
Which solution will meet this requirement MOST cost-effectively?

  • A. Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join the data and perform the analysis.
  • B. Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run Amazon Athena queries directly on the S3 files.
  • C. Use Amazon Athena Federated Query to join the data from all data sources.
  • D. Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3 directly from Redshift.
Suggested Answer: C
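As a rough illustration of what option C looks like in practice, here is a minimal sketch of a single Athena SQL statement that joins across all four sources. Every catalog, database, table, and column name below is hypothetical. The sketch assumes each non-S3 source has already been registered in Athena as a data source backed by a connector, and that the DynamoDB connector exposes tables under a schema named "default". The `run_query` helper uses the boto3 `start_query_execution` call and needs AWS credentials and an S3 output location to actually run.

```python
def build_federated_join(ddb_catalog, rds_catalog, redshift_catalog):
    """Build one SQL statement that joins S3, RDS, DynamoDB, and Redshift data.

    All catalog/database/table/column names are hypothetical placeholders.
    S3 data is addressed through the built-in AwsDataCatalog; the other
    three catalogs must match data source names registered in Athena.
    """
    return f'''
SELECT o.order_id, c.customer_name, e.event_type, s.total_amount
FROM "AwsDataCatalog"."sales_db"."orders" AS o
JOIN "{rds_catalog}"."crm"."customers" AS c ON c.customer_id = o.customer_id
JOIN "{ddb_catalog}"."default"."events" AS e ON e.order_id = o.order_id
JOIN "{redshift_catalog}"."public"."order_totals" AS s ON s.order_id = o.order_id
'''.strip()


def run_query(sql, output_location, workgroup="primary"):
    """Submit the query to Athena and return its execution ID."""
    import boto3  # imported lazily so building the SQL needs no AWS SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        WorkGroup=workgroup,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical data source names; printing the SQL requires no AWS access.
    print(build_federated_join("ddb_source", "rds_source", "redshift_source"))
```

The point of the sketch is that no data is copied beforehand: the join is expressed in one query, and each connector fetches rows from its source when the query runs.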

Comments

lucas_rfsb
Highly Voted 9 months ago
Selected Answer: C
I would go for C because Federated Query is the typical choice for this purpose. Besides, we don't need to add or duplicate resources in S3. However, because Athena is more optimized for S3, this can be considered a tricky question: there are other trade-offs to consider, such as data governance, which in my opinion is easier when the data is centralized in S3.
upvoted 7 times
...
pypelyncar
Most Recent 6 months, 3 weeks ago
Selected Answer: C
Serverless Processing: Athena is a serverless query service, meaning you only pay for the queries you run. This eliminates the need to provision and manage compute resources like in EMR clusters, making it ideal for one-time jobs.
Federated Query Capability: Athena Federated Query allows you to directly query data from various sources like DynamoDB, RDS, Redshift, and S3 without physically moving the data. This eliminates data movement costs and simplifies the analysis process.
Reduced Cost for Large Datasets: Compared to copying data to S3, which can be expensive for large datasets, Athena Federated Query avoids unnecessary data movement, reducing overall costs.
upvoted 4 times
...
certplan
9 months, 2 weeks ago
Amazon Athena Federated Query allows you to query data from multiple federated data sources including relational databases, NoSQL databases, and object stores directly from Athena. While this might seem like an efficient way to join data from different sources without the need for copying data into Amazon S3, it's essential to consider the cost implications. AWS documentation on Amazon Athena Federated Query [1] explains that while Federated Query enables you to query data from external data sources without data movement, it does not eliminate data transfer costs. Depending on the data sources involved (such as Amazon RDS, DynamoDB, etc.), there might be data transfer costs associated with querying data directly from these sources. [1] Amazon Athena Federated Query Documentation: https://docs.aws.amazon.com/athena/latest/ug/federated-data-sources.html
upvoted 2 times
...
certplan
9 months, 2 weeks ago
1. Data Storage Costs: Storing data in Amazon S3 is generally cheaper than other AWS storage options such as Amazon Redshift or Amazon RDS.
2. Compute Costs: Amazon Athena is a serverless query service that allows you to query data directly from S3 without the need for provisioning or managing infrastructure. You only pay for the queries you run, which can be more cost-effective than provisioning an EMR cluster (option A) or using Redshift Spectrum (option D), both of which involve compute resources that you might not fully utilize.
3. Data Transfer Costs: Option B involves copying the data once into S3, and then there are no additional data transfer costs for querying the data using Athena. In contrast, options A and D would involve data transfer costs as data is moved between different services.
Amazon Athena Pricing: https://aws.amazon.com/athena/pricing/
Amazon S3 Pricing: https://aws.amazon.com/s3/pricing/
upvoted 1 times
...
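The cost arguments in this thread can be made concrete with a small back-of-the-envelope sketch. The numbers are assumptions, not quotes: it takes Athena's on-demand rate as 5 USD per TB scanned with a 10 MB minimum per query (check the Athena pricing page linked above for the current figures in your region), treats a TB as 10**12 bytes, and leaves export, storage, and connector (Lambda) costs as inputs you supply yourself. The function names are hypothetical.

```python
TB = 10**12  # assumption: decimal terabyte; adjust to match the billing unit


def athena_scan_cost(bytes_scanned, price_per_tb=5.0, min_bytes=10 * 10**6):
    """Estimated Athena charge for one query (assumed rate and minimum)."""
    billed_bytes = max(bytes_scanned, min_bytes)
    return billed_bytes / TB * price_per_tb


def option_b_cost(bytes_scanned, export_cost, s3_storage_cost, price_per_tb=5.0):
    """Option B: copy everything into S3 first, then query it with Athena."""
    return export_cost + s3_storage_cost + athena_scan_cost(bytes_scanned, price_per_tb)


def option_c_cost(bytes_scanned, connector_cost, price_per_tb=5.0):
    """Option C: federated query; connector_cost covers the connector's own charges."""
    return connector_cost + athena_scan_cost(bytes_scanned, price_per_tb)


if __name__ == "__main__":
    # Illustrative inputs only; replace with estimates for your own data.
    scanned = 2 * TB
    print("Option B:", option_b_cost(scanned, export_cost=20.0, s3_storage_cost=5.0))
    print("Option C:", option_c_cost(scanned, connector_cost=3.0))
```

Under this model the Athena scan charge is the same in both options, so the comparison for a one-time job comes down to the export and storage costs of option B against the connector costs of option C.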
certplan
9 months, 2 weeks ago
Point: "perform a one-time analysis job".
Option C (Amazon Athena Federated Query) might seem appealing, but it's generally more suited for querying data from external sources without copying the data into S3. However, since the data is already within AWS services, copying it to S3 and using Athena directly would likely be more cost-effective.
upvoted 1 times
...
[Removed]
11 months, 2 weeks ago
Selected Answer: C
You can query these sources by using Federated Queries, which is a native feature of Athena. The other options may increase costs and operational overhead, as they use more than one service to achieve the same result. https://docs.aws.amazon.com/athena/latest/ug/connectors-available.html
upvoted 4 times
GiorgioGss
9 months, 3 weeks ago
Agree. C
upvoted 2 times
...
...