Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 76 discussion

A company stores datasets in JSON format and .csv format in an Amazon S3 bucket. The company has Amazon RDS for Microsoft SQL Server databases, Amazon DynamoDB tables that are in provisioned capacity mode, and an Amazon Redshift cluster. A data engineering team must develop a solution that will give data scientists the ability to query all data sources by using syntax similar to SQL.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Amazon Athena to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
  • B. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Redshift Spectrum to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
  • C. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use AWS Glue jobs to transform data that is in JSON format to Apache Parquet or .csv format. Store the transformed data in an S3 bucket. Use Amazon Athena to query the original and transformed data from the S3 bucket.
  • D. Use AWS Lake Formation to create a data lake. Use Lake Formation jobs to transform the data from all data sources to Apache Parquet format. Store the transformed data in an S3 bucket. Use Amazon Athena or Redshift Spectrum to query the data.
Suggested Answer: A

Comments

GiorgioGss
Highly Voted 1 year, 2 months ago
Selected Answer: A
LEAST operational overhead? Query straight with Athena, without any intermediate actions or services.
upvoted 7 times
...
pypelyncar
Most Recent 11 months, 1 week ago
Selected Answer: A
Athena natively supports querying JSON data stored in S3 using standard SQL functions. This eliminates the need for additional data transformation steps using Glue jobs (as required in Option C or D).
upvoted 1 times
...
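To illustrate the point above about querying JSON with SQL functions, here is a minimal sketch that only builds a query string; it does not call Athena. It assumes a hypothetical table `raw_events` with a string column `payload` that holds JSON text. `json_extract_scalar` is one of the JSON functions available in Athena SQL; the table, column, and field names are made up for the example.

```python
def build_json_field_query(table: str, json_column: str, field: str, limit: int = 10) -> str:
    """Build an Athena SQL string that reads one top-level field from a JSON string column."""
    # json_extract_scalar takes the JSON text and a JSONPath expression such as $.customer_id
    json_path = f"$.{field}"
    return (
        f"SELECT json_extract_scalar({json_column}, '{json_path}') AS {field} "
        f"FROM {table} "
        f"LIMIT {limit}"
    )


# Hypothetical names, used only for illustration.
query = build_json_field_query("raw_events", "payload", "customer_id")
print(query)
```

If the Glue crawler has already mapped the JSON keys to table columns, the fields can be selected directly and this kind of extraction is not needed.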
tgv
11 months, 3 weeks ago
As chris_spencer mentioned below, Athena now supports querying with PartiQL, which technically makes answer A correct.
upvoted 1 times
...
VerRi
12 months ago
Selected Answer: A
B requires Redshift Spectrum, so A
upvoted 1 times
...
chris_spencer
1 year, 1 month ago
Selected Answer: C
Answer should be C. Amazon Athena did not support querying with PartiQL until 16.04.2024: https://aws.amazon.com/about-aws/whats-new/2024/04/amazon-athena-federated-query-pass-through/ The DEA-C01 exam should not have included the latest feature.
upvoted 2 times
...
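For readers unfamiliar with the pass-through feature linked above, the following is a rough sketch of how such a query is usually wrapped. It only builds a string and makes several assumptions: a federated catalog named `dynamo_catalog` already exists, it is backed by a DynamoDB connector, and the `system.query(query => ...)` table function form matches the connector's documentation. The catalog name, table name, and helper function are hypothetical.

```python
def build_passthrough_query(catalog: str, native_query: str) -> str:
    """Wrap a native query (here PartiQL) in an Athena pass-through SELECT statement."""
    # Single quotes inside the native query are doubled so that it can be
    # embedded as a SQL string literal.
    escaped = native_query.replace("'", "''")
    return f"SELECT * FROM TABLE({catalog}.system.query(query => '{escaped}'))"


# Hypothetical PartiQL statement against a DynamoDB table called "orders".
partiql = "SELECT * FROM \"orders\" WHERE status = 'OPEN'"
athena_sql = build_passthrough_query("dynamo_catalog", partiql)
print(athena_sql)
```

The inner statement is passed to the data source as written, so its dialect is whatever that source accepts, not Athena SQL.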
Christina666
1 year, 1 month ago
Selected Answer: A
A.
Unified Querying with Athena: Athena provides a SQL-like interface for querying various data sources, including JSON and CSV in S3, as well as traditional databases.
PartiQL Support: Athena's PartiQL extension allows querying semi-structured JSON data directly, eliminating the need for a separate query engine.
Serverless and Managed: Both AWS Glue and Athena are serverless, minimizing infrastructure management for the data engineers.
No Unnecessary Transformations: Avoiding transformations for JSON data simplifies the pipeline and reduces operational overhead.
B.
Redshift Spectrum: While Spectrum can query external data, it's primarily intended for Redshift data warehouse extensions. It adds complexity for the RDS and DynamoDB data sources.
upvoted 4 times
...
lucas_rfsb
1 year, 1 month ago
Selected Answer: B
I will go with B
upvoted 4 times
nyaopoko
1 year, 1 month ago
B is the best choice:
AWS Glue Data Catalog: AWS Glue can crawl and catalog the data sources (S3 buckets, RDS databases, DynamoDB tables) and store the metadata in the AWS Glue Data Catalog. This provides a centralized metadata repository for all data sources.
Amazon Redshift Spectrum: Redshift Spectrum is a feature of Amazon Redshift that allows you to query data directly from various data sources, including S3 buckets, without loading the data into Redshift tables. This means you can query the JSON and CSV files in S3, as well as the RDS and DynamoDB data sources, using standard SQL syntax.
SQL and PartiQL Support: Redshift Spectrum supports querying structured data sources (like RDS and CSV files) using SQL, and querying semi-structured data sources (like JSON files) using PartiQL, which is a SQL-compatible query language for JSON data.
upvoted 1 times
...
...
Luke97
1 year, 1 month ago
The answer should be B. A is incorrect because Athena does NOT support PartiQL. C is NOT the least operational overhead (it has the additional step of converting JSON to Parquet or csv). D is incorrect because DynamoDB exports data to S3 in DynamoDB JSON or Amazon Ion format only (https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/).
upvoted 4 times
...
halogi
1 year, 1 month ago
Selected Answer: C
Amazon Athena can only query in SQL, not PartiQL, so both A and B are incorrect. Lake Formation cannot work directly with DynamoDB, so D is incorrect. The only acceptable answer is C.
upvoted 2 times
andrevus
1 year, 1 month ago
similar to SQL, so A
upvoted 1 times
...
...
rralucard_
1 year, 3 months ago
Selected Answer: A
Option A, using AWS Glue and Amazon Athena, would meet the requirements with the least operational overhead. This solution allows data scientists to directly query data in its original format without the need for additional data transformation steps, making it easier to implement and manage.
upvoted 3 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other