Exam AWS Certified Data Analytics - Specialty topic 1 question 47 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 47
Topic #: 1

A data analyst is designing a solution to interactively query datasets with SQL using a JDBC connection. Users will join data stored in Amazon S3 in Apache ORC format with data stored in Amazon OpenSearch Service (Amazon Elasticsearch Service) and Amazon Aurora MySQL.
Which solution will provide the MOST up-to-date results?

A. Use AWS Glue jobs to ETL data from Amazon ES and Aurora MySQL to Amazon S3. Query the data with Amazon Athena.
B. Use AWS DMS to stream data from Amazon ES and Aurora MySQL to Amazon Redshift. Query the data with Amazon Redshift.
C. Query all the datasets in place with Apache Spark SQL running on an AWS Glue developer endpoint.
D. Query all the datasets in place with Apache Presto running on Amazon EMR.

Suggested Answer: D

by CHRIS12722222 at April 21, 2022, 11:53 a.m.

Comments

cloudlearnerhere

Highly Voted 2 years, 8 months ago

Selected Answer: D

The correct answer is D: Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources. Option A is wrong because Glue is meant for batch ETL jobs rather than interactive queries. Option B is wrong because the data must first be copied to Redshift for querying, so the results are not fully up to date. It also does not cover the S3 data, which would need Redshift Spectrum. Option C is wrong because Glue development endpoints are meant for developing and testing Glue ETL scripts, not for serving interactive queries, and Spark SQL needs additional connectors to reach every source.
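To make "query all the datasets in place" concrete, here is a minimal sketch of what a single federated Presto statement could look like. The catalog, schema, table, and column names are all hypothetical, and the sketch assumes the cluster already has a Hive catalog (for the ORC data in S3), an Elasticsearch catalog, and a MySQL catalog configured. The Python only assembles the SQL text; it does not connect to anything.

```python
# Hypothetical catalog.schema.table names; replace with the ones on your cluster.
def build_federated_query(
    s3_table: str = "hive.sales.orders_orc",
    es_table: str = "elasticsearch.default.order_events",
    mysql_table: str = "mysql.crm.customers",
) -> str:
    """Return one Presto SQL statement that joins three catalogs in place."""
    return (
        "SELECT c.customer_id, c.name, o.order_total, e.status\n"
        f"FROM {s3_table} AS o\n"
        f"JOIN {mysql_table} AS c ON o.customer_id = c.customer_id\n"
        f"JOIN {es_table} AS e ON o.order_id = e.order_id\n"
        "WHERE o.order_date >= DATE '2022-01-01'"
    )


if __name__ == "__main__":
    # Print the statement so it can be pasted into a Presto client.
    print(build_federated_query())
```

Because every table is read from its source at query time, no copy step sits between a write in Aurora or OpenSearch and the result the analyst sees.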

upvoted 17 times

cloudlearnerhere

2 years, 8 months ago

Presto is an open-source distributed SQL query engine optimized for low-latency, ad hoc analysis of data. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions, which makes it easy for data analysts and developers to query both structured and unstructured data at scale. Presto can process data from multiple data sources, including the Hadoop Distributed File System (HDFS) and Amazon S3. It uses a custom query execution engine with operators designed to support SQL semantics. Unlike Hive/MapReduce, Presto executes queries in memory, pipelined across the network between stages, thus avoiding unnecessary I/O. The pipelined execution model runs multiple stages in parallel and streams data from one stage to the next as it becomes available.

upvoted 6 times

...

CHRIS12722222

Highly Voted 3 years, 2 months ago

Answer = D (use Presto)

upvoted 8 times

...

pk349

Most Recent 2 years, 2 months ago

D: I passed the test

upvoted 1 time

...

anjuvinayan

2 years, 2 months ago

The answer is D. A: not up-to-date data, since it is a Glue job. B: up-to-date data, since it uses DMS, but DMS cannot use ES as a source. C: not interactive.

upvoted 1 time

...

Arka_01

2 years, 9 months ago

Selected Answer: D

Disparate data sources are present and OpenSearch data cannot be ingested via DMS.

upvoted 1 time

...

rocky48

2 years, 11 months ago

Selected Answer: D

Answer = D

upvoted 1 time

...

Bik000

3 years, 1 month ago

Selected Answer: D

Answer should be D

upvoted 2 times

...

MWL

3 years, 1 month ago

Selected Answer: D

For A, I could not find documentation on exporting data from OpenSearch to S3. B: DMS does not support exporting from ES. C: to query ES with Spark SQL, we also need a third-party connector, so C is incomplete. D: should work.

upvoted 3 times

...

chp2022

3 years, 2 months ago

Selected Answer: B

I say it should be B

upvoted 1 time

...

AWSRanger

3 years, 2 months ago

Selected Answer: D

D is correct

upvoted 1 time

...

Tsyva

3 years, 2 months ago

Selected Answer: B

IMO the answer should be B, since the question highlights that the solution must give the most up-to-date results and DMS supports change data capture.

upvoted 1 time

...

rb39

3 years, 2 months ago

Selected Answer: D

Most up-to-date -> in-place queries if possible, so Presto. The Athena solution implies moving data from RDS to S3, which adds a potential delay (S3 is not real-time).
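The delay argument can be put into rough numbers. The sketch below is only an illustration under simplifying assumptions: the batch job snapshots the source when it starts, its output becomes queryable when it finishes, and runs start at a fixed interval. The function name and the example durations are made up for this illustration.

```python
from datetime import timedelta


def worst_case_staleness(etl_interval: timedelta, etl_duration: timedelta) -> timedelta:
    """Oldest the queried copy can be under a periodic batch ETL.

    The visible snapshot was taken when the previous run started, and it
    stays visible until the next run finishes, so its age peaks at one
    interval plus one run duration.
    """
    return etl_interval + etl_duration


if __name__ == "__main__":
    # Hypothetical schedule: hourly Glue job that takes 10 minutes.
    print(worst_case_staleness(timedelta(hours=1), timedelta(minutes=10)))
```

An in-place query has no copy step, so under the same simplification its staleness is effectively zero.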

upvoted 2 times

...

[Removed]

3 years, 2 months ago

Selected Answer: A

The JDBC connection is the key, so Athena is the answer.

upvoted 1 time

Lazy_Lord

2 years, 7 months ago

You can use JDBC on the Presto running on EMR: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/presto-adding-db-connectors.html
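As a rough sketch of what that setup involves, the helpers below build a Presto JDBC URL for a client and the contents of a MySQL catalog properties file for the Aurora source. The host names, user, and password are placeholders, and port 8889 is an assumption about the EMR Presto coordinator port that should be checked against the cluster's release. The property keys follow the Presto MySQL connector format.

```python
def presto_jdbc_url(host: str, port: int = 8889,
                    catalog: str = "hive", schema: str = "default") -> str:
    """Build a JDBC URL of the form jdbc:presto://host:port/catalog/schema."""
    # Port 8889 is assumed here; verify it for your EMR release.
    return f"jdbc:presto://{host}:{port}/{catalog}/{schema}"


def mysql_catalog_properties(aurora_endpoint: str, user: str,
                             password: str, port: int = 3306) -> str:
    """Return the text of a Presto MySQL catalog file for an Aurora MySQL endpoint."""
    lines = [
        "connector.name=mysql",
        f"connection-url=jdbc:mysql://{aurora_endpoint}:{port}",
        f"connection-user={user}",
        f"connection-password={password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder endpoints and credentials, for illustration only.
    print(presto_jdbc_url("emr-master.example.internal"))
    print(mysql_catalog_properties("aurora.example.internal", "analyst", "secret"))
```

A BI tool or SQL client would use the JDBC URL with the Presto JDBC driver, while the catalog file is what lets Presto reach Aurora MySQL at query time.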

upvoted 1 time

...