Exam AWS Certified Data Analytics - Specialty topic 1 question 47 discussion

A data analyst is designing a solution to interactively query datasets with SQL using a JDBC connection. Users will join data stored in Amazon S3 in Apache ORC format with data stored in Amazon OpenSearch Service (Amazon Elasticsearch Service) and Amazon Aurora MySQL.
Which solution will provide the MOST up-to-date results?

  • A. Use AWS Glue jobs to ETL data from Amazon ES and Aurora MySQL to Amazon S3. Query the data with Amazon Athena.
  • B. Use AWS DMS to stream data from Amazon ES and Aurora MySQL to Amazon Redshift. Query the data with Amazon Redshift.
  • C. Query all the datasets in place with Apache Spark SQL running on an AWS Glue developer endpoint.
  • D. Query all the datasets in place with Apache Presto running on Amazon EMR.
Suggested Answer: D

Comments

cloudlearnerhere
Highly Voted 2 years, 6 months ago
Selected Answer: D
The correct answer is D, as Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources. Option A is wrong as Glue is not ideal for interactive queries; it is meant for batch ETL jobs. Option B is wrong as it would not provide up-to-date results, since the data needs to be copied over to Redshift before it can be queried. It also does not cover S3, which would need Redshift Spectrum. Option C is wrong as Glue developer endpoints are meant for developing and testing Glue ETL scripts, not for serving interactive queries, and Spark SQL would need additional connectors to reach the other data sources.
upvoted 17 times
cloudlearnerhere
2 years, 6 months ago
Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions, which makes it easy for data analysts and developers to query both structured and unstructured data at scale. Presto can process data from multiple data sources, including the Hadoop Distributed File System (HDFS) and Amazon S3. Presto uses a custom query execution engine with operators designed to support SQL semantics. Unlike Hive/MapReduce, Presto executes queries in memory, pipelined across the network between stages, thus avoiding unnecessary I/O. The pipelined execution model runs multiple stages in parallel and streams data from one stage to the next as it becomes available.
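To make the "query in place across multiple sources" idea concrete, here is a minimal sketch of what a federated Presto query could look like. The catalog names (`hive`, `mysql`, `elasticsearch`) and every schema, table, and column name are hypothetical placeholders, not taken from the question; they depend on how the connectors are configured on the cluster. The Python wrapper only builds the SQL text and does not connect to anything.

```python
def build_federated_query(orc_catalog="hive",
                          mysql_catalog="mysql",
                          es_catalog="elasticsearch"):
    """Return a Presto SQL statement that joins three catalogs in place.

    All catalog, schema, table, and column names are illustrative
    assumptions; substitute the names configured on your own cluster.
    """
    return (
        "SELECT o.order_id, c.customer_name, e.event_type\n"
        # ORC files on S3, exposed through the Hive connector
        f"FROM {orc_catalog}.sales.orders AS o\n"
        # Aurora MySQL, exposed through the MySQL connector
        f"JOIN {mysql_catalog}.crm.customers AS c\n"
        "  ON o.customer_id = c.customer_id\n"
        # OpenSearch/Elasticsearch index, exposed through its connector
        f"JOIN {es_catalog}.default.click_events AS e\n"
        "  ON o.order_id = e.order_id"
    )


QUERY = build_federated_query()
print(QUERY)
```

Because each source is read at query time through its connector, no copy step sits between a write to Aurora or OpenSearch and the query result, which is the reasoning behind "most up-to-date".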
upvoted 6 times
CHRIS12722222
Highly Voted 3 years ago
Answer = D (use Presto)
upvoted 8 times
pk349
Most Recent 2 years ago
D: I passed the test
upvoted 1 times
anjuvinayan
2 years ago
Answer is D. A: not up-to-date data, as it is a Glue job. B: up-to-date data, as it is DMS, but DMS to ES integration is not possible. C: not interactive.
upvoted 1 times
Arka_01
2 years, 7 months ago
Selected Answer: D
Disparate data sources are present and OpenSearch data cannot be ingested via DMS.
upvoted 1 times
rocky48
2 years, 9 months ago
Selected Answer: D
Answer = D
upvoted 1 times
Bik000
2 years, 12 months ago
Selected Answer: D
Answer should be D
upvoted 2 times
MWL
3 years ago
Selected Answer: D
For A, I didn't find documentation about exporting data from OpenSearch to S3. B: DMS doesn't support exporting from ES. C: to use Spark SQL to query ES, we also need a third-party connector, so C is not complete. D: should work.
upvoted 3 times
chp2022
3 years ago
Selected Answer: B
I say it should be B
upvoted 1 times
AWSRanger
3 years ago
Selected Answer: D
D is correct
upvoted 1 times
Tsyva
3 years ago
Selected Answer: B
IMO the answer should be B, since it's highlighted that the results must be the most up-to-date, and DMS supports change data capture.
upvoted 1 times
rb39
3 years ago
Selected Answer: D
Most up-to-date -> in-place queries if possible, so Presto. Athena solution implies moving data from RDS to S3 so adds a potential delay (S3 is not real-time)
upvoted 2 times
[Removed]
3 years ago
Selected Answer: A
The JDBC connection is key, so Athena is the answer.
upvoted 1 times
Lazy_Lord
2 years, 5 months ago
You can use JDBC on the Presto running on EMR: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/presto-adding-db-connectors.html
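As a rough sketch of what that page describes, a database connector on EMR is supplied as a configuration classification when the cluster is created. The example below builds such a classification for a MySQL-compatible source in Python. The classification name `presto-connector-mysql` and the three property keys follow the pattern used for JDBC connectors on that page as I understand it, so treat them as assumptions to verify against the current EMR release; the host, user, and password values are placeholders.

```python
import json


def mysql_connector_classification(host, port, user, password):
    """Build an EMR configuration entry for a Presto MySQL connector.

    The classification name and property keys are assumed from the EMR
    connector documentation pattern; all argument values are placeholders.
    """
    return [
        {
            "Classification": "presto-connector-mysql",
            "Properties": {
                # JDBC URL that Presto uses to reach the database
                "connection-url": f"jdbc:mysql://{host}:{port}",
                "connection-user": user,
                "connection-password": password,
            },
            "Configurations": [],
        }
    ]


config = mysql_connector_classification(
    "aurora.example.internal", 3306, "MYUSER", "MYPASS"
)
print(json.dumps(config, indent=2))
```

The printed JSON is the kind of document that would be passed as the cluster's configurations at launch. In practice the password would come from a secrets store instead of being written inline.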
upvoted 1 times