Exam AWS Certified Big Data - Specialty topic 1 question 18 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty

Question #: 18
Topic #: 1


A social media customer has data from different data sources including RDS running MySQL, Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results.
What is the most cost-effective solution to meet these requirements?

A. Load all data from a different database/warehouse to S3. Use Redshift COPY command to copy data to Redshift for analysis.
B. Install Presto on the EMR cluster where Hive sits. Configure MySQL and PostgreSQL connector to select from different data sources in a single query.
C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze.
D. Write a program running on a separate EC2 instance to run queries to three different systems. Aggregate the results after getting the responses from all three systems.


Suggested Answer: B

by exams at Sept. 18, 2019, 6:01 a.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments


san2020

Highly Voted 3 years, 7 months ago

My selection is B.

upvoted 5 times


Debi_mishra

Most Recent 3 years, 7 months ago

Technically B is wrong, as it mentions nothing about Redshift.

upvoted 1 times

tubadc

3 years, 7 months ago

It's B. The Redshift connector is the same as the PostgreSQL connector; you only need to set one:

connector.name=redshift
connection-url=jdbc:postgresql://example.net:5439/database
connection-user=root
connection-password=secret
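
For illustration, here is a minimal Python sketch of how catalog files for both JDBC sources might be generated. It assumes the usual Presto layout of one .properties file per catalog; the file names, the MySQL host and port, and the helper render_catalog are hypothetical, and the credentials are the placeholders from the comment above.

```python
def render_catalog(connector, url, user, password):
    """Return the text of a Presto catalog .properties file."""
    props = {
        "connector.name": connector,
        "connection-url": url,
        "connection-user": user,
        "connection-password": password,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Hypothetical catalog files: one per data source that Presto should reach.
catalogs = {
    # RDS MySQL goes through the MySQL connector (placeholder endpoint).
    "mysql.properties": render_catalog(
        "mysql", "jdbc:mysql://example.net:3306", "root", "secret"
    ),
    # Redshift speaks the PostgreSQL wire protocol, hence the postgresql JDBC URL.
    "redshift.properties": render_catalog(
        "redshift", "jdbc:postgresql://example.net:5439/database", "root", "secret"
    ),
}
```

Each entry would be written to the Presto catalog directory on the cluster; the Hive catalog normally exists already on EMR, so it is left out of this sketch.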

upvoted 2 times


Josh1981

3 years, 7 months ago

I went with B.

upvoted 1 times


YashBindlish

3 years, 7 months ago

Correct Answer is B as Presto (or PrestoDB) is an open source, distributed SQL query engine, designed from the ground up for fast analytic queries against data of any size. It supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), Amazon S3, Cassandra, MongoDB, and HBase, and relational data sources such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server, and Teradata. https://aws.amazon.com/big-data/what-is-presto/
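
To make the "single query across sources" idea concrete, here is a hedged Python sketch that builds one Presto SQL statement joining all three systems. Presto addresses tables as catalog.schema.table; the catalog names (mysql, hive, redshift), the schemas, the tables, and the columns below are all hypothetical and only illustrate the shape of such a query.

```python
def qualified(catalog, schema, table):
    """Build a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query():
    """Return one SQL statement that combines MySQL, Hive, and Redshift data."""
    users = qualified("mysql", "app", "users")          # RDS MySQL (hypothetical)
    events = qualified("hive", "default", "events")     # Hive on EMR (hypothetical)
    sales = qualified("redshift", "public", "sales")    # Redshift (hypothetical)
    # Each source is aggregated per user first, so the joins do not
    # multiply rows and inflate the counts or sums.
    return (
        "SELECT u.user_id, e.event_count, s.total_amount\n"
        f"FROM {users} u\n"
        f"JOIN (SELECT user_id, count(*) AS event_count FROM {events} GROUP BY user_id) e\n"
        "  ON u.user_id = e.user_id\n"
        f"JOIN (SELECT user_id, sum(amount) AS total_amount FROM {sales} GROUP BY user_id) s\n"
        "  ON u.user_id = s.user_id"
    )


query = build_federated_query()
```

The resulting string could be submitted through any Presto client; the point is only that the three sources appear side by side in one statement, which is what option B relies on.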

upvoted 3 times


BigEv

3 years, 8 months ago

Why not C? It seems a lot easier to ingest data into Elasticsearch.

upvoted 1 times

G3

3 years, 8 months ago

The question asks for the most cost-effective way, and Elasticsearch is an expensive option. I feel it's B.

upvoted 2 times


kamikazestar

3 years, 7 months ago

Because 'Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources' (source: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto.html), and the EMR cluster has already been provisioned.

upvoted 2 times