Exam AWS Certified Big Data - Specialty topic 1 question 18 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty
Question #: 18
Topic #: 1

A social media customer has data from different data sources including RDS running MySQL, Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results.
What is the most cost-effective solution to meet these requirements?

  • A. Load all data from a different database/warehouse to S3. Use Redshift COPY command to copy data to Redshift for analysis.
  • B. Install Presto on the EMR cluster where Hive sits. Configure MySQL and PostgreSQL connector to select from different data sources in a single query.
  • C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze.
  • D. Write a program running on a separate EC2 instance to run queries to three different systems. Aggregate the results after getting the responses from all three systems.
Suggested Answer: B
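
Option B relies on Presto catalogs: each data source is registered as a catalog through a properties file, and a single SQL statement can then address tables as catalog.schema.table. The following is a minimal sketch of that idea, not a tested EMR setup. It assumes a Hive catalog named hive already exists on the cluster, and every host name, credential, schema, table, and column below is a hypothetical placeholder. Redshift is reached here through the PostgreSQL connector, as the option describes; Presto also ships a dedicated redshift connector that uses the same JDBC URL form.

```python
# Hypothetical catalog definitions. Hosts, ports, database names, and
# credentials are placeholders, not values from the question.
CATALOGS = {
    "mysql": {
        "connector.name": "mysql",
        "connection-url": "jdbc:mysql://rds-host.example.net:3306",
        "connection-user": "presto",
        "connection-password": "secret",
    },
    "redshift": {
        # Redshift speaks the PostgreSQL wire protocol, so the PostgreSQL
        # connector is pointed at the Redshift endpoint (default port 5439).
        "connector.name": "postgresql",
        "connection-url": "jdbc:postgresql://redshift-host.example.net:5439/analytics",
        "connection-user": "presto",
        "connection-password": "secret",
    },
}


def render_properties(settings: dict) -> str:
    """Render one catalog as the key=value text of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


def catalog_files(catalogs: dict) -> dict:
    """Map each catalog file name to its rendered contents."""
    return {
        f"{name}.properties": render_properties(settings)
        for name, settings in catalogs.items()
    }


# One query spanning three catalogs, addressed as catalog.schema.table.
# All schema, table, and column names are invented for illustration.
FEDERATED_QUERY = """
SELECT u.user_id, o.order_total, c.click_count
FROM mysql.social.users AS u
JOIN redshift.public.orders AS o ON o.user_id = u.user_id
JOIN hive.default.clickstream AS c ON c.user_id = u.user_id
""".strip()

if __name__ == "__main__":
    for filename, body in catalog_files(CATALOGS).items():
        print(f"# etc/catalog/{filename}")
        print(body)
    print(FEDERATED_QUERY)
```

The catalog name used in the query is taken from the properties file name, which is why the Redshift catalog can be called redshift even though its connector.name is postgresql.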

Comments

san2020
Highly Voted 3 years, 7 months ago
my selection B
upvoted 5 times
Debi_mishra
Most Recent 3 years, 7 months ago
Technically B is wrong, as it mentions nothing about Redshift.
upvoted 1 times
tubadc
3 years, 7 months ago
It's B. The Redshift connector is the same as the Postgres connector; you only need to set one:
connector.name=redshift
connection-url=jdbc:postgresql://example.net:5439/database
connection-user=root
connection-password=secret
upvoted 2 times
Josh1981
3 years, 7 months ago
I go with B
upvoted 1 times
YashBindlish
3 years, 7 months ago
Correct Answer is B as Presto (or PrestoDB) is an open source, distributed SQL query engine, designed from the ground up for fast analytic queries against data of any size. It supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), Amazon S3, Cassandra, MongoDB, and HBase, and relational data sources such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server, and Teradata. https://aws.amazon.com/big-data/what-is-presto/
upvoted 3 times
BigEv
3 years, 8 months ago
Why not C? It seems a lot easier to ingest data into Elasticsearch.
upvoted 1 times
G3
3 years, 8 months ago
The question asks for the most cost-effective way, and Elasticsearch is an expensive option. I feel it's B.
upvoted 2 times
kamikazestar
3 years, 7 months ago
Because 'Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources' (source: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto.html), and EMR has already been provisioned.
upvoted 2 times
antoneti
3 years, 7 months ago
I would agree if it only mentioned "Presto", but why configure MySQL and PostgreSQL? This makes me go with C.
upvoted 2 times
marwan
3 years, 7 months ago
Because the PostgreSQL connector is used for Redshift and the MySQL connector for RDS. The answer is B.
upvoted 5 times
michelleY
3 years, 7 months ago
I agree, I think the answer is B.
upvoted 1 times
M2
3 years, 8 months ago
A and D are for sure not the answer. The question is only between B and C.
upvoted 1 times
exams
3 years, 8 months ago
B is correct I think
upvoted 3 times