Exam AWS Certified Solutions Architect - Professional SAP-C02 topic 1 question 281 discussion

A company is collecting a large amount of data from a fleet of IoT devices. Data is stored as Optimized Row Columnar (ORC) files in the Hadoop Distributed File System (HDFS) on a persistent Amazon EMR cluster. The company's data analytics team queries the data by using SQL in Apache Presto deployed on the same EMR cluster. Queries scan large amounts of data, always run for less than 15 minutes, and run only between 5 PM and 10 PM.

The company is concerned about the high cost associated with the current solution. A solutions architect must propose the most cost-effective solution that will allow SQL data queries.

Which solution will meet these requirements?

  • A. Store data in Amazon S3. Use Amazon Redshift Spectrum to query data.
  • B. Store data in Amazon S3. Use the AWS Glue Data Catalog and Amazon Athena to query data.
  • C. Store data in EMR File System (EMRFS). Use Presto in Amazon EMR to query data.
  • D. Store data in Amazon Redshift. Use Amazon Redshift to query data.
Suggested Answer: B

Comments

Alabi
Highly Voted 1 year, 10 months ago
Selected Answer: B
Storing the data in Amazon S3 is a cost-effective solution compared to running a persistent EMR cluster with HDFS. The AWS Glue Data Catalog provides a centralized metadata repository for organizing and cataloging data in S3. Amazon Athena is a serverless query service that allows you to run SQL queries directly against data in S3 without the need for a dedicated cluster or infrastructure. By using Amazon Athena, you only pay for the queries you run, which aligns with the requirement of cost-effectiveness.
upvoted 6 times
...
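To make the "you only pay for the queries you run" point concrete, here is a minimal cost sketch. It assumes Athena bills per terabyte scanned; the rate used is an illustrative assumption rather than a quoted price, and the function names are hypothetical.

```python
# Rough Athena cost model. Athena bills by the amount of data scanned per query.
# ASSUMED_PRICE_PER_TB_USD is an illustrative assumption; check current pricing
# for your Region before relying on it.
ASSUMED_PRICE_PER_TB_USD = 5.00


def athena_query_cost(tb_scanned: float,
                      price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Estimate the cost in USD of one query that scans `tb_scanned` terabytes."""
    if tb_scanned < 0:
        raise ValueError("tb_scanned must be non-negative")
    return tb_scanned * price_per_tb


def athena_monthly_cost(queries_per_day: int, tb_per_query: float,
                        days: int = 30,
                        price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Estimate monthly spend when every query scans about the same volume."""
    return queries_per_day * athena_query_cost(tb_per_query, price_per_tb) * days


if __name__ == "__main__":
    # Hypothetical workload: 20 queries per evening, each scanning 0.5 TB.
    print(athena_monthly_cost(queries_per_day=20, tb_per_query=0.5))
```

Because ORC is columnar, queries that select only a few columns usually scan less than the full file size, so `tb_per_query` is often smaller than the raw data volume.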
sarlos
Most Recent 11 months, 3 weeks ago
Why not D? Is it because it is expensive?
upvoted 1 times
helloworldabc
8 months, 2 weeks ago
just B
upvoted 1 times
...
kgpoj
8 months, 3 weeks ago
Yeah, you don't wanna build a Redshift cluster for it. You store the data in S3 and use Athena to query it, so you just pay for the queries you run rather than paying for the whole Redshift cluster.
upvoted 1 times
...
...
TonytheTiger
1 year, 1 month ago
Selected Answer: B
Option B - Athena can connect to your data stored in Amazon S3 using the AWS Glue Data Catalog to store metadata such as table and column names. After the connection is made, your databases, tables, and views appear in Athena's query editor. https://docs.aws.amazon.com/athena/latest/ug/data-sources-glue.html
upvoted 2 times
...
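As a sketch of what registering the existing ORC data looks like, the helper below builds a `CREATE EXTERNAL TABLE` statement that could be run in the Athena query editor so the table metadata lands in the Glue Data Catalog. The database, table, column names, and bucket path are all hypothetical, and the helper itself is an illustration, not part of any AWS SDK.

```python
def orc_table_ddl(database: str, table: str, columns: dict, s3_location: str) -> str:
    """Build Hive-style DDL for an external table over ORC files in S3.

    `columns` maps column name to a Hive/Athena type, in table order.
    """
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must start with s3://")
    # LOCATION should point at a prefix (folder), so make sure it ends with '/'.
    location = s3_location if s3_location.endswith("/") else s3_location + "/"
    column_lines = ",\n".join(
        f"  `{name}` {col_type}" for name, col_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED AS ORC\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical schema for the IoT readings described in the question.
    print(orc_table_ddl(
        database="iot",
        table="readings",
        columns={"device_id": "string", "temperature": "double"},
        s3_location="s3://example-bucket/readings",
    ))
```

A Glue crawler could infer the same schema instead of hand-written DDL; either way the data stays in S3 and only the metadata is stored in the catalog.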
kejam
1 year, 3 months ago
Selected Answer: C
The question doesn't provide enough info to calculate the answer. We need to know how large the EMR cluster is, how many queries run, and how many TBs/PBs of data each query scans per day. However, I'm leaning towards Answer C: Store data in EMR File System (EMRFS). Use Presto in Amazon EMR to query data. EMRFS is an implementation of HDFS that all Amazon EMR clusters use for reading and writing regular files from Amazon EMR directly to Amazon S3. The company could switch to EMRFS, continue to use Presto (which comes included in EMR), and turn off the clusters when not in use while the data persists in EMRFS (S3). EMR comes in many flavors with different price points (EC2, Serverless) and is geared more towards daily data pipelines like the one this company is running. Regarding B: Athena is serverless and great for ad-hoc queries, but it is not cheap.
upvoted 1 times
...
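The missing inputs mentioned above (cluster size, data scanned per day) can be plugged into a simple break-even calculation. This is only a sketch: every rate below is a hypothetical placeholder, the per-TB Athena price is an assumption, and S3 storage and request costs are ignored because they apply to both options.

```python
def emr_daily_cost(node_count: int, hourly_rate_per_node: float,
                   hours_per_day: float) -> float:
    """Daily cost of a transient EMR cluster that runs only during the query window.

    `hourly_rate_per_node` is assumed to include both the EC2 and the EMR charge.
    """
    return node_count * hourly_rate_per_node * hours_per_day


def break_even_tb_per_day(emr_cost_per_day: float, athena_price_per_tb: float) -> float:
    """Data volume scanned per day at which Athena costs the same as the cluster."""
    if athena_price_per_tb <= 0:
        raise ValueError("athena_price_per_tb must be positive")
    return emr_cost_per_day / athena_price_per_tb


if __name__ == "__main__":
    # Hypothetical: 10 nodes at $0.50/hour each, up for the 5 PM to 10 PM window.
    cluster = emr_daily_cost(node_count=10, hourly_rate_per_node=0.50, hours_per_day=5)
    # Assumed Athena rate of $5 per TB scanned.
    print(cluster, break_even_tb_per_day(cluster, athena_price_per_tb=5.0))
```

Below the break-even volume Athena is cheaper under these assumptions; above it, a cluster that is shut down outside the query window can come out ahead.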
CProgrammer
1 year, 4 months ago
It is significantly more expensive to store data in Redshift than in S3. However, per https://docs.aws.amazon.com/redshift/latest/gsg/data-lake.html, you can use Amazon Redshift Spectrum to query data in Amazon S3 files without having to load the data into Amazon Redshift tables. Athena: while cost-effective for occasional ad-hoc queries, Athena's serverless architecture may not be as performant for frequent, resource-intensive queries (the question says "Queries scan large amounts of data").
upvoted 2 times
...
career360guru
1 year, 5 months ago
Selected Answer: B
B is the most cost-effective. A (Redshift Spectrum) can be a good option, but it needs a Redshift cluster, which may be more expensive. One piece of information missing from the question is how many queries run per second. If there is a large number of queries per second, then A can be the better choice.
upvoted 3 times
...
ggrodskiy
1 year, 9 months ago
Correct B
upvoted 1 times
...
NikkyDicky
1 year, 10 months ago
Selected Answer: B
it's a B
upvoted 2 times
...
SkyZeroZx
1 year, 10 months ago
Selected Answer: B
Classic serverless S3 data lake: Glue for ETL, Athena for queries.
upvoted 4 times
...
SmileyCloud
1 year, 10 months ago
Selected Answer: B
B - S3, Glue Data Catalog (GDC), and Athena is for sure the cheapest.
upvoted 1 times
...
shree2023
1 year, 10 months ago
Selected Answer: B
B is most cost effective
upvoted 1 times
...
gd1
1 year, 10 months ago
Selected Answer: B
S3 with Glue and Athena will do the trick
upvoted 1 times
...
PhuocT
1 year, 10 months ago
Selected Answer: B
B could be the answer
upvoted 1 times
...
bhanus
1 year, 10 months ago
Selected Answer: B
B is the answer
upvoted 1 times
...
psyx21
1 year, 10 months ago
Selected Answer: B
Correct Answer is B
upvoted 1 times
...