Exam AWS Certified Data Analytics - Specialty topic 1 question 54 discussion

A marketing company wants to improve its reporting and business intelligence capabilities. During the planning phase, the company interviewed the relevant stakeholders and discovered that:
✑ The operations team reports are run hourly for the current month's data.
✑ The sales team wants to use multiple Amazon QuickSight dashboards to show a rolling view of the last 30 days based on several categories. The sales team also wants to view the data as soon as it reaches the reporting backend.
✑ The finance team's reports are run daily for last month's data and once a month for the last 24 months of data.
Currently, there is 400 TB of data in the system, with an expected additional 100 TB added every month. The company is looking for a solution that is as cost-effective as possible.
Which solution meets the company's requirements?

  • A. Store the last 24 months of data in Amazon Redshift. Configure Amazon QuickSight with Amazon Redshift as the data source.
  • B. Store the last 2 months of data in Amazon Redshift and the rest of the months in Amazon S3. Set up an external schema and table for Amazon Redshift Spectrum. Configure Amazon QuickSight with Amazon Redshift as the data source.
  • C. Store the last 24 months of data in Amazon S3 and query it using Amazon Redshift Spectrum. Configure Amazon QuickSight with Amazon Redshift Spectrum as the data source.
  • D. Store the last 2 months of data in Amazon Redshift and the rest of the months in Amazon S3. Use a long-running Amazon EMR with Apache Spark cluster to query the data as needed. Configure Amazon QuickSight with Amazon EMR as the data source.
Suggested Answer: B

Comments

ramozo
Highly Voted 3 years, 8 months ago
For me it is B. Redshift offers better performance for querying and analyzing the latest 2 months of data, and combining it with Spectrum covers infrequent queries on the full 24 months of data.
upvoted 41 times
Katana19
3 years, 8 months ago
They didn't require performance, they required cost-effectiveness! 2 months of data means 200 TB... a Redshift cluster holding 200 TB is not cheap!
upvoted 5 times
Gavin_Y
2 years, 9 months ago
It's mentioned that 'The sales team also wants to view the data as soon as it reaches the reporting backend', so I think they do require performance.
upvoted 2 times
kempstonjoystick
Highly Voted 3 years, 8 months ago
https://aws.amazon.com/premiumsupport/knowledge-center/redshift-spectrum-query-charges/ "Load the data in S3 and use Redshift Spectrum if the data is infrequently accessed." In this case, the operations data is accessed hourly; this is not infrequent. I think even with the statement about cost-effectiveness, the answer is B. If all the monthly data of 100 TB is scanned hourly as part of those reports (and there's nothing in the question to say it isn't), then the cost becomes 100 TB * $5 * ~720 hours in a month, which is $360,000 per month! The storage cost for 200 TB of Redshift data is about $5,000 a month.
upvoted 15 times
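[Editor's note] The back-of-envelope comparison in the comment above can be sketched in a few lines. The $5/TB-scanned Spectrum rate and the roughly $25/TB-month Redshift storage rate are assumptions taken from the comment's own figures; actual AWS pricing varies by region and node type.

```python
# Rough cost comparison using the comment's assumed prices (not official rates).
SPECTRUM_PRICE_PER_TB_SCANNED = 5.0   # USD per TB scanned by Redshift Spectrum (assumed)
REDSHIFT_STORAGE_PER_TB_MONTH = 25.0  # USD per TB-month of Redshift storage (assumed)

def spectrum_hourly_scan_cost(tb_scanned_per_query: float,
                              hours_per_month: int = 720) -> float:
    """Cost of re-scanning the S3 data for every hourly report in a month."""
    return tb_scanned_per_query * SPECTRUM_PRICE_PER_TB_SCANNED * hours_per_month

def redshift_storage_cost(tb_stored: float) -> float:
    """Monthly cost of keeping hot data inside the Redshift cluster."""
    return tb_stored * REDSHIFT_STORAGE_PER_TB_MONTH

print(spectrum_hourly_scan_cost(100))  # 360000.0 -- hourly Spectrum scans of 100 TB
print(redshift_storage_cost(200))      # 5000.0 -- storing 2 months (200 TB) in Redshift
```

Under these assumptions, hourly full scans via Spectrum cost ~70x more per month than simply storing the hot 2 months in the cluster, which is the commenter's argument for B.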
pk349
Most Recent 2 years, 1 month ago
B: I passed the test
upvoted 1 times
AwsNewPeople
2 years, 3 months ago
Selected Answer: B
Option B seems to be the best solution for this scenario. It suggests storing the last 2 months of data in Amazon Redshift and the rest of the months in Amazon S3. Setting up an external schema and table for Amazon Redshift Spectrum will allow for querying data stored in Amazon S3, and configuring Amazon QuickSight with Amazon Redshift as the data source will allow for creating reports and dashboards for the data.
This solution is cost-effective because it uses Amazon S3 to store the majority of the data, which is cheaper than storing it all in Amazon Redshift. It also leverages Amazon Redshift Spectrum, which allows for querying data in Amazon S3 using a standard SQL interface without needing to move the data into Amazon Redshift. Finally, storing only two months of data in Amazon Redshift minimizes storage costs in Redshift while still allowing fast query performance for the most recent data.
upvoted 1 times
cloudlearnerhere
2 years, 7 months ago
Selected Answer: B
Correct answer is B, as the base requirements are cost and performance. Keeping data in Redshift for 2 months allows data analysis for the current and previous month, while holding data for 24 months in S3 provides a cost-effective option. Option A is wrong as holding 24 months of data in Redshift is not cost-effective. Option C is wrong as storing all 24 months of data in S3 would not provide the required performance. Option D is wrong as using a long-running EMR cluster is not cost-effective.
upvoted 4 times
Arka_01
2 years, 8 months ago
Selected Answer: B
"as cost- effective as possible" - this is the key statement here. So we need fast retrieval and query performance on last 2 months data, and infrequent querying capability for last 24 months of data. So B is the correct answer.
upvoted 1 times
dushmantha
2 years, 10 months ago
Selected Answer: B
I would choose "B", although I had doubts about choosing "C". The main reason for the switch is that it's not a very good use case to run Redshift Spectrum without using Redshift for any part of the job; I don't know if that's even possible. Ideally Redshift is supposed to query hot data, and Redshift Spectrum is supposed to extend the querying capability to exabytes of data in S3.
upvoted 1 times
rocky48
2 years, 11 months ago
Selected Answer: B
B is the right answer.
upvoted 1 times
Bik000
3 years ago
Selected Answer: B
B should be Correct
upvoted 1 times
jrheen
3 years, 1 month ago
Answer - B
upvoted 1 times
Blueocean
3 years, 3 months ago
Agree B is the best and most cost effective option
upvoted 1 times
GoKhe
3 years, 4 months ago
B is the correct answer
upvoted 1 times
lixin2402
3 years, 6 months ago
Definitely, B is the right one. The cost is already cut in half. A long-running EMR cluster is not cheap.
upvoted 1 times
aws2019
3 years, 6 months ago
B is the right answer
upvoted 1 times
goutes
3 years, 7 months ago
Redshift Spectrum can query exabytes of unstructured data in S3 without loading it. It even supports gzip and snappy compression. So option C is correct.
upvoted 3 times
jueueuergen
3 years, 7 months ago
I think answer B is correct because of the hourly scanning costs. However, I think we don't have enough information:
- What are the usage patterns? Is only a small subset of columns required? -> less scanning
- What compression factor is possible? Spectrum seems to support compressed data, whereas data in Redshift seems to be uncompressed.
[Compression factor] * [column selection] can easily decrease the amount of data that needs to be scanned by a factor of 100x, possibly even 1000x or more.
Finally, the phrasing "The sales team also wants to view the data as soon as it reaches the reporting backend." could go either way if you ask me - Spectrum doesn't introduce a lag because data is loaded lazily, but it leads to slower queries compared to Redshift.
upvoted 3 times
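[Editor's note] The scan-reduction argument above can be sketched numerically. The 3x compression ratio and 10-of-100 column selectivity below are illustrative assumptions, not figures from the question:

```python
# Illustrative estimate of how compression and column pruning shrink the data
# Redshift Spectrum has to scan. All factors are assumptions for the example.
raw_tb = 100.0            # one month of raw data (from the question)
compression_ratio = 3.0   # e.g. gzip/snappy on columnar files (assumed)
columns_needed = 10       # columns the report actually reads (assumed)
columns_total = 100       # columns in the table (assumed)

scanned_tb = raw_tb / compression_ratio * (columns_needed / columns_total)
reduction = raw_tb / scanned_tb

print(f"{scanned_tb:.2f} TB scanned")  # ~3.33 TB instead of 100 TB
print(f"{reduction:.0f}x reduction")   # ~30x
```

Even modest assumptions cut the scanned volume by an order of magnitude, which is why the commenter argues the per-TB-scanned cost estimate could be far lower in practice.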
Huy
3 years, 7 months ago
B. One more thing that makes C wrong is that Spectrum only runs within a Redshift cluster. Therefore you are charged for both the cluster and the data scanned.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other