exam questions

Exam AWS Certified Data Analytics - Specialty All Questions

View all questions & answers for the AWS Certified Data Analytics - Specialty exam

Exam AWS Certified Data Analytics - Specialty topic 1 question 20 discussion

A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.
✑ A trips fact table for information on completed rides.
✑ A drivers dimension table for driver profiles.
✑ A customers fact table holding customer profile information.
The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes.
What table design provides optimal query performance?

  • A. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.
  • B. Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  • C. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  • D. Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
zanhsieh
Highly Voted 3 years, 11 months ago
C. Drivers’ data -> ALL, Customer’s data -> EVEN, Trips table -> KEY (destination) & sort by date https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html https://slideshare.net/AmazonWebServices/deep-dive-on-amazon-redshift-80877515
upvoted 38 times
GeeBeeEl
3 years, 10 months ago
Not sure how the 2 links gave you this answer! Not saying your suggestion is wrong
upvoted 2 times
...
...
jove
Highly Voted 3 years, 10 months ago
IMO, it should be B.. Reasons: Distributing the data on destination might cause a data skew which we don't want. If there is no clear dist key for a fact, it's better to dist it evenly.
upvoted 12 times
...
NikkyDicky
Most Recent 2 years ago
Selected Answer: C
C is right
upvoted 1 times
...
penguins2
2 years ago
C is the correct answer!
upvoted 1 times
...
pk349
2 years, 3 months ago
C: I passed the test
upvoted 2 times
...
cloudlearnerhere
2 years, 9 months ago
Selected Answer: C
Correct answer is C as as the trip would be queries on destination and date, the trips table needs a DISTSTYLE KEY (destination) for the trips table and sort by date. As the drivers data rarely changes DISTSTYLE ALL can be applied for the drivers table, which will maintain a copy per node. Also as customers data changes frequently,
upvoted 5 times
...
Arka_01
2 years, 11 months ago
Selected Answer: C
Use All Distribution for rarely changing tables, as they are copied to all slices. Use Even distribution to frequently changing and large tables, as Redshift engine can randomly distribute them to different data slices. Use diststyle key for the tables where you know a join key.
upvoted 8 times
...
rocky48
3 years ago
Answer is C
upvoted 1 times
...
Bik000
3 years, 3 months ago
Selected Answer: C
Answer is C
upvoted 1 times
...
MWL
3 years, 3 months ago
Selected Answer: C
C should be right. The data will be analized by destination. And the requirement doesn't mention that it need to join trip and customer/driver table. So, using DISTSTYLE KEY (destination) for the trips should improve the performance of trip data. For A, there are millons of customers, so using ALL for that will cause too much copy data.
upvoted 2 times
...
Shivanikats
3 years, 7 months ago
I think answer is B. Destination is not as unique a key and can cause skew with key partition. So even for the trips seems right. Drivers is updated rarely, so suitable for All. Cust is updated often so Even is good for it too.
upvoted 3 times
...
Bambur
3 years, 9 months ago
The answer is B. We don't know which key (with high cardinality) to use for fact table for even distribution so we should chose even diststyle and use fields that should be used to access data as sort key.. Rare updated dimension table good candidate for diststyle all, and another one frequently updated should have even diststyle.
upvoted 4 times
...
Huy
3 years, 10 months ago
FACT table in DW is central table and will be queried the most. Because we query trips by destination and date therefore the higher cardinality is destination -> use Destination as key. Drivers is dimension table and data is small so can be ALL to speedup the join. Customer data frequently changes and we are not sure which columns should be joined so EVEN is safe.
upvoted 4 times
...
Donell
3 years, 10 months ago
Answer C
upvoted 1 times
...
Shraddha
3 years, 10 months ago
Ans C..This is a textbook question. However, if there is an answer where both customer and driver tables are EVEN, I would go for it. ALL does not quite give benefits over EVEN. https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
upvoted 3 times
...
gunjan4392
3 years, 10 months ago
C seems okay
upvoted 1 times
...
ariane_tateishi
3 years, 10 months ago
C in my opnion is the best choice, because the Trip table is a fact table so it will join with the other tables, like customer and driver tables. The table customer is frequently updated, so in this case ALL is not recommended.
upvoted 2 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...