Exam AWS Certified Data Analytics - Specialty topic 1 question 113 discussion

A human resources company maintains a 10-node Amazon Redshift cluster to run analytics queries on the company's data. The Amazon Redshift cluster contains a product table and a transactions table, and both tables have a product_sku column. The tables are over 100 GB in size. The majority of queries run on both tables.
Which distribution style should the company use for the two tables to achieve optimal query performance?

  • A. An EVEN distribution style for both tables
  • B. A KEY distribution style for both tables
  • C. An ALL distribution style for the product table and an EVEN distribution style for the transactions table
  • D. An EVEN distribution style for the product table and a KEY distribution style for the transactions table
Suggested Answer: B

Comments

lakeswimmer
Highly Voted 3 years, 6 months ago
B - KEY. Both tables are huge and share a common key. An ALL distribution style for the product table is not correct because of the table's size, and an EVEN distribution style for the product table may not necessarily help.
upvoted 16 times
iris22
3 years, 2 months ago
ref: https://docs.aws.amazon.com/redshift/latest/dg/t_designating_distribution_styles.html
upvoted 3 times
...
...
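The size argument against ALL can be made concrete with a little arithmetic. The sketch below assumes exactly 100 GB per table (the question only says "over 100 GB", so treat the results as lower bounds) and uses the 10-node figure from the question. The function names are illustrative, not part of any AWS API.

```python
# Rough storage arithmetic for the distribution styles discussed above.
# Assumption: 100 GB is used as a lower bound for the table size.

NODES = 10       # cluster size stated in the question
TABLE_GB = 100   # lower bound stated in the question


def all_style_footprint_gb(table_gb: float, nodes: int) -> float:
    """ALL keeps a full copy of the table on every node."""
    return table_gb * nodes


def per_node_share_gb(table_gb: float, nodes: int) -> float:
    """KEY or EVEN stores each row once, so an evenly spread table
    puts roughly table_gb / nodes on each node."""
    return table_gb / nodes


if __name__ == "__main__":
    print("ALL, cluster-wide GB:", all_style_footprint_gb(TABLE_GB, NODES))
    print("KEY/EVEN, per-node GB:", per_node_share_gb(TABLE_GB, NODES))
```

Under these assumptions, ALL multiplies the product table's footprint by the node count, and every load or update has to be written to every copy, which is why it is usually reserved for small, slowly changing tables.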
Fazil_Cp
Highly Voted 3 years, 7 months ago
Option B - As both tables have a common key and are used widely in reports. https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html
upvoted 5 times
...
pk349
Most Recent 2 years, 1 month ago
B: I passed the test
upvoted 3 times
...
rags1482
2 years, 2 months ago
B. A KEY distribution style for both tables would be the best option for optimal query performance. Explanation: When using a KEY distribution style, Redshift distributes the rows according to the values in one column that serves as the distribution key. If the product_sku column is used as the distribution key for both the product table and the transactions table, rows with the same product_sku value will be stored on the same node. This will reduce the amount of data that needs to be moved between nodes during a query, resulting in faster query performance.
upvoted 4 times
...
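The collocation effect described in the comment above can be illustrated with a toy simulation. This is only a sketch: Redshift's real hash function is internal and it distributes rows to slices within nodes, so `key_slice`, the CRC32 hash, the sample rows, and treating each node as one slice are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 10  # simplification: one slice per node in the 10-node cluster


def key_slice(sku: str) -> int:
    """Hypothetical stand-in for Redshift's internal distribution hash."""
    return zlib.crc32(sku.encode("utf-8")) % NUM_SLICES


def distribute_by_key(rows):
    """KEY style: the slice is chosen from the hash of product_sku (column 0)."""
    placement = defaultdict(list)
    for row in rows:
        placement[key_slice(row[0])].append(row)
    return placement


def distribute_even(rows):
    """EVEN style: rows are dealt out round-robin, ignoring their values."""
    placement = defaultdict(list)
    for i, row in enumerate(rows):
        placement[i % NUM_SLICES].append(row)
    return placement


def rows_needing_movement(product_placement, txn_placement):
    """Count transaction rows whose matching product row is on another slice."""
    sku_home = {
        row[0]: slice_id
        for slice_id, rows in product_placement.items()
        for row in rows
    }
    return sum(
        1
        for slice_id, rows in txn_placement.items()
        for row in rows
        if sku_home[row[0]] != slice_id
    )


# Made-up sample data: 200 products and 2,000 transactions referencing them.
products = [(f"SKU-{i:04d}", f"name-{i}") for i in range(200)]
transactions = [(f"SKU-{(i * 7) % 200:04d}", i) for i in range(2000)]

key_moves = rows_needing_movement(
    distribute_by_key(products), distribute_by_key(transactions)
)
even_moves = rows_needing_movement(
    distribute_even(products), distribute_even(transactions)
)

if __name__ == "__main__":
    print("rows to move with KEY on both tables:", key_moves)
    print("rows to move with EVEN on both tables:", even_moves)
```

With KEY on product_sku for both tables, every transaction row hashes to the same slice as its product row, so the join needs no redistribution in this model. With EVEN, matching rows land together only by coincidence.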
Chelseajcole
2 years, 5 months ago
Even Walmart's product table might not reach 100 GB in size... I think the question is not well stated.
upvoted 2 times
...
rav009
2 years, 7 months ago
Selected Answer: B
This question is very tricky. Usually a product table is a dimension table and should be small. But here "The tables are over 100 GB in size", so B.
upvoted 2 times
...
cloudlearnerhere
2 years, 7 months ago
The correct answer is C: since the key requirement is optimal query performance, it would be better to use an ALL distribution style for the product dimension table and an EVEN distribution style for the transactions fact table. Options A and D are wrong because they would not provide optimal query performance. Option B is wrong because a KEY distribution style can be used for both tables, but it won't provide optimal query performance.
upvoted 2 times
cloudlearnerhere
2 years, 7 months ago
Distribute the fact table and its largest dimension table on their common columns. Choose the largest dimension based on the size of the dataset that participates in the most common join, not just the size of the table. If a table is commonly filtered, using a WHERE clause, only a portion of its rows participate in the join. Such a table has less impact on redistribution than a smaller table that contributes more data. Designate both the dimension table's primary key and the fact table's corresponding foreign key as DISTKEY. If multiple tables use the same distribution key, they are also collocated with the fact table. Your fact table can have only one distribution key. Any tables that join on another key aren't collocated with the fact table.
upvoted 1 times
chdorrego
2 years, 6 months ago
Agree with everything you said, but the 100 GB size of both tables makes it better to distribute them by KEY.
upvoted 1 times
...
...
...
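The DISTKEY advice quoted in this thread would look roughly like the DDL produced below. This is a sketch rather than the exam's reference solution: only the table names and product_sku come from the question, while the other column names, the data types, and the `create_table_ddl` helper are hypothetical. The helper only builds SQL text using Redshift's documented `DISTSTYLE KEY` and `DISTKEY (column)` table attributes; it does not connect to a cluster.

```python
def create_table_ddl(table: str, columns: list[tuple[str, str]], dist_key: str) -> str:
    """Build a Redshift CREATE TABLE statement that distributes on dist_key."""
    column_names = [name for name, _ in columns]
    if dist_key not in column_names:
        raise ValueError(f"{dist_key!r} is not a column of {table!r}")
    column_sql = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {column_sql}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key});"
    )


# Hypothetical column lists; only product_sku is given in the question.
product_ddl = create_table_ddl(
    "product",
    [("product_sku", "VARCHAR(32) NOT NULL"), ("product_name", "VARCHAR(256)")],
    dist_key="product_sku",
)
transactions_ddl = create_table_ddl(
    "transactions",
    [
        ("transaction_id", "BIGINT NOT NULL"),
        ("product_sku", "VARCHAR(32) NOT NULL"),
        ("amount", "DECIMAL(12,2)"),
    ],
    dist_key="product_sku",
)

if __name__ == "__main__":
    print(product_ddl)
    print()
    print(transactions_ddl)
```

Using the same distribution key column on both tables is what lets joins on product_sku run without redistributing rows between nodes.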
fqc
2 years, 10 months ago
Selected Answer: B
Option B - As both tables have a common key and are used widely in reports.
upvoted 1 times
...
rocky48
2 years, 11 months ago
Selected Answer: B
Option B - As both tables have a common key and are used widely in reports.
upvoted 1 times
...
dushmantha
2 years, 11 months ago
Selected Answer: A
The question doesn't say what type of queries are common. KEY distribution is good if there are joins. The ALL style can be ruled out due to size concerns. So in this case I would use the EVEN distribution style. There is no evidence that the product column is the most suitable column for partitioning.
upvoted 1 times
...
Ramshizzle
3 years ago
Selected Answer: B
I also think answer B. But the question is unclear in my opinion and it could be C. What pushes me to B is the fact that it is not mentioned explicitly that the product table is small or can be considered a Fact-Table. It is explicitly mentioned that both tables share a common field.
upvoted 1 times
...
Bik000
3 years ago
Selected Answer: B
Answer is B
upvoted 1 times
...
MWL
3 years, 1 month ago
Selected Answer: B
B - KEY - both tables are huge and have a common key.
upvoted 1 times
...
MWL
3 years, 1 month ago
Selected Answer: B
Choose B based on lakeswimmer's comment.
upvoted 2 times
...
sbxme
3 years, 2 months ago
B is correct because there is a common column between the tables to join and query on.
upvoted 1 times
...
pidkiller
3 years, 2 months ago
Selected Answer: B
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html Distribute the fact table and one dimension table on their common columns. Your fact table can have only one distribution key. Any tables that join on another key aren't collocated with the fact table. Choose one dimension to collocate based on how frequently it is joined and the size of the joining rows. Designate both the dimension table's primary key and the fact table's corresponding foreign key as the DISTKEY.
upvoted 2 times
...
pidkiller
3 years, 2 months ago
The question is not clear enough. It really depends on the queries run and the size of EACH table. I would answer B, assuming that when you query for a product, you might want to see all transactions for that product. Although product is a dimension, it could still be too large a table for an ALL distribution. Maybe if both tables are partitioned by the SKU, performance would be better.
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other