Exam AWS Certified Data Analytics - Specialty topic 1 question 113 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 113
Topic #: 1

A human resources company maintains a 10-node Amazon Redshift cluster to run analytics queries on the company's data. The Amazon Redshift cluster contains a product table and a transactions table, and both tables have a product_sku column. The tables are over 100 GB in size. The majority of queries run on both tables.
Which distribution style should the company use for the two tables to achieve optimal query performance?

A. An EVEN distribution style for both tables
B. A KEY distribution style for both tables
C. An ALL distribution style for the product table and an EVEN distribution style for the transactions table
D. An EVEN distribution style for the product table and a KEY distribution style for the transactions table

Suggested Answer: B 🗳️

by Fazil_Cp at Nov. 9, 2021, 6:28 a.m.

Comments

lakeswimmer

Highly Voted 3 years, 6 months ago

B - KEY. Both tables are huge and share a common key. An ALL distribution style for the product table is not correct because of the table's size, and an EVEN distribution style for the product table may not necessarily help.
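
To put rough numbers on the size argument, here is a minimal sketch of the storage each distribution style implies. The 10 nodes and 100 GB come from the question; the function name `footprint_gb` is made up for illustration, and compression and slice-level details are ignored.

```python
# Rough storage footprint per distribution style (illustrative numbers only).
NODES = 10       # cluster size stated in the question
TABLE_GB = 100   # lower bound on each table's size stated in the question


def footprint_gb(style: str, table_gb: float, nodes: int) -> float:
    """Total cluster storage used by one table under a distribution style."""
    if style == "ALL":
        # ALL keeps a full copy of the table on every node.
        return table_gb * nodes
    if style in ("EVEN", "KEY"):
        # EVEN and KEY store each row once, somewhere in the cluster.
        return table_gb
    raise ValueError(f"unknown distribution style: {style}")


for style in ("ALL", "EVEN", "KEY"):
    print(style, footprint_gb(style, TABLE_GB, NODES), "GB")
```

Under these assumptions an ALL-distributed product table costs ten times the storage, and every load or update has to be written to every node, which is why ALL is usually reserved for small, slowly changing tables.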

upvoted 16 times

iris22

3 years, 2 months ago

ref: https://docs.aws.amazon.com/redshift/latest/dg/t_designating_distribution_styles.html

upvoted 3 times

...

Fazil_Cp

Highly Voted 3 years, 7 months ago

Option B - As both tables have a common key and are used widely in reports. https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html

upvoted 5 times

...

pk349

Most Recent 2 years, 1 month ago

B: I passed the test

upvoted 3 times

...

rags1482

2 years, 2 months ago

B. A KEY distribution style for both tables would be the best option for optimal query performance. Explanation: When using a KEY distribution style, Redshift distributes the rows according to the values in one column that serves as the distribution key. If the product_sku column is used as the distribution key for both the product table and the transactions table, rows with the same product_sku value will be stored on the same node. This will reduce the amount of data that needs to be moved between nodes during a query, resulting in faster query performance.
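
The collocation effect described above can be sketched with a small simulation. This is only an illustration: `zlib.crc32` stands in for Redshift's internal hash function, rows are placed per node rather than per slice, and the sample rows and helper names are invented.

```python
import zlib
from collections import defaultdict

NODES = 10  # cluster size stated in the question


def node_for(sku: str, nodes: int = NODES) -> int:
    # crc32 is a stand-in for Redshift's own hash function (an assumption).
    return zlib.crc32(sku.encode("utf-8")) % nodes


def distribute_by_key(rows, key, nodes=NODES):
    """Place each row on the node chosen by hashing its distribution key."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key], nodes)].append(row)
    return placement


# Invented sample data: 1000 products, 5000 transactions referencing them.
products = [{"product_sku": f"SKU-{i}"} for i in range(1000)]
transactions = [{"txn_id": t, "product_sku": f"SKU-{t % 1000}"} for t in range(5000)]

product_nodes = distribute_by_key(products, "product_sku")
txn_nodes = distribute_by_key(transactions, "product_sku")

# Where does each product row live?
sku_home = {
    row["product_sku"]: node
    for node, rows in product_nodes.items()
    for row in rows
}

# Count transaction rows whose matching product row sits on a different node.
moved = sum(
    1
    for node, rows in txn_nodes.items()
    for row in rows
    if sku_home[row["product_sku"]] != node
)
print("rows that must move for the join:", moved)
```

Because both tables hash the same column with the same function, every transaction row lands next to its product row, so a join on `product_sku` needs no redistribution in this model.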

upvoted 4 times

...

Chelseajcole

2 years, 5 months ago

Even Walmart's product table might not reach 100 GB in size... I think the question is not well stated.

upvoted 2 times

...

rav009

2 years, 7 months ago

Selected Answer: B

This question is very tricky. Usually a product table is a dimension table and should be small. But here "The tables are over 100 GB in size", so B.

upvoted 2 times

...

cloudlearnerhere

2 years, 7 months ago

Correct answer is C. As the key requirement is optimal query performance, it would be better to use an ALL distribution style for the product dimension table and an EVEN distribution style for the transactions fact table. Options A & D are wrong as they would not provide optimal query performance. Option B is wrong: the KEY distribution style can be used for both tables, but it won't provide optimal query performance.

upvoted 2 times

cloudlearnerhere

2 years, 7 months ago

Distribute the fact table and its largest dimension table on their common columns. Choose the largest dimension based on the size of dataset that participates in the most common join, not just the size of the table. If a table is commonly filtered, using a WHERE clause, only a portion of its rows participate in the join. Such a table has less impact on redistribution than a smaller table that contributes more data. Designate both the dimension table's primary key and the fact table's corresponding foreign key as DISTKEY. If multiple tables use the same distribution key, they are also collocated with the fact table. Your fact table can have only one distribution key. Any tables that join on another key aren't collocated with the fact table.
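
As a hedged sketch of the "size of dataset that participates in the join" rule quoted above: the `customer` table, the sizes, and the filter fractions below are all invented for illustration, since the question only mentions the product and transactions tables.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    size_gb: float
    join_fraction: float  # assumed share of rows left after typical WHERE filters

    @property
    def participating_gb(self) -> float:
        """Approximate amount of data that actually takes part in the join."""
        return self.size_gb * self.join_fraction


# Hypothetical candidates: a bigger table can still contribute less to the join.
candidates = [
    Dimension("product", size_gb=100.0, join_fraction=0.9),
    Dimension("customer", size_gb=150.0, join_fraction=0.1),
]

# Collocate the dimension that contributes the most data to the join.
best = max(candidates, key=lambda d: d.participating_gb)
print(best.name, best.participating_gb)
```

With these made-up numbers the smaller product table is the better DISTKEY partner, because far more of its rows take part in the join.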

upvoted 1 times

chdorrego

2 years, 6 months ago

Agree with everything you said but the 100GB size in both tables makes it better to distribute it by Key.

upvoted 1 times

...

fqc

2 years, 10 months ago

Selected Answer: B

Option B - As both tables have a common key and are used widely in reports.

upvoted 1 times

...

rocky48

2 years, 11 months ago

Selected Answer: B

Option B - As both tables have a common key and are used widely in reports.

upvoted 1 times

...

dushmantha

2 years, 11 months ago

Selected Answer: A

The question doesn't say what type of queries are common. KEY distribution is good if there are joins. The ALL style can be ruled out due to size concerns. So in this case I would use the EVEN distribution style. There is no evidence that the product column is the most suitable column for partitioning.
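
The concern about whether the column is a suitable key comes down to skew, which can be illustrated with a rough check. Everything here is assumed: `zlib.crc32` stands in for Redshift's hash, placement is per node rather than per slice, and the data is deliberately built so that one hot SKU accounts for half of all rows. The question itself says nothing about how `product_sku` values are spread.

```python
import zlib
from collections import Counter

NODES = 10  # cluster size stated in the question


def node_for(sku: str, nodes: int = NODES) -> int:
    # crc32 is a stand-in for Redshift's own hash function (an assumption).
    return zlib.crc32(sku.encode("utf-8")) % nodes


def skew_ratio(counts: Counter, nodes: int = NODES) -> float:
    """Largest per-node row count divided by the average per-node row count."""
    total = sum(counts.values())
    return max(counts.values()) / (total / nodes)


# Invented data: one hot SKU owns half of the 10,000 transaction rows.
skus = ["SKU-HOT"] * 5000 + [f"SKU-{i % 1000}" for i in range(5000)]

key_counts = Counter(node_for(s) for s in skus)              # KEY on product_sku
even_counts = Counter(i % NODES for i in range(len(skus)))   # EVEN (round-robin)

key_skew = skew_ratio(key_counts)
even_skew = skew_ratio(even_counts)
print("KEY skew:", key_skew, "EVEN skew:", even_skew)
```

If real data looked like this, the node holding the hot SKU would do most of the work under KEY, while EVEN stays balanced but gives up collocated joins. With a reasonably uniform `product_sku`, that trade-off favours KEY.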

upvoted 1 times

...

Ramshizzle

3 years ago

Selected Answer: B

I also think answer B. But the question is unclear in my opinion and it could be C. What pushes me to B is the fact that it is not mentioned explicitly that the product table is small or can be considered a Fact-Table. It is explicitly mentioned that both tables share a common field.

upvoted 1 times

...

Bik000

3 years ago

Selected Answer: B

Answer is B

upvoted 1 times

...

MWL

3 years, 1 month ago

Selected Answer: B

B - KEY - both tables are huge and have a common key.

upvoted 1 times

...

MWL

3 years, 1 month ago

Selected Answer: B

Choose B based on lakeswimmer's comment.

upvoted 2 times

...

sbxme

3 years, 2 months ago

B is correct because there is a common column between the tables to join and query on.

upvoted 1 times

...

pidkiller

3 years, 2 months ago

Selected Answer: B

https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html Distribute the fact table and one dimension table on their common columns. Your fact table can have only one distribution key. Any tables that join on another key aren't collocated with the fact table. Choose one dimension to collocate based on how frequently it is joined and the size of the joining rows. Designate both the dimension table's primary key and the fact table's corresponding foreign key as the DISTKEY.
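
For reference, a minimal sketch of what answer B looks like as DDL, generated from Python so it can be pasted into a migration script. `DISTSTYLE KEY` and `DISTKEY (column)` are Redshift `CREATE TABLE` attributes. The helper `create_table_sql`, the column types, and every column other than `product_sku` are assumptions, since the question does not give the schemas.

```python
def create_table_sql(table: str, columns: list[tuple[str, str]], dist_key: str) -> str:
    """Build a CREATE TABLE statement that distributes rows on dist_key."""
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key});"
    )


# Hypothetical schemas: only product_sku is named in the question.
product_ddl = create_table_sql(
    "product",
    [("product_sku", "VARCHAR(32)"), ("product_name", "VARCHAR(256)")],
    dist_key="product_sku",
)
transactions_ddl = create_table_sql(
    "transactions",
    [("transaction_id", "BIGINT"), ("product_sku", "VARCHAR(32)"), ("amount", "DECIMAL(12,2)")],
    dist_key="product_sku",
)

print(product_ddl)
print(transactions_ddl)
```

Both statements name the same column as the distribution key, which is what collocates the joining rows of the two tables.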

upvoted 2 times

...

pidkiller

3 years, 2 months ago

The question is not clear enough. It really depends on the queries run and the size of EACH table. I would answer B, assuming that when you query for a product, you might want to see all transactions of this product. Although product is a dimension, it could still be too large a table for an ALL distribution. Maybe if both tables are partitioned by the SKU, performance would be better.

upvoted 2 times

...
