Exam DP-200 topic 5 question 5 discussion

Actual exam question from Microsoft's DP-200

Question #: 5
Topic #: 5

You implement an enterprise data warehouse in Azure Synapse Analytics.
You have a large fact table that is 10 terabytes (TB) in size.
Incoming queries use the primary key Sale Key column to retrieve data as displayed in the following table:

You need to distribute the large fact table across multiple nodes to optimize performance of the table.
Which technology should you use?

A. hash distributed table with clustered ColumnStore index
B. hash distributed table with clustered index
C. heap table with distribution replicate
D. round robin distributed table with clustered index
E. round robin distributed table with clustered ColumnStore index

Show Suggested Answer

Suggested Answer: A 🗳️
Hash-distributed tables improve query performance on large fact tables.
Columnstore indexes can achieve up to 100x better performance on analytics and data warehousing workloads and up to 10x better data compression than traditional rowstore indexes.
Incorrect Answers:
D, E: Round-robin tables are useful for improving loading speed.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance

by Hinzzz at June 23, 2021, 11:45 a.m.

Comments

Submit Cancel

Hinzzz

3 years, 11 months ago

Clustered indexes may outperform clustered columnstore tables when a single row needs to be quickly retrieved. For queries where a single or very few row lookup is required to perform with extreme speed, consider a clustered index or nonclustered secondary index. The answer could be B as it is sales key-based rows retrieval.

upvoted 3 times

hello_there_

3 years, 11 months ago

I agree under the assumption that the table is only used for row retrieval by primary key. It isn't explicitly said that this is the only use though. If this really is the only use of the table, it would make more sense to put it in a CosmosDB with table API. Still, I think that B is the answer they want to hear. Otherwise, why would they have added that queries will use row retrieval by primary key?

upvoted 1 times

...