Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 4 question 1 discussion

Actual exam question from Microsoft's DP-203

Question #: 1
Topic #: 4

You implement an enterprise data warehouse in Azure Synapse Analytics.
You have a large fact table that is 10 terabytes (TB) in size.
Incoming queries use the primary key SaleKey column to retrieve data as displayed in the following table:

You need to distribute the large fact table across multiple nodes to optimize performance of the table.
Which technology should you use?

A. hash distributed table with clustered index
B. hash distributed table with clustered Columnstore index
C. round robin distributed table with clustered index
D. round robin distributed table with clustered Columnstore index
E. heap table with distribution replicate

Show Suggested Answer

Suggested Answer: B 🗳️

by Yatoom at June 4, 2021, 11:42 p.m.

Comments

Submit Cancel

rjile

Highly Voted 3 years, 6 months ago

correct B

upvoted 39 times

aortega

3 years, 3 months ago

For Example: CREATE TABLE [dbo].[FactInternetSales] ( [ProductKey] int NOT NULL , [OrderDateKey] int NOT NULL , [CustomerKey] int NOT NULL , [PromotionKey] int NOT NULL , [SalesOrderNumber] nvarchar(20) NOT NULL , [OrderQuantity] smallint NOT NULL , [UnitPrice] money NOT NULL , [SalesAmount] money NOT NULL ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = HASH([ProductKey]) ) ;

upvoted 7 times

...

kkk5566

Most Recent 1 year, 4 months ago

Selected Answer: B

is correct

upvoted 1 times

...

temmytak

1 year, 9 months ago

Selected Answer: B

Correct B

upvoted 2 times

...

Shanmahi

2 years, 1 month ago

Selected Answer: B

Hash on SaleKey distribution column using Columnstore clustered index; Why? (1) petabyte scale data (2) incoming query on SaleKey therefore, SaleKey will be used in WHERE condition and clustered columnstore index will be efficient.

upvoted 2 times

...

dmitriypo

2 years, 2 months ago

Selected Answer: B

B is correct

upvoted 2 times

...

Deeksha1234

2 years, 4 months ago

Selected Answer: B

yes, B is correct

upvoted 2 times

...

Remedios79

2 years, 6 months ago

correct

upvoted 2 times

...

LiLy91

2 years, 11 months ago

Clustered indexes may outperform clustered columnstore tables when a single row needs to be quickly retrieved. For queries where a single or very few row lookup is required to perform with extreme speed, consider a clustered index or nonclustered secondary index. The disadvantage to using a clustered index is that only queries that benefit are the ones that use a highly selective filter on the clustered index column. To improve filter on other columns, a nonclustered index can be added to other columns. However, each index that is added to a table adds both space and processing time to loads.

upvoted 1 times

...

jv2120

3 years ago

Clustered columnstore indexes are the most efficient way you can store your data in Azure SQL Data Warehouse. Storing your data in tables that have a clustered columnstore index are the fastest way to query your data. It will give you the greatest data compression and lower your storage costs. Hash-distributed tables work well for large fact tables in a star schema. They can have very large numbers of rows and still achieve high performance. Consider using a hash-distributed table when: The table size on disk is more than 2 GB. The table has frequent insert, update, and delete operations. ANS B

upvoted 1 times

...

SujithaVulchi

3 years, 3 months ago

A heap is a table without a clustered index. One or more nonclustered indexes can be created on tables stored as a heap. Data is stored in the heap without specifying an order. Usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently; so the data order cannot be predicted. To guarantee the order of rows returned from a heap, you must use the ORDER BY clause. To specify a permanent logical order for storing the rows, create a clustered index on the table, so that the table is not a heap. Correct answer: Non clustered

upvoted 2 times

...

Avinash75

3 years, 6 months ago

Incoming queries use the primary key SaleKey column to retrieve data as displayed in the following table ..doesn't this mean Salekey will be used in where clause , which makes Salekey not suitable for hashkey distribution . Choosing a distribution column that helps minimize data movement is one of the most important strategies for optimizing performance of your dedicated SQL pool: - Is not used in WHERE clauses. This could narrow the query to not run on all the distributions. with no obvious choice i feel it should be round robin with column clustered index i.e D

upvoted 1 times

Aditya0891

2 years, 6 months ago

when you don't have any good candidate for hashkey you can also go for composite key. And here the size of the table is huge and using round robin you will never obtain good performance

upvoted 1 times

...

erssiws

3 years, 6 months ago

I understand that hash distribution mainly for improving the joins and group-by to reduce the data shuffling. In this case, there is no join or group-by mentioned. I think round-robin would be a better option.

upvoted 1 times

...

Yatoom

3 years, 7 months ago

If the answer is hash distributed, then what would be the key? If there is no obvious joining key, round-robin should be chosen (https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#round-robin-distributed)

upvoted 1 times

Preben

3 years, 7 months ago

It says it uses the SaleKey. Round-robin is generally not effective at these large scale tables. The 10 tb was a very important hint here.

upvoted 15 times

...