exam questions

Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 4 question 1 discussion

Actual exam question from Microsoft's DP-203
Question #: 1
Topic #: 4
[All DP-203 Questions]

You implement an enterprise data warehouse in Azure Synapse Analytics.
You have a large fact table that is 10 terabytes (TB) in size.
Incoming queries use the primary key SaleKey column to retrieve data as displayed in the following table:

You need to distribute the large fact table across multiple nodes to optimize performance of the table.
Which technology should you use?

  • A. hash distributed table with clustered index
  • B. hash distributed table with clustered Columnstore index
  • C. round robin distributed table with clustered index
  • D. round robin distributed table with clustered Columnstore index
  • E. heap table with distribution replicate
Show Suggested Answer Hide Answer
Suggested Answer: B 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
rjile
Highly Voted 3 years, 3 months ago
correct B
upvoted 39 times
aortega
3 years, 1 month ago
For Example: CREATE TABLE [dbo].[FactInternetSales] ( [ProductKey] int NOT NULL , [OrderDateKey] int NOT NULL , [CustomerKey] int NOT NULL , [PromotionKey] int NOT NULL , [SalesOrderNumber] nvarchar(20) NOT NULL , [OrderQuantity] smallint NOT NULL , [UnitPrice] money NOT NULL , [SalesAmount] money NOT NULL ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = HASH([ProductKey]) ) ;
upvoted 7 times
...
...
[Removed]
Most Recent 4 months, 4 weeks ago
Should be A because it is said that most query are using the Primary Key (SaleKey) to access data. If you access data with your PK, it's row-oriented and not columned-oriented. So no Column Store Clustered Index but regular Clustered Index.
upvoted 2 times
...
kkk5566
1 year, 2 months ago
Selected Answer: B
is correct
upvoted 1 times
...
temmytak
1 year, 6 months ago
Selected Answer: B
Correct B
upvoted 2 times
...
Shanmahi
1 year, 10 months ago
Selected Answer: B
Hash on SaleKey distribution column using Columnstore clustered index; Why? (1) petabyte scale data (2) incoming query on SaleKey therefore, SaleKey will be used in WHERE condition and clustered columnstore index will be efficient.
upvoted 2 times
...
dmitriypo
1 year, 12 months ago
Selected Answer: B
B is correct
upvoted 2 times
...
Deeksha1234
2 years, 2 months ago
Selected Answer: B
yes, B is correct
upvoted 2 times
...
Remedios79
2 years, 4 months ago
correct
upvoted 2 times
...
LiLy91
2 years, 9 months ago
Clustered indexes may outperform clustered columnstore tables when a single row needs to be quickly retrieved. For queries where a single or very few row lookup is required to perform with extreme speed, consider a clustered index or nonclustered secondary index. The disadvantage to using a clustered index is that only queries that benefit are the ones that use a highly selective filter on the clustered index column. To improve filter on other columns, a nonclustered index can be added to other columns. However, each index that is added to a table adds both space and processing time to loads.
upvoted 1 times
...
jv2120
2 years, 10 months ago
Clustered columnstore indexes are the most efficient way you can store your data in Azure SQL Data Warehouse. Storing your data in tables that have a clustered columnstore index are the fastest way to query your data. It will give you the greatest data compression and lower your storage costs. Hash-distributed tables work well for large fact tables in a star schema. They can have very large numbers of rows and still achieve high performance. Consider using a hash-distributed table when: The table size on disk is more than 2 GB. The table has frequent insert, update, and delete operations. ANS B
upvoted 1 times
...
SujithaVulchi
3 years, 1 month ago
A heap is a table without a clustered index. One or more nonclustered indexes can be created on tables stored as a heap. Data is stored in the heap without specifying an order. Usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently; so the data order cannot be predicted. To guarantee the order of rows returned from a heap, you must use the ORDER BY clause. To specify a permanent logical order for storing the rows, create a clustered index on the table, so that the table is not a heap. Correct answer: Non clustered
upvoted 2 times
...
Avinash75
3 years, 3 months ago
Incoming queries use the primary key SaleKey column to retrieve data as displayed in the following table ..doesn't this mean Salekey will be used in where clause , which makes Salekey not suitable for hashkey distribution . Choosing a distribution column that helps minimize data movement is one of the most important strategies for optimizing performance of your dedicated SQL pool: - Is not used in WHERE clauses. This could narrow the query to not run on all the distributions. with no obvious choice i feel it should be round robin with column clustered index i.e D
upvoted 1 times
Aditya0891
2 years, 4 months ago
when you don't have any good candidate for hashkey you can also go for composite key. And here the size of the table is huge and using round robin you will never obtain good performance
upvoted 1 times
...
[Removed]
3 years ago
Consider using a hash-distributed table when: The table size on disk is more than 2 GB ref:https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choosing-a-distribution-column
upvoted 1 times
...
...
erssiws
3 years, 4 months ago
I understand that hash distribution mainly for improving the joins and group-by to reduce the data shuffling. In this case, there is no join or group-by mentioned. I think round-robin would be a better option.
upvoted 1 times
...
Yatoom
3 years, 4 months ago
If the answer is hash distributed, then what would be the key? If there is no obvious joining key, round-robin should be chosen (https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#round-robin-distributed)
upvoted 1 times
Preben
3 years, 4 months ago
It says it uses the SaleKey. Round-robin is generally not effective at these large scale tables. The 10 tb was a very important hint here.
upvoted 15 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago