exam questions

Exam DP-201 All Questions

View all questions & answers for the DP-201 exam

Exam DP-201 topic 1 question 9 discussion

Actual exam question from Microsoft's DP-201
Question #: 9
Topic #: 1
[All DP-201 Questions]

HOTSPOT -
You have an on-premises data warehouse that includes the following fact tables. Both tables have the following columns: DataKey, ProductKey, RegionKey.
There are 120 unique product keys and 65 unique region keys.

Queries that use the data warehouse take a long time to complete.
You plan to migrate the solution to use Azure Synapse Analytics. You need to ensure that the Azure-based solution optimizes query performance and minimizes processing skew.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Show Suggested Answer Hide Answer
Suggested Answer:
Box 1: Hash-distributed -

Box 2: ProductKey -
ProductKey is used extensively in joins.
Hash-distributed tables improve query performance on large fact tables.

Box 3: Round-robin -

Box 4: RegionKey -
Round-robin tables are useful for improving loading speed.
Consider using the round-robin distribution for your table in the following scenarios:
✑ When getting started as a simple starting point since it is the default
✑ If there is no obvious joining key
✑ If there is not good candidate column for hash distributing the table
✑ If the table does not share a common join key with other tables
✑ If the join is less significant than other joins in the query
✑ When the table is a temporary staging table
Note: A distributed table appears as a single table, but the rows are actually stored across 60 distributions. The rows are distributed with a hash or round-robin algorithm.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
H_S
Highly Voted 4 years, 2 months ago
Table sales: **Distribution type: Hash-Distributed For 2 Reasons: the table is 600GB and we want to optimize queries **Distribution column: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute => Product key is the only possible correct choice Table Invoices: **Distribution type: Hash-Distributed The table size is more than 2GB and probably growing up. Consider using a hash-distributed table when:The table size on disk is more than 2 GB. And The table has frequent insert, update, and delete operations. **Distribution column: for sure it’s regionkey To minimize data movement, select a distribution column that:… same link
upvoted 31 times
...
cadio30
Highly Voted 4 years ago
Distribution for both should be "hash-distributed" as we are talking about fact tables while round-robin is mostly use in staging tables. As a rule of the thumb when using hash-distributed it should be applied in the columns that uses JOIN, GROUP BY, DISTINCT, OVER, and HAVING and one shouldn't apply it in WHERE and DATE columns. Sales: Hash-distributed, ProductKey Invoices: Hash-distributed, RegiongKey Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
upvoted 9 times
...
Qrm_1972
Most Recent 3 years, 12 months ago
The correct answer is: Sales : Hash-Distributed>>>> Product Key Invoices: Hash-Distributed>>>> Region Key
upvoted 2 times
...
SrinivasR
4 years ago
i think Distribution should be : "HASH-DISTRIBUTION" as both are Fact tables and ProductKey for sales and Region Key for Invoices .
upvoted 2 times
...
NarenG1
4 years ago
I don't think there is a distribution column option for Round Robin. The distribution column is available only for Hash Partitioning. So it must be Hash Partitioning & Region Key for Invoice table.
upvoted 2 times
...
DataDani
4 years ago
As there are some different answers for table invoices. For sure hash-distributed, as the table size is more than 2 GB. Explanation for RegionKey: To minimize data movement, select a distribution column that: Is used in JOIN, GROUP BY, DISTINCT, OVER, and HAVING clauses. When two large fact tables have frequent joins, query performance improves when you distribute both tables on one of the join columns. When a table is not used in joins, consider distributing the table on a column that is frequently in the GROUP BY clause. Is not used in WHERE clauses. This could narrow the query to not run on all the distributions. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
upvoted 2 times
...
Geo_Barros
4 years, 2 months ago
I think that the right answer for the ditribution type at the invoice table would be hash-distributed with regionkey as the distributed key as it is used for grouping.
upvoted 4 times
...
Mariekumi
4 years, 2 months ago
I would say Hash distributed and Date key for both tables because date key is used extensively in queries in both tables, region key will result in skewed partitioning as 75% of data falls in one region. Also Hash is best for both because we are optimizing query performance and not loading which Round-Robin is best suited for
upvoted 5 times
JohnCrawford
4 years, 1 month ago
From the provided link we learn that generally we should not use date values as the partitioning key. As noted by H_S the Invoices table is large enough to warrant being hash distributed as well and as noted by you, Mariekumi, RegionKey would result in hot spots/skew. I think hash distributed on product key for both tables makes the most sense.
upvoted 4 times
...
...
akram786
4 years, 2 months ago
why round robin for invoices.
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...