Exam DP-203 All Questions

View all questions & answers for the DP-203 exam

Exam DP-203 topic 1 question 19 discussion

Actual exam question from Microsoft's DP-203

Question #: 19
Topic #: 1

HOTSPOT -
You have a SQL pool in Azure Synapse.
You plan to load data from Azure Blob storage to a staging table. Approximately 1 million rows of data will be loaded daily. The table will be truncated before each daily load.
You need to create the staging table. The solution must minimize how long it takes to load the data to the staging table.
How should you configure the table? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Show Suggested Answer

Suggested Answer:

Box 1: Hash -
Hash-distributed tables improve query performance on large fact tables. They can have very large numbers of rows and still achieve high performance.
Incorrect Answers:
Round-robin tables are useful for improving loading speed.

Box 2: Clustered columnstore -
When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed.

Box 3: Date -
Table partitions enable you to divide your data into smaller groups of data. In most cases, table partitions are created on a date column.
Partition switching can be used to quickly remove or replace a section of a table.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute

by Blueko at Aug. 23, 2021, 12:26 p.m.

Comments

Submit Cancel

A1000

Highly Voted 3 years, 11 months ago

Round-Robin Heap None

upvoted 428 times

gssd4scoder

3 years, 9 months ago

Agree 100%. All in paragraphs under this: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-overview.

upvoted 10 times

DrTaz

3 years, 7 months ago

Also agree 100%

upvoted 4 times

...

roopansh.gupta2

11 months, 2 weeks ago

The ET engineer who answered probably thought we have to select the options which are outrageously foolish

upvoted 7 times

...

Narasimhap

3 years, 5 months ago

Round- Robin Heap None. No brainer for this question.

upvoted 28 times

...

Homer23

1 year, 4 months ago

Agree - Round Robin, Heap and None are the correct options. The solution doesn't make any sense for a staging table to truncate every day without any JOINs.Don't know why it says Hash/Columnstored/Date.

upvoted 2 times

...

Load full discussion...

...

laszek

Highly Voted 3 years, 11 months ago

Round-robin - this is the simplest distribution model, not great for querying but fast to process Heap - no brainer when creating staging tables No partitions - this is a staging table, why add effort to partition, when truncated daily?

upvoted 68 times

berserksap

3 years, 9 months ago

Had doubts regarding why there is no need for a partition. While what you suggested is true won't it be better if there is a date partition to truncate the table ?

upvoted 2 times

andy_g

3 years, 5 months ago

There is no filter on a truncate statement so no benefit in having a partition

upvoted 4 times

...

Vardhan_Brahmanapally

3 years, 9 months ago

Can you explain me why should we use heap?

upvoted 1 times

DrTaz

3 years, 7 months ago

The term heap basically refers to a table without a clustered index. Adding a clustered index to a temp table makes absolutely no sense and is a waste of compute resources for a table that would be entirely truncated daily. no clustered index = heap.

upvoted 13 times

SQLDev0000

3 years, 5 months ago

DrTaz is right, in addition, when you populate an indexed table, you are also writing to the index, so this adds an additional overhead in the write process

upvoted 3 times

...

HaliBrickclay

Most Recent 10 months, 1 week ago

as per Microsoft document Load to a staging table To achieve the fastest loading speed for moving data into a data warehouse table, load data into a staging table. Define the staging table as a heap and use round-robin for the distribution option. Consider that loading is usually a two-step process in which you first load to a staging table and then insert the data into a production data warehouse table. If the production table uses a hash distribution, the total time to load and insert might be faster if you define the staging table with the hash distribution. Loading to the staging table takes longer, but the second step of inserting the rows to the production table does not incur data movement across the distributions.

upvoted 5 times

VeroDon

3 years, 7 months ago

It doesn't mention the prd table. Only the staging. So, round Robin/Heap is the answer, correct? tricky questions. :)

upvoted 2 times

...

Sasha_in_San_Francisco

10 months, 1 week ago

Answer: Round-Robin (1), Heap (2), None (3). Within this doc: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-overview #1. Search for “Use round-robin for the staging table.” #2. Search for: “A heap table can be especially useful for loading data, such as a staging table,…” Within this doc: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition?context=/azure/synapse-analytics/context/context #3. Partitioning by date is useful when stage destination has data because you can hide the inserting data’s new partition (to keep users from hitting it), complete the load and then unhide the new partition. However, in this question it states, “the table will be truncated before each daily load”, so, it appears it’s a true Staging table and there are no users with access, no existing data, and I see no reason to have a Date partition. To me, such a partition would do nothing but slow the load.

upvoted 17 times

...

jhargett1

10 months, 1 week ago

Since it's the staging table, the main focus should be minimizing the load time, as noted in the question. Heap table does not have any index and it's the fastest option for loading large amounts of data. Using a round-robin distribution will help to evenly distribute the data across all the distributions, further reducing the load time. As the data is truncated before each load, partitioning is not necessary, so it is best to choose None. Round-robin Heap None

upvoted 1 times

...

akk_1289

10 months, 1 week ago

Distribution: Round Robin Indexing: Clustered Columnstore Partitioning: Date The recommended configuration for a staging table that will be loaded daily with approximately 1 million rows of data and truncated before each load is to use a round robin distribution, a clustered columnstore index, and date-based partitioning. Round robin distribution will evenly distribute the data across nodes, reducing the load time. Clustered columnstore index provides efficient compression and supports fast bulk load operations. Date-based partitioning will allow for easy archiving and maintenance of the table.

upvoted 2 times

...

74gjd_37

10 months, 1 week ago

Round-Robing, Heap, None. The question is to configure the staging table. According to the conditions, "The solution must minimize how long it takes to load the data to the staging table." Therefore, loading time is the most essential condition here. According to Microsoft documentation at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-overview Distribution: A round-robin table distributes table rows evenly across all distributions. The rows are distributed randomly. Loading data into a round-robin table is fast. But, queries can require more data movement than the other distribution methods. Indexing: A heap table can be especially useful for loading transient data, such as a staging table, which is transformed into a final table. Partitioning: None. Since the table is truncated before each daily load, we can not benefit of partitioning to drop date ranges.

upvoted 6 times

...

MBRSDG

10 months, 1 week ago

1 --> REPLICATED Since table has always 1mln rows (it is a small staging table, loaded in truncate-insert everyday!) Also Round-Robin is wrong, since 1mln/60nodes = almost 17k rows per compute node, which is a unrecommended situation. You haven't data enough to choose round-robin. 2 --> HEAP Why do you have to choose clustered columnstore index? This is not a big table. 3 --> None Since you haven't data enough to create a efficient partition. It's more convenient to have only one partition in this case.

upvoted 3 times

...

iceberge

1 year ago

Copilot Sent by Copilot: To optimize the loading process for your staging table in Azure Synapse, here are the recommended configurations: Distribution: Use Round-robin distribution. This method evenly distributes the data across all distributions, which helps in achieving balanced data loading and minimizes data skew. Indexing: Use Heap. Since the table will be truncated before each load, using a heap (a table without a clustered index) will speed up the data loading process as it avoids the overhead of maintaining indexes during the load. Partitioning: Do not partition the staging table. Partitioning can add overhead to the data loading process. Since the table is truncated daily, partitioning is not necessary and can be avoided to keep the load process efficient. These configurations will help minimize the time it takes to load data into the staging table.

upvoted 2 times

...

RakshithaReddy

1 year, 2 months ago

RR Heap Date

upvoted 1 times

...

Dusica

1 year, 3 months ago

Round robin Heap None

upvoted 2 times

...

Alongi

1 year, 3 months ago

RR Heap None

upvoted 1 times

...

__Tom

1 year, 8 months ago

Round-Robin Heap None

upvoted 5 times

...

kkk5566

1 year, 11 months ago

Round-Robin Heap None

upvoted 1 times

...

kkk5566

1 year, 11 months ago

Round-Robin Heap None

upvoted 1 times

...

janaki

2 years, 2 months ago

Never ever use date partioning with hash distribution. The correct answer is: Round robin, Heap and None

upvoted 2 times

...

rocky48

2 years, 3 months ago

Round-robin - this is the simplest distribution model, not great for querying but fast to process Heap - no brainer when creating staging tables No partitions - this is a staging table

upvoted 1 times

...

Load full discussion...