Exam DP-500 topic 1 question 102 discussion

Actual exam question from Microsoft's DP-500
Question #: 102
Topic #: 1

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using an Azure Synapse Analytics serverless SQL pool to query a collection of Apache Parquet files by using automatic schema inference. The files contain more than 40 million rows of UTF-8-encoded business names, survey names, and participant counts. The database is configured to use the default collation.
The queries use OPENROWSET and infer the schema shown in the following table.

You need to recommend changes to the queries to reduce I/O reads and tempdb usage.
Solution: You recommend using OPENROWSET WITH to explicitly define the collation for businessName and surveyName as Latin1_General_100_BIN2_UTF8.
Does this meet the goal?

  • A. Yes
  • B. No
Suggested Answer: A
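The recommended change can be sketched as follows. This is a minimal example only: the storage URL, path, varchar lengths, and the participantCount column name are assumptions, not taken from the original question.

```sql
-- Sketch: explicitly define the schema with OPENROWSET ... WITH and a
-- UTF-8 BIN2 collation for the string columns, instead of relying on
-- automatic schema inference. Storage account and path are placeholders.
SELECT businessName, surveyName, participantCount
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/surveys/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    businessName     varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    surveyName       varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    participantCount int
) AS rows;
```

Using a UTF-8 collation for the string columns avoids the conversion that the default (non-UTF-8) database collation would otherwise force on every row, and explicitly sizing the varchar columns avoids the oversized strings that schema inference produces.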

Comments

sheilawu
1 year, 8 months ago
Selected Answer: A
I stick with A.
upvoted 1 times
fireofsea
1 year, 8 months ago
Selected Answer: A
Make sure that you are explicitly specifying a UTF-8 collation (for example, Latin1_General_100_BIN2_UTF8) for all string columns in the WITH clause, or set a UTF-8 collation at the database level. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files
upvoted 2 times
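The database-level alternative mentioned in this comment can be sketched as below. The database name SurveyDb is hypothetical, and ALTER DATABASE CURRENT assumes the statement runs in the context of the target serverless database.

```sql
-- Alternative to per-column WITH clauses: make a UTF-8 collation the
-- database default, so string columns inherit it automatically.
CREATE DATABASE SurveyDb COLLATE Latin1_General_100_BIN2_UTF8;

-- For an existing database, the default collation can be changed instead:
ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;
```

Note the caveat raised further down the thread: existing external tables keep the old default collation until they are dropped and re-created.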
fireofsea
1 year, 9 months ago
Selected Answer: A
https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/always-use-utf-8-collations-to-read-utf-8-text-in-serverless-sql/ba-p/1883633
upvoted 1 times
hoss29
1 year, 8 months ago
But according to that same link, the database has to be dropped beforehand. The last paragraph says: "NOTE: If you have existing tables that used default database collation, changing default database collation would not be applied on them. You would need to drop and re-create external tables so they can pickup new default." So I think the answer is NO.
upvoted 1 times
hoss29
1 year, 8 months ago
Please disregard; here we are not changing the database collation. The answer should be YES. From Copilot:

Reducing I/O reads: When you query Parquet files in Azure Synapse Analytics, the service can push down certain filter predicates to the storage layer. This means that only the relevant row groups are read from the Parquet files, which can significantly reduce I/O reads. However, this predicate pushdown is only possible if the collation of the column in the Parquet file matches the collation of the column in SQL Server. By explicitly setting the collation to Latin1_General_100_BIN2_UTF8, you ensure that the collations match, enabling predicate pushdown and reducing I/O reads.

Reducing tempdb usage: When you query data with a different collation than the server's default collation, SQL Server needs to perform a collation conversion. This conversion happens in memory and can increase tempdb usage. By setting the collation at the column level to match the data's actual collation, you avoid these conversions and reduce tempdb usage.
upvoted 3 times
PrudenceK
1 year, 11 months ago
Selected Answer: B
Defining the collation for the columns as Latin1_General_100_BIN2_UTF8 will not directly reduce I/O reads and tempdb usage. Collation determines the sorting and comparison rules for character data and does not have a direct impact on I/O or tempdb usage. To reduce I/O reads and tempdb usage, you should consider the following approaches:
  • Implement appropriate indexing on the columns used in the queries to improve query performance and reduce the need for extensive I/O reads.
  • Optimize the query logic by using efficient filters and aggregations to minimize the amount of data read and processed.
  • Consider partitioning the data based on relevant criteria, such as date or another logical partitioning key, to improve query performance and reduce I/O reads.
  • Utilize appropriate compression techniques for the Parquet files to reduce their size, leading to reduced I/O reads.
upvoted 1 times
sgodd_0298
2 years, 1 month ago
Selected Answer: A
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-data-storage
upvoted 2 times
DarioReymago
2 years, 1 month ago
Selected Answer: A
A is correct: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files
upvoted 3 times
solref
2 years, 2 months ago
Selected Answer: A
The Latin1_General_100_BIN2_UTF8 collation has an additional performance optimization that works only for Parquet and Cosmos DB. Link: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files. On collations: the default inheritance can be overridden by explicitly stating a different collation for a character-based data type.
upvoted 4 times
Fer079
2 years, 3 months ago
Selected Answer: A
Data in a Parquet file is organized in row groups. Serverless SQL pool skips row groups based on the specified predicate in the WHERE clause, which reduces I/O. The result is increased query performance. If we want to filter by businessName and/or by surveyName, then it is recommended to use the Latin1_General_100_BIN2_UTF8 collation. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool#use-proper-collation-to-utilize-predicate-pushdown-for-character-columns
upvoted 4 times
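The row-group elimination described in this comment applies when the WHERE predicate targets a string column declared with the BIN2 UTF-8 collation. A hedged sketch, where the storage path, varchar lengths, and the filter value 'Contoso' are placeholders:

```sql
-- With the BIN2 UTF-8 collation on businessName, the filter below can be
-- pushed down to the Parquet row groups, so non-matching groups are
-- skipped entirely instead of being read and filtered in tempdb.
SELECT surveyName, participantCount
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/surveys/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    businessName     varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    surveyName       varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    participantCount int
) AS rows
WHERE rows.businessName = 'Contoso';  -- placeholder filter value
```

Note the literal is a plain varchar string, not an N'...' nvarchar literal; an implicit conversion between the literal and the column could otherwise defeat the pushdown.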
louisaok
2 years, 4 months ago
Selected Answer: B
B is the correct one.
upvoted 3 times
Saffar
2 years, 4 months ago
Using "OPENROWSET WITH to explicitly define the collation for businessName and surveyName as Latin1_General_100_BIN2_UTF8" will allow you to read the strings properly, but I think it has nothing to do with decreasing the I/O, so most likely it's No.
upvoted 3 times
louisaok
2 years, 4 months ago
Selected Answer: B
No. The solution in this series that meets the goal is the one that recommends defining a data source and a view for the Parquet files, and updating the query to use the view.
upvoted 4 times
Maazi
2 years, 5 months ago
Selected Answer: B
You don't need to use the OPENROWSET WITH clause when reading Parquet files. Source: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files
upvoted 4 times
DarioReymago
2 years, 1 month ago
Sorry Maazi, the link you refer to actually says that the correct answer is A.
upvoted 3 times
Community vote distribution: A (35%), C (25%), B (20%), Other