Exam DP-500 topic 1 question 102 discussion

Actual exam question from Microsoft's DP-500
Question #: 102
Topic #: 1

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using an Azure Synapse Analytics serverless SQL pool to query a collection of Apache Parquet files by using automatic schema inference. The files contain more than 40 million rows of UTF-8-encoded business names, survey names, and participant counts. The database is configured to use the default collation.
The queries use OPENROWSET and infer the schema shown in the following table.

You need to recommend changes to the queries to reduce I/O reads and tempdb usage.
Solution: You recommend using OPENROWSET WITH to explicitly define the collation for businessName and surveyName as Latin1_General_100_BIN2_UTF8.
Does this meet the goal?

  • A. Yes
  • B. No
Suggested Answer: A
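The recommended change can be sketched as follows. This is a minimal example only: the storage URL, path, varchar lengths, and the participantCount column name are assumptions, not taken from the original question.

```sql
-- Sketch: explicitly define the schema with OPENROWSET ... WITH and a
-- UTF-8 BIN2 collation for the string columns, instead of relying on
-- automatic schema inference. Storage account and path are placeholders.
SELECT businessName, surveyName, participantCount
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/surveys/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    businessName     varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    surveyName       varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    participantCount int
) AS rows;
```

Using a UTF-8 collation for the string columns avoids the conversion that the default (non-UTF-8) database collation would otherwise force on every row, and explicitly sizing the varchar columns avoids the oversized strings that schema inference produces.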

Comments

sheilawu
1 year, 8 months ago
Selected Answer: A
I stick with A.
upvoted 1 times
fireofsea
1 year, 8 months ago
Selected Answer: A
Make sure that you are explicitly specifying a UTF-8 collation (for example, Latin1_General_100_BIN2_UTF8) for all string columns in the WITH clause, or set a UTF-8 collation at the database level. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files
upvoted 2 times
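The database-level alternative mentioned in this comment can be sketched as below. The database name SurveyDb is hypothetical, and ALTER DATABASE CURRENT assumes the statement runs in the context of the target serverless database.

```sql
-- Alternative to per-column WITH clauses: make a UTF-8 collation the
-- database default, so string columns inherit it automatically.
CREATE DATABASE SurveyDb COLLATE Latin1_General_100_BIN2_UTF8;

-- For an existing database, the default collation can be changed instead:
ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;
```

Note the caveat raised further down the thread: existing external tables keep the old default collation until they are dropped and re-created.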
fireofsea
1 year, 9 months ago
Selected Answer: A
https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/always-use-utf-8-collations-to-read-utf-8-text-in-serverless-sql/ba-p/1883633
upvoted 1 times
hoss29
1 year, 8 months ago
But according to that same link, the database has to be dropped beforehand. The last paragraph says: "NOTE: If you have existing tables that used default database collation, changing default database collation would not be applied on them. You would need to drop and re-create external tables so they can pickup new default." So I think the answer is NO.
upvoted 1 times
hoss29
1 year, 8 months ago
Please disregard; here we are not changing the database collation. The answer should be YES. From Copilot:

Reducing I/O reads: When you query Parquet files in Azure Synapse Analytics, the service can push down certain filter predicates to the storage layer. This means that only the relevant row groups are read from the Parquet files, which can significantly reduce I/O reads. However, this predicate pushdown is only possible if the collation of the column in the Parquet file matches the collation of the column in SQL Server. By explicitly setting the collation to Latin1_General_100_BIN2_UTF8, you ensure that the collations match, enabling predicate pushdown and reducing I/O reads.

Reducing tempdb usage: When you query data with a different collation than the server's default collation, SQL Server needs to perform a collation conversion. This conversion happens in memory and can increase tempdb usage. By setting the collation at the column level to match the data's actual collation, you avoid these conversions and reduce tempdb usage.
upvoted 3 times
PrudenceK
1 year, 11 months ago
Selected Answer: B
Defining the collation for the columns as Latin1_General_100_BIN2_UTF8 will not directly reduce I/O reads and tempdb usage. Collation determines the sorting and comparison rules for character data and does not have a direct impact on I/O or tempdb usage. To reduce I/O reads and tempdb usage, you should consider the following approaches:
  • Implement appropriate indexing on the columns used in the queries to improve query performance and reduce the need for extensive I/O reads.
  • Optimize the query logic by using efficient filters and aggregations to minimize the amount of data read and processed.
  • Consider partitioning the data based on relevant criteria, such as date or another logical partitioning key, to improve query performance and reduce I/O reads.
  • Utilize appropriate compression techniques for the Parquet files to reduce their size, leading to reduced I/O reads.
upvoted 1 times
sgodd_0298
2 years, 1 month ago
Selected Answer: A
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-data-storage
upvoted 2 times
DarioReymago
2 years, 1 month ago
Selected Answer: A
A is correct: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files
upvoted 3 times
solref
2 years, 2 months ago
Selected Answer: A
The Latin1_General_100_BIN2_UTF8 collation has an additional performance optimization that works only for Parquet and Cosmos DB. Link: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files. On collations: the default inheritance can be overridden by explicitly stating a different collation for a character-based data type.
upvoted 4 times
Fer079
2 years, 3 months ago
Selected Answer: A
Data in a Parquet file is organized in row groups. Serverless SQL pool skips row groups based on the specified predicate in the WHERE clause, which reduces I/O. The result is increased query performance. If we want to filter by businessName and/or by surveyName, then it is recommended to use the Latin1_General_100_BIN2_UTF8 collation. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool#use-proper-collation-to-utilize-predicate-pushdown-for-character-columns
upvoted 4 times
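The row-group elimination described in this comment applies when the WHERE predicate targets a string column declared with the BIN2 UTF-8 collation. A hedged sketch, where the storage path, varchar lengths, and the filter value 'Contoso' are placeholders:

```sql
-- With the BIN2 UTF-8 collation on businessName, the filter below can be
-- pushed down to the Parquet row groups, so non-matching groups are
-- skipped entirely instead of being read and filtered in tempdb.
SELECT surveyName, participantCount
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/surveys/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    businessName     varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    surveyName       varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
    participantCount int
) AS rows
WHERE rows.businessName = 'Contoso';  -- placeholder filter value
```

Note the literal is a plain varchar string, not an N'...' nvarchar literal; an implicit conversion between the literal and the column could otherwise defeat the pushdown.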
louisaok
2 years, 4 months ago
Selected Answer: B
B is the correct one.
upvoted 3 times
Saffar
2 years, 4 months ago
Using "OPENROWSET WITH to explicitly define the collation for businessName and surveyName as Latin1_General_100_BIN2_UTF8" will allow you to read the strings properly, but I think it has nothing to do with decreasing the I/O, so most likely it's No.
upvoted 3 times
louisaok
2 years, 4 months ago
Selected Answer: B
No. The solution in this series that meets the goal is the one that recommends defining a data source and a view for the Parquet files, and updating the query to use the view.
upvoted 4 times
Maazi
2 years, 5 months ago
Selected Answer: B
You don't need to use the OPENROWSET WITH clause when reading Parquet files. Source: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files
upvoted 4 times
DarioReymago
2 years, 1 month ago
Sorry Maazi, the link you refer to actually says that the correct answer is A.
upvoted 3 times
Community vote distribution: A (35%), C (25%), B (20%), Other