
Exam DP-500 topic 1 question 54 discussion

Actual exam question from Microsoft's DP-500
Question #: 54
Topic #: 1

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using an Azure Synapse Analytics serverless SQL pool to query a collection of Apache Parquet files by using automatic schema inference. The files contain more than 40 million rows of UTF-8-encoded business names, survey names, and participant counts. The database is configured to use the default collation.
The queries use OPENROWSET and infer the schema shown in the following table.

Column name       Inferred data type
businessName      varchar(8000)
surveyName        varchar(8000)
participantCount  int

You need to recommend changes to the queries to reduce I/O reads and tempdb usage.
Solution: You recommend using the OPENROWSET WITH clause to explicitly specify the maximum length for businessName and surveyName.
Does this meet the goal?

  • A. Yes
  • B. No
Suggested Answer: A
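For context, a minimal sketch of the proposed change, assuming a hypothetical storage path and illustrative column lengths (neither is given in the question):

    -- Before: automatic schema inference types businessName and surveyName as varchar(8000)
    SELECT businessName, surveyName, participantCount
    FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/surveys/*.parquet',
        FORMAT = 'PARQUET'
    ) AS rows;

    -- After: WITH caps the string columns at a realistic maximum length, so the
    -- engine allocates and moves far less data per row; the UTF-8 collation is a
    -- related best practice from the linked docs for UTF-8-encoded Parquet text
    SELECT businessName, surveyName, participantCount
    FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/surveys/*.parquet',
        FORMAT = 'PARQUET'
    ) WITH (
        businessName     varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
        surveyName       varchar(200) COLLATE Latin1_General_100_BIN2_UTF8,
        participantCount int
    ) AS rows;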

Comments

Fer079
Highly Voted 2 years, 2 months ago
Selected Answer: A
Parquet files don't contain metadata about maximum character column length, so the serverless SQL pool infers it as varchar(8000). An example very similar to this question appears at the following link (a sketch for checking the inferred types follows this thread): https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool#check-inferred-data-types
upvoted 9 times
solref
2 years, 1 month ago
It is exactly what I found! Thanks for sharing :)
upvoted 1 times
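The linked best-practices page checks the inferred types with sp_describe_first_result_set; a minimal sketch, assuming the same hypothetical storage path as above:

    -- Returns one row per result column; system_type_name shows the inferred type
    EXEC sp_describe_first_result_set N'
        SELECT *
        FROM OPENROWSET(
            BULK ''https://<storage-account>.dfs.core.windows.net/surveys/*.parquet'',
            FORMAT = ''PARQUET''
        ) AS rows';

The system_type_name column of the result shows varchar(8000) for the string columns, which is the inference this comment describes.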
Alborz
Most Recent 1 year, 9 months ago
Selected Answer: B
Using OPENROWSET WITH to explicitly specify the maximum length for businessName and surveyName does not meet the goal of reducing I/O reads and tempdb usage in an Azure Synapse Analytics serverless SQL pool. Specifying the maximum length only enforces a length constraint on the columns; it does not directly impact I/O reads or tempdb usage.
upvoted 1 times
Samuel77
1 year, 10 months ago
I will select B
upvoted 2 times
Plb2
1 year, 10 months ago
Selected Answer: A
On 40M rows, reducing the default varchar(8000) to a smaller size will reduce I/O reads and tempdb usage.
upvoted 1 times
DarioReymago
2 years, 1 month ago
Selected Answer: B
By default, the inferred data types show varchar(8000), with or without the WITH clause.
upvoted 2 times
solref
2 years, 1 month ago
Selected Answer: A
Parquet files don't contain metadata about maximum character column length, so serverless SQL pool infers it as varchar(8000). You can optimize the inferred data types by using WITH to specify the maximum length: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool#check-inferred-data-types
upvoted 2 times
solref
2 years, 1 month ago
Answer = No. I correct myself: the schema definition is a best practice, but it doesn't explain a reduction of I/O. Using a proper collation reduces the I/O. Data in a Parquet file is organized in row groups. Serverless SQL pool skips row groups based on the predicate specified in the WHERE clause, which reduces I/O and increases query performance (a sketch follows below). https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool#check-inferred-data-types
upvoted 1 times
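A minimal sketch of the row-group skipping this comment describes, assuming a hypothetical survey name and the same illustrative path:

    -- The WHERE predicate lets serverless SQL pool skip entire row groups whose
    -- column min/max metadata cannot match, so those bytes are never read
    SELECT surveyName, SUM(participantCount) AS totalParticipants
    FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/surveys/*.parquet',
        FORMAT = 'PARQUET'
    ) AS rows
    WHERE surveyName = 'Customer satisfaction 2023'  -- hypothetical value
    GROUP BY surveyName;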
Maazi
2 years, 4 months ago
Selected Answer: B
You don't need to use OPENROWSET WITH when reading Parquet files. Ref: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files
upvoted 4 times
ivanb94
2 years, 3 months ago
Because automatic schema inference is even emphasized in the question scenario, I would definitely go with No as the correct answer.
upvoted 1 times
DS_newb
2 years, 1 month ago
Check the "Explicitly specify schema" section; I would vote A.
upvoted 1 times
nbagchi
2 years, 4 months ago
Selected Answer: B
Correct
upvoted 1 times