Exam DP-500 topic 1 question 69 discussion

Actual exam question from Microsoft's DP-500

Question #: 69
Topic #: 1

You are creating an external table by using an Apache Spark pool in Azure Synapse Analytics. The table will contain more than 20 million rows partitioned by date. The table will be shared with the SQL engines.
You need to minimize how long it takes for a serverless SQL pool to execute a query against the table.
In which file format should you recommend storing the table data?

A. CSV
B. Delta
C. JSON
D. Apache Parquet

Suggested Answer: D

by louisaok at Jan. 6, 2023, 9:09 p.m.

Comments

SamuComqi

1 year, 10 months ago

Selected Answer: D

I took the exam a few days ago (14/8/2023) and I passed the exam with a score of 915. My answer was: Apache Parquet

upvoted 1 times

Albeeliu

2 years, 2 months ago

From ChatGPT: Using Apache Parquet for the external table minimizes query execution time for serverless SQL pools in Azure Synapse Analytics for three reasons.

Columnar storage: Parquet stores data column by column, so a query reads only the columns it references instead of entire rows.

Compression: Parquet compresses each column separately, which reduces the size of the data on disk. Less data to transfer means faster query execution.

Partitioning: Parquet data can be laid out in partitions (here, by date), which splits the table into smaller, more manageable files. A query that filters on the partition column scans only the relevant partitions.

Overall, Parquet significantly reduces the time a serverless SQL pool needs to query the table, which makes it the more performant choice for analyzing large datasets.
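To make the columnar-storage and partitioning points concrete, here is a minimal toy sketch in plain Python. It is not how Parquet or Synapse is implemented; the dataset and the function names (row_store_scan, build_partitioned_columns, columnar_scan) are made up for illustration. It only counts how many values each layout has to touch to answer the same query.

```python
from collections import defaultdict

# Hypothetical toy dataset: four rows spread over two dates.
rows = [
    {"date": "2023-01-01", "region": "EU", "amount": 10},
    {"date": "2023-01-01", "region": "US", "amount": 20},
    {"date": "2023-01-02", "region": "EU", "amount": 30},
    {"date": "2023-01-02", "region": "US", "amount": 40},
]


def row_store_scan(rows, wanted_date):
    """Row-oriented scan (CSV/JSON-like): every field of every row is read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row must be parsed
        if row["date"] == wanted_date:
            total += row["amount"]
    return total, values_read


def build_partitioned_columns(rows):
    """Build a date-partitioned, column-oriented layout:
    partition key -> column name -> list of values."""
    partitions = defaultdict(lambda: defaultdict(list))
    for row in rows:
        partition = partitions[row["date"]]
        for name, value in row.items():
            if name != "date":  # the date lives in the partition key
                partition[name].append(value)
    return partitions


def columnar_scan(partitions, wanted_date):
    """Read only the 'amount' column of the one matching partition."""
    amounts = partitions.get(wanted_date, {}).get("amount", [])
    return sum(amounts), len(amounts)


# Same query, two layouts: SUM(amount) WHERE date = '2023-01-02'
row_total, row_reads = row_store_scan(rows, "2023-01-02")
col_total, col_reads = columnar_scan(build_partitioned_columns(rows), "2023-01-02")
print(row_total, row_reads)  # 70 12
print(col_total, col_reads)  # 70 2
```

Both layouts return the same sum, but the row-oriented scan touches all 12 values while the partitioned columnar scan touches only 2. At 20 million rows that gap is the main reason Parquet queries run faster than CSV or JSON.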

upvoted 1 times

Hongzu13

2 years, 3 months ago

Selected Answer: D

Well, this link doesn't give the answer directly, but Microsoft indirectly indicates that you should use Apache Parquet files with a serverless SQL pool: https://learn.microsoft.com/en-us/azure/synapse-analytics/get-started-analyze-sql-on-demand

upvoted 2 times

louisaok

2 years, 5 months ago

Selected Answer: D

D is correct

upvoted 3 times
