Exam DP-203 topic 2 question 116 discussion

Actual exam question from Microsoft's DP-203
Question #: 116
Topic #: 2

You have an Azure Data Lake Storage Gen2 account named account1 and an Azure event hub named Hub1. Data is written to account1 by using Event Hubs Capture.

You plan to query account1 by using an Apache Spark pool in Azure Synapse Analytics.

You need to create a notebook and ingest the data from account1. The solution must meet the following requirements:

• Retrieve multiple rows of records in their entirety.
• Minimize query execution time.
• Minimize data processing.

Which data format should you use?

  • A. Parquet
  • B. Avro
  • C. ORC
  • D. JSON
Suggested Answer: A
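For context, here is a minimal PySpark sketch of how such a notebook might ingest the captured files. The container name, the folder layout, and the use of Spark's built-in Avro reader are assumptions; Event Hubs Capture writes Avro files to the destination storage account.

```python
# Minimal sketch, assuming a Synapse Spark pool with Spark's built-in Avro reader
# and a hypothetical container ("capture") plus the default Capture folder layout.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Event Hubs Capture writes .avro files; the wildcard path below is only a guess
# at the default {Namespace}/{EventHub}/{PartitionId}/{Year}/.../{Second}.avro layout.
capture_path = (
    "abfss://capture@account1.dfs.core.windows.net/"
    "hub1-namespace/hub1/*/*/*/*/*/*/*.avro"
)

# Because Avro is row-based, whole records come back without column reassembly.
df = spark.read.format("avro").load(capture_path)

# The event payload sits in the binary Body column, next to metadata such as EnqueuedTimeUtc.
df.select("EnqueuedTimeUtc", "Body").show(10, truncate=False)
```

In a Synapse notebook the `spark` session already exists, so the builder call is only there to keep the snippet self-contained.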

Comments

jongert
Highly Voted 1 year, 4 months ago
Answer is B, and here is why: Avro and Parquet are both binary, compressed file formats, which makes them preferable to CSV, which stores data as strings (much larger files). The difference between Parquet and Avro is the layout: Parquet is column-based, while Avro provides row-based storage. Since the requirement is to retrieve rows in their entirety, it is better to use Avro. Scenarios where we only retrieve a subset of columns for analysis would favour Parquet.
upvoted 8 times
...
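To make the row-versus-column trade-off described above concrete, here is a rough sketch; the paths, the existence of a Parquet copy of the data, and the column name are all illustrative assumptions.

```python
# Hypothetical comparison of the two access patterns; paths and columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

avro_df = spark.read.format("avro").load(
    "abfss://capture@account1.dfs.core.windows.net/hub1/*.avro"
)
parquet_df = spark.read.parquet(
    "abfss://capture@account1.dfs.core.windows.net/hub1-parquet/"
)

# Whole-record retrieval: each Avro record is stored contiguously, so returning
# complete rows requires no stitching of separately stored columns.
rows = avro_df.limit(1000).collect()

# Column-subset analytics: Parquet reads only the projected columns from disk,
# which is where the columnar layout wins.
parquet_df.select("EnqueuedTimeUtc").distinct().show()
```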
Pey1nkh
Most Recent 2 months, 1 week ago
Selected Answer: A
Avro is row-based, meaning entire rows must be read even if only a few columns are needed. This increases query time and reduces efficiency.
upvoted 1 times
...
JyotiVerma
1 year, 1 month ago
Avro, because its row-oriented storage offers a slight advantage over Parquet, which is columnar storage.
upvoted 2 times
...
Alongi
1 year, 2 months ago
Selected Answer: A
Avro, because Parquet is better suited to columnar access.
upvoted 1 times
...
be8a152
1 year, 3 months ago
Avro (just because we want to retrieve the data in its entirety).
upvoted 3 times
mghf61
1 year, 1 month ago
Avro is a row-based format and can be a good choice when you need to read or write individual records in a stream. However, for analytical queries that often involve reading multiple rows at once, columnar formats like Parquet are typically more efficient.
upvoted 2 times
...
...
Azure_2023
1 year, 3 months ago
Parquet is the clear winner
upvoted 2 times
Sr18
10 months, 2 weeks ago
Parquet will almost always perform better for read operations. Avro is good for write performance, but Parquet is best for analytical and read workloads.
upvoted 1 times
...
...
matiandal
1 year, 6 months ago
"Parquet showed either similar or better results on every test [than Avro]. The query-performance differences on the larger datasets in Parquet’s favor are partly due to the compression results; when querying the wide dataset, Spark had to read 3.5x less data for Parquet than Avro. Avro did not perform well when processing the entire dataset, as suspected." Reference: https://blog.cloudera.com/benchmarking-apache-parquet-the-allstate-experience/
upvoted 3 times
...
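One rough way to see the "read less data" effect described in that benchmark is to inspect the physical plan for a narrow projection; the path and column below are placeholders, not values from the question.

```python
# Placeholder path and column; the point is only to inspect Parquet column pruning.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

parquet_df = spark.read.parquet("abfss://data@account1.dfs.core.windows.net/wide-dataset/")

# For a narrow projection, the Parquet scan in the physical plan should list only
# the selected column in its ReadSchema, i.e. the other columns are skipped on disk.
parquet_df.select("EnqueuedTimeUtc").explain()
```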
Community vote distribution: A (35%), C (25%), B (20%), Other