
Exam DP-201 topic 2 question 10 discussion

Actual exam question from Microsoft's DP-201
Question #: 10
Topic #: 2

You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics.
You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.
What should you recommend?

  • A. Avro
  • B. CSV
  • C. Parquet
  • D. JSON
Suggested Answer: A
The Avro format is well suited to data and message preservation.
An Avro schema, with its support for evolution, is essential for making data robust in streaming architectures such as Kafka, and the metadata the schema provides lets you reason about the data. Because the schema travels with the records, Avro files are self-describing.
References:
http://cloudurable.com/blog/avro/index.html
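As a minimal illustration of the schema-retention point being made here (assuming a Databricks notebook, where the Avro data source is built in; the path and column names below are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Avro embeds its schema in the file, so typed columns come back
    # without any inference step.
    df = spark.read.format("avro").load(
        "abfss://streaming@mylake.dfs.core.windows.net/social/"
    )
    df.printSchema()  # e.g. user_id: long, posted_at: timestamp, text: string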

Comments

felmasri
Highly Voted 4 years, 2 months ago
I think this answer is wrong, since PolyBase does not support Avro. I would pick Parquet.
upvoted 52 times
jms309
Highly Voted 4 years, 2 months ago
I understand that Databricks and PolyBase will consume the data independently, so the Stream Analytics output format must be compatible with both. Since we also need a file format that speeds up distributed queries, the only candidates are Avro and Parquet. And since Avro is not a valid option because PolyBase doesn't support it, the only possible answer is Parquet.
upvoted 15 times
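To make the compatibility argument concrete, here is a hypothetical sketch of the PolyBase side, driven from Python via pyodbc (the server, table, and column names are assumptions, and the external data source is assumed to already exist). CREATE EXTERNAL FILE FORMAT accepts DELIMITEDTEXT, RCFILE, ORC, and PARQUET, but not Avro, which is the commenters' point.

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myworkspace.sql.azuresynapse.net;"
        "DATABASE=mypool;UID=loader;PWD=<password>"
    )
    # PARQUET is a supported FORMAT_TYPE; AVRO is not.
    conn.execute("""
        CREATE EXTERNAL FILE FORMAT ParquetFormat
        WITH (FORMAT_TYPE = PARQUET);
    """)
    conn.execute("""
        CREATE EXTERNAL TABLE dbo.SocialStream (
            user_id BIGINT,
            posted_at DATETIME2,
            body NVARCHAR(4000)
        )
        WITH (
            LOCATION = '/social/',
            DATA_SOURCE = MyDataLake,
            FILE_FORMAT = ParquetFormat
        );
    """)
    conn.commit()
    conn.close()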
massnonn
Most Recent 3 years, 6 months ago
For me, the correct answer is Parquet.
upvoted 1 times
dumpi
4 years ago
Parquet is the correct answer; I verified it.
upvoted 3 times
KpKo
4 years ago
Agreed with Parquet
upvoted 2 times
cadio30
4 years ago
Both services use CSV and Parquet as input files, though Parquet is the better candidate for this requirement: it is the recommended file format for Azure Databricks and is also supported by PolyBase.
upvoted 2 times
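And a minimal sketch of the Databricks side (the path and column names are hypothetical): the Parquet files that Stream Analytics writes can be loaded directly, with column types intact.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Parquet stores column types in the file metadata, so no schema
    # inference is needed on read.
    df = spark.read.parquet("abfss://streaming@mylake.dfs.core.windows.net/social/")
    df.printSchema()

    # Column pruning and predicate pushdown keep scans fast.
    df.select("user_id", "posted_at").where("posted_at >= '2021-01-01'").show()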
davita8
4 years, 1 month ago
C. Parquet
upvoted 3 times
maciejt
4 years, 2 months ago
JSON and CSV don't strongly define types, and we need to preserve the data types, so those two are excluded. Parquet is optimized for reads, Avro for writes, and the requirement is fast queries, so Parquet. https://www.datanami.com/2018/05/16/big-data-file-formats-demystified/
upvoted 7 times
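The type-retention point is easy to demonstrate (a self-contained PySpark sketch; the sample columns are made up): round-tripping the same DataFrame through CSV loses the types, while Parquet preserves them.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 3.14, "hello")], ["id", "score", "text"])

    df.write.mode("overwrite").option("header", True).csv("/tmp/demo_csv")
    df.write.mode("overwrite").parquet("/tmp/demo_parquet")

    # Without explicit inference, every CSV column reads back as string ...
    spark.read.option("header", True).csv("/tmp/demo_csv").printSchema()
    # ... while Parquet returns the original long / double / string types.
    spark.read.parquet("/tmp/demo_parquet").printSchema()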
Nik71
4 years, 2 months ago
It's the Parquet file format.
upvoted 2 times
al9887655
4 years, 2 months ago
The PolyBase support requirement eliminates Avro. Not sure what the right answer is.
upvoted 1 times
H_S
4 years, 2 months ago
Avro is not supported by PolyBase, but why not CSV?
upvoted 1 times
H_S
4 years, 2 months ago
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs - it's Parquet.
upvoted 2 times
kz_data
4 years, 2 months ago
I think Parquet is the right answer.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), other.