
Exam DP-203 topic 2 question 34 discussion

Actual exam question from Microsoft's DP-203
Question #: 34
Topic #: 2
[All DP-203 Questions]

You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day.
You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times.
What should you include in the solution?

  • A. Partition by DateTime fields.
  • B. Sink to Azure Queue storage.
  • C. Include a watermark column.
  • D. Use a JSON format for physical data storage.
Suggested Answer: B

Comments

bc5468521
Highly Voted 3 years, 11 months ago
The ABS-AQS source is deprecated. For new streams, we recommend using Auto Loader instead.
upvoted 29 times
...
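For context on the Auto Loader recommendation above, here is a minimal PySpark sketch of the pattern; the paths, file format, and table name are illustrative assumptions, not taken from the question:

# Minimal Auto Loader sketch (paths, file format, and table name are assumptions).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = (
    spark.readStream.format("cloudFiles")                        # Auto Loader source
    .option("cloudFiles.format", "json")                         # format of landed files (assumed)
    .option("cloudFiles.schemaLocation", "/mnt/schemas/events")  # where inferred schema is tracked
    .load("/mnt/landing/events")                                 # hypothetical landing path
)

(
    events.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events")     # bookkeeping for incremental loads
    .trigger(availableNow=True)                                  # process only new files, then stop
    .toTable("events")                                           # persist to a Delta table
)

The checkpoint location is what lets repeated runs pick up only files that arrived since the last run, which is the incremental-load behaviour the question asks about.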
manquak
Highly Voted 3 years, 8 months ago
Why not partition by date? What does the auto loader have to do with streaming jobs?
upvoted 17 times
...
20b1837
Most Recent 2 months ago
Selected Answer: C
Watermark. For those saying it can't be watermark, please consider that "watermark" has a different meaning depending on context, i.e. it is a different concept in a streaming context than in a loading context. Please see the link below for why a watermark is used and how it is used for incremental loading: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview
upvoted 1 times
...
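To make the loading-context meaning of "watermark" concrete, here is a minimal sketch of the high-water-mark pattern from the linked tutorial; the table and column names ("etl_watermarks", "events", "event_time") are illustrative assumptions:

# High-water-mark incremental load sketch; names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Read the watermark persisted by the previous run.
last_mark = spark.table("etl_watermarks").agg(F.max("watermark_value")).first()[0]

# 2. Load only rows that arrived after it.
delta = spark.table("events").where(F.col("event_time") > last_mark)
delta.write.mode("append").saveAsTable("events_processed")

# 3. Advance the watermark for the next incremental run.
new_mark = delta.agg(F.max("event_time")).first()[0]
if new_mark is not None:
    spark.createDataFrame([(new_mark,)], "watermark_value timestamp") \
        .write.mode("overwrite").saveAsTable("etl_watermarks")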
Pey1nkh
2 months, 3 weeks ago
Selected Answer: A
Partitioning by DateTime is the most effective approach to minimize storage costs and speed up incremental load times.
upvoted 2 times
...
de_examtopics
5 months, 2 weeks ago
Selected Answer: A
B. Sink to Azure Queue storage: Queue storage is primarily for messaging and not ideal for storing large volumes of streaming data efficiently.
C. Include a watermark column: While a watermark column is useful for processing streaming data, it does not directly address storage costs or load times.
D. Use a JSON format for physical data storage: JSON can be easy to work with, but it tends to use more storage space than efficient formats like Parquet, which can negatively impact storage costs.
upvoted 3 times
...
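As a rough illustration of the storage-format point above, here is the same data written both ways; the paths and source table are illustrative assumptions:

# JSON (row-oriented text) vs Parquet (compressed columnar); paths are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("events")  # hypothetical source table

df.write.mode("overwrite").json("/mnt/demo/events_json")        # typically larger on disk
df.write.mode("overwrite").parquet("/mnt/demo/events_parquet")  # compressed, column-pruned reads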
a85becd
8 months, 2 weeks ago
Selected Answer: A
Partitioning by DateTime fields helps in organizing the data efficiently, which can significantly reduce the time required for incremental loads. It allows you to quickly access and process only the relevant partitions, rather than scanning the entire dataset. Including a watermark column (Option C) is also important for managing late-arriving data and ensuring that only the most recent data is processed. However, it doesn't directly address storage costs or incremental load times. Sinking to Azure Queue storage (Option B) is not suitable for this scenario as it is more appropriate for message queuing rather than persistent storage for large volumes of data. Using a JSON format for physical data storage (Option D) is not recommended because JSON is not optimized for storage efficiency or query performance. Instead, using a columnar storage format like Parquet or Delta Lake would be more efficient.
upvoted 2 times
...
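A minimal sketch of the day-level partitioning described above, with illustrative names; deriving a date column from the raw timestamp avoids creating one tiny partition per event:

# Delta table partitioned by a derived date column; all names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

(
    spark.table("raw_events")                           # hypothetical source
    .withColumn("event_date", F.to_date("event_time")) # day granularity, not raw DateTime
    .write.format("delta")
    .partitionBy("event_date")                          # incremental jobs prune to recent days
    .mode("append")
    .saveAsTable("events")
)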
Alongi
1 year ago
Selected Answer: C
A watermark column is essential for implementing event time-based processing in streaming data scenarios. It helps track the progress of event ingestion and ensures that only the latest data is processed, thereby enabling efficient incremental loading.
upvoted 1 times
...
Alongi
1 year, 1 month ago
Selected Answer: C
A watermark column could reduce storage.
upvoted 2 times
...
Azure_2023
1 year, 3 months ago
Selected Answer: A
Well, Azure Queue Storage is a service for storing large numbers of messages. You access messages from anywhere in the world via authenticated calls using HTTP or HTTPS. A queue message can be up to 64 KB in size. A queue may contain millions of messages, up to the total capacity limit of a storage account. Queues are commonly used to create a backlog of work to process asynchronously, like in the Web-Queue-Worker architectural style. I believe the correct answer is A.
upvoted 2 times
...
j888
1 year, 3 months ago
An incremental key and timestamp match the watermark behaviour described here: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview
upvoted 1 times
j888
1 year, 2 months ago
A is more likely the better answer.
A. Partition by DateTime fields: Partitioning the table by date-related fields (e.g., year, month, day) allows efficient filtering during incremental load jobs. Queries can scan only the relevant partitions for new data, significantly reducing processing time and associated costs.
C. Include a watermark column: A watermark column helps track the progress of data processing, allowing incremental jobs to focus on data newer than the last processed watermark. This ensures efficient updates without reprocessing already loaded data.
upvoted 3 times
...
...
dakku987
1 year, 4 months ago
Selected Answer: A
From ChatGPT: To design an efficient Azure Databricks table for ingesting an average of 20 million streaming events per day while minimizing storage costs and incremental load times, you should consider the following: A. Partition by DateTime fields. Explanation: Partitioning by DateTime fields is a common practice for time-series data in Azure.
upvoted 1 times
...
kkk5566
1 year, 8 months ago
Selected Answer: B
should be B
upvoted 2 times
...
akhil5432
1 year, 9 months ago
Selected Answer: B
option B
upvoted 1 times
...
vctrhugo
1 year, 10 months ago
Sinking to Azure Queue storage is not necessary for persisting the events in the Azure Databricks table. Azure Queue storage is typically used for decoupling and asynchronous messaging scenarios and may not directly contribute to minimizing storage costs or incremental load times for the Databricks table.
upvoted 1 times
...
auwia
1 year, 11 months ago
Selected Answer: B
Probably it is B. Partitioning by date and time is not the best: imagine each event landing in its own partition because of (day, hour, minute, second) granularity, when the requirement is clearly to minimize space. You use a watermark when you need to reduce the amount of state data to improve latency during a long-running streaming operation. I would exclude JSON because of how the question is formulated. My answer is B; even though the source is deprecated, this is clearly an old question, and judging by the comments it can still appear on the exam.
upvoted 3 times
...
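For the streaming meaning of "watermark" this comment refers to, here is a minimal Structured Streaming sketch; the column name and time thresholds are illustrative assumptions:

# withWatermark bounds how long aggregation state is kept for late events.
# The "event_time" column and the 10-minute threshold are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

counts = (
    spark.readStream.table("events")               # streaming read of the table
    .withWatermark("event_time", "10 minutes")     # discard state older than the watermark
    .groupBy(F.window("event_time", "5 minutes"))  # tumbling-window aggregation
    .count()
)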
dksks
2 years ago
Selected Answer: A
A. Partition by DateTime fields: Partitioning the table on frequently used columns such as DateTime fields can improve query performance and reduce incremental load times. Partitioning by DateTime can help to reduce the amount of data scanned during query execution and facilitate incremental loading.
upvoted 2 times
...
hiyoww
2 years, 1 month ago
is the question outdated?
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other