Exam DP-203 topic 1 question 76 discussion

Actual exam question from Microsoft's DP-203
Question #: 76
Topic #: 1

You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage1.

New files are uploaded daily to storage1.

You need to recommend a solution that configures storage1 as a structured streaming source. The solution must meet the following requirements:

• Incrementally process new files as they are uploaded to storage1.
• Minimize implementation and maintenance effort.
• Minimize the cost of processing millions of files.
• Support schema inference and schema drift.

What should you include in the recommendation?

  • A. COPY INTO
  • B. Azure Data Factory
  • C. Auto Loader
  • D. Apache Spark FileStreamSource
Suggested Answer: C

Comments

Nikiboy
Highly Voted 2 years, 2 months ago
Auto Loader provides a Structured Streaming source called cloudFiles. Plus, it supports schema drift. Hence, Auto Loader is the correct answer. https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/
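For anyone who wants to see what that looks like, here is a minimal PySpark sketch of the cloudFiles source (the container names, paths, and file format below are assumptions for illustration, not part of the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # a Databricks notebook already provides `spark`

# Auto Loader is the "cloudFiles" Structured Streaming source, here reading from storage1 (ADLS Gen2).
events = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")  # format of the incoming files (assumed JSON)
    .option("cloudFiles.schemaLocation",  # where Auto Loader persists the inferred schema
            "abfss://metadata@storage1.dfs.core.windows.net/schemas/events")
    .load("abfss://landing@storage1.dfs.core.windows.net/events/")
)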
upvoted 18 times
mr_examers
2 years, 1 month ago
Auto Loader does not support Azure Data Lake Storage Gen2
upvoted 1 times
cloud_lady
2 years ago
It does. Refer to this link: https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/
upvoted 1 times
...
vctrhugo
1 year, 11 months ago
Auto Loader can load data files from AWS S3 (s3://), Azure Data Lake Storage Gen2 (ADLS Gen2, abfss://), Google Cloud Storage (GCS, gs://), Azure Blob Storage (wasbs://), ADLS Gen1 (adl://), and Databricks File System (DBFS, dbfs:/).
upvoted 3 times
...
...
...
samianae
Most Recent 4 months, 2 weeks ago
Selected Answer: C
Auto Loader
upvoted 1 times
...
moize
6 months, 1 week ago
Selected Answer: C
Auto Loader (option C) is the recommended solution for configuring storage1 as a structured streaming source in Azure Databricks.
upvoted 1 times
...
EmnCours
6 months, 2 weeks ago
Selected Answer: C
Auto Loader is correct
upvoted 1 times
...
ahana1074
9 months, 1 week ago
Auto Loader is correct:
• Incremental processing: Auto Loader can automatically detect and incrementally process new files as they are uploaded to Azure Data Lake Storage Gen2.
• Minimize implementation and maintenance effort: Auto Loader is designed for simplicity, requiring minimal setup and automatically handling file management. It reduces operational overhead by automating many of the tasks required to manage a streaming source.
• Minimize the cost of processing millions of files: Auto Loader efficiently scales to handle millions of files and minimizes costs by using a directory listing mode or a more optimized file notification mode with Azure Event Grid.
• Support schema inference and schema drift: Auto Loader automatically infers the schema and can handle schema drift, which allows it to dynamically adapt to changes in the file structure without requiring constant updates to the processing logic.
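A hedged sketch of the two points above (file notification mode with Event Grid, and schema-drift handling); the cloudFiles options are real Auto Loader options, but the paths and file format are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined in a Databricks notebook

daily = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")  # assumed file format
    # File notification mode: consume Azure Event Grid / queue events instead of
    # repeatedly listing a directory that holds millions of files.
    .option("cloudFiles.useNotifications", "true")
    # Schema inference state is persisted here across runs.
    .option("cloudFiles.schemaLocation",
            "abfss://metadata@storage1.dfs.core.windows.net/schemas/daily")
    # Schema drift: add new columns as they appear rather than failing the stream.
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("abfss://landing@storage1.dfs.core.windows.net/daily/")
)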
upvoted 1 times
...
Alongi
1 year, 2 months ago
Selected Answer: C
Auto Loader is correct
upvoted 3 times
...
Homer23
1 year, 2 months ago
Reference: https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/#incremental-ingestion-using-auto-loader-with-delta-live-tables
upvoted 1 times
...
Bill_Walker
1 year, 4 months ago
Selected Answer: C
Auto Loader seems more correct. Copy Into focuses on loading from storage to a Delta table
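For contrast, a rough sketch of COPY INTO (option A): it is a batch SQL command that loads files into an existing Delta table each time it runs, not a streaming source; the table and path names here are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# COPY INTO skips files it has already loaded, but it is a batch load into a Delta table, not a stream.
# bronze_events is assumed to be an existing Delta table.
spark.sql("""
    COPY INTO bronze_events
    FROM 'abfss://landing@storage1.dfs.core.windows.net/events/'
    FILEFORMAT = JSON
""")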
upvoted 1 times
...
Charley92
1 year, 4 months ago
Selected Answer: D
To configure storage1 as a structured streaming source that incrementally processes new files as they are uploaded, minimizes implementation and maintenance effort, minimizes the cost of processing millions of files, and supports schema inference and schema drift, you should use Apache Spark FileStreamSource.
upvoted 1 times
...
ellala
1 year, 8 months ago
Bing explains the following: The best option is C. Auto Loader. Auto Loader is a feature in Azure Databricks that uses a cloudFiles data source to incrementally and efficiently process new data files as they arrive in Azure Data Lake Storage Gen2. It supports schema inference and schema evolution (drift). It also minimizes implementation and maintenance effort, as it simplifies the ETL pipeline by reducing the complexity of identifying new files for processing. The other options do not meet the requirements because:
• A. COPY INTO does not incrementally process new files as they are uploaded, which is one of your requirements.
• B. Azure Data Factory does not natively support schema inference and schema drift. The incremental processing of new files would need to be manually implemented, which could increase implementation and maintenance effort.
• D. Apache Spark FileStreamSource requires manual setup and does not natively support schema inference or schema drift. It also may not minimize the cost of processing millions of files as efficiently as Auto Loader.
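To make the contrast with option D concrete, here is a sketch of the plain Spark file streaming source: it needs an explicit schema up front for streaming reads and has no built-in schema-drift handling (the schema and path below are made up):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

# The generic file source requires a user-supplied schema for streaming reads,
# and it discovers new files by listing the input directory on every trigger.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("json")
    .schema(schema)
    .load("abfss://landing@storage1.dfs.core.windows.net/events/")
)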
upvoted 4 times
...
kkk5566
1 year, 9 months ago
Selected Answer: C
Auto Loader
upvoted 1 times
...
Deeksha1234
1 year, 10 months ago
Selected Answer: C
C is correct
upvoted 1 times
...
vctrhugo
2 years ago
To configure the Azure Data Lake Storage Gen2 account (storage1) as a structured streaming source in the Azure Databricks workspace, while meeting the given requirements, you should include the following in the recommendation: C. Auto Loader.

Auto Loader is a feature provided by Azure Databricks that automatically discovers and processes new files as they are uploaded to a specified directory in Azure Data Lake Storage Gen2. It provides an efficient and cost-effective way to incrementally process new files without the need for manual intervention. Auto Loader also supports schema inference and schema drift, allowing you to handle changes in the file schema over time.

By using Auto Loader, you can minimize implementation and maintenance effort, as it takes care of monitoring the storage directory for new files and processing them in an optimized manner. It also helps to minimize the cost of processing millions of files, as it leverages the efficient processing capabilities of Databricks. Therefore, the correct answer is C. Auto Loader.
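As a sketch of how this keeps effort and cost low (names and paths are assumptions): the Auto Loader stream can run with an availableNow trigger, so a scheduled job picks up only the files that arrived since the last run and then shuts down, rather than keeping a cluster running continuously.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided by the Databricks notebook environment

new_files = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation",
            "abfss://metadata@storage1.dfs.core.windows.net/schemas/events")
    .load("abfss://landing@storage1.dfs.core.windows.net/events/")
)

(
    new_files.writeStream
    .option("checkpointLocation",
            "abfss://metadata@storage1.dfs.core.windows.net/checkpoints/events")
    # Process everything that is new since the last run, then stop, so no always-on cluster is needed.
    .trigger(availableNow=True)
    .toTable("bronze_events")  # hypothetical Delta target table
)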
upvoted 4 times
...
rocky48
2 years ago
Selected Answer: C
Auto Loader
upvoted 1 times
rocky48
1 year, 4 months ago
I recommend using Auto Loader. Here’s why:
• Incremental processing: Auto Loader in Azure Databricks allows you to process new files incrementally as they are uploaded to your storage account. It efficiently identifies and processes only the new data, reducing the need to reprocess entire datasets.
• Low implementation and maintenance effort: Auto Loader simplifies the setup process. You can configure it easily within your Databricks workspace, and it automatically handles file discovery, partitioning, and schema inference.
• Cost-effective: Auto Loader optimizes resource usage by processing only the necessary data. It avoids unnecessary scans of existing files, which helps minimize costs when dealing with millions of files.
• Schema inference and schema drift support: Auto Loader automatically infers the schema from the data and adapts to schema changes over time (schema drift). This flexibility ensures smooth processing even when the structure of incoming files evolves.
Therefore, choose C.
upvoted 1 times
...
...
nicololmen
2 years, 1 month ago
D according to ChatGPT
upvoted 1 times
...
AHUI
2 years, 2 months ago
Ans: B. ADF supports schema drift - https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-schema-drift
upvoted 1 times
frankanalysis
2 years, 1 month ago
Auto Loader is lower cost.
upvoted 2 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other