Exam DP-203 topic 1 question 76 discussion

Actual exam question from Microsoft's DP-203
Question #: 76
Topic #: 1

You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage1.

New files are uploaded daily to storage1.

You need to recommend a solution that configures storage1 as a structured streaming source. The solution must meet the following requirements:

• Incrementally process new files as they are uploaded to storage1.
• Minimize implementation and maintenance effort.
• Minimize the cost of processing millions of files.
• Support schema inference and schema drift.

What should you include in the recommendation?

  • A. COPY INTO
  • B. Azure Data Factory
  • C. Auto Loader
  • D. Apache Spark FileStreamSource
Suggested Answer: C

Comments

Nikiboy
Highly Voted 2 years, 2 months ago
Auto Loader provides a Structured Streaming source called cloudFiles. Plus, it supports schema drift. Hence, Auto Loader is the correct answer. https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/
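For anyone who wants to see what that looks like, here is a minimal PySpark sketch of the cloudFiles source (the container names, paths, and file format below are assumptions for illustration, not part of the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # a Databricks notebook already provides `spark`

# Auto Loader is the "cloudFiles" Structured Streaming source, here reading from storage1 (ADLS Gen2).
events = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")  # format of the incoming files (assumed JSON)
    .option("cloudFiles.schemaLocation",  # where Auto Loader persists the inferred schema
            "abfss://metadata@storage1.dfs.core.windows.net/schemas/events")
    .load("abfss://landing@storage1.dfs.core.windows.net/events/")
)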
upvoted 18 times
mr_examers
2 years, 1 month ago
Auto Loader does not support Azure Data Lake Storage Gen2
upvoted 1 times
cloud_lady
2 years ago
It does. Refer to this link: https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/
upvoted 1 times
...
vctrhugo
1 year, 11 months ago
Auto Loader can load data files from AWS S3 (s3://), Azure Data Lake Storage Gen2 (ADLS Gen2, abfss://), Google Cloud Storage (GCS, gs://), Azure Blob Storage (wasbs://), ADLS Gen1 (adl://), and Databricks File System (DBFS, dbfs:/).
upvoted 3 times
...
...
...
samianae
Most Recent 4 months, 2 weeks ago
Selected Answer: C
Auto Loader
upvoted 1 times
...
moize
6 months, 1 week ago
Selected Answer: C
Auto Loader (option C) is the recommended solution for configuring storage1 as a structured streaming source in Azure Databricks.
upvoted 1 times
...
EmnCours
6 months, 2 weeks ago
Selected Answer: C
Auto Loader is correct
upvoted 1 times
...
ahana1074
9 months, 1 week ago
Auto Loader is correct:
• Incremental processing: Auto Loader can automatically detect and incrementally process new files as they are uploaded to Azure Data Lake Storage Gen2.
• Minimize implementation and maintenance effort: Auto Loader is designed for simplicity, requiring minimal setup and automatically handling file management. It reduces operational overhead by automating many of the tasks required to manage a streaming source.
• Minimize the cost of processing millions of files: Auto Loader efficiently scales to handle millions of files and minimizes costs by using a directory listing mode or a more optimized file notification mode with Azure Event Grid.
• Support schema inference and schema drift: Auto Loader automatically infers the schema and can handle schema drift, which allows it to dynamically adapt to changes in the file structure without requiring constant updates to the processing logic.
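A hedged sketch of the two points above (file notification mode with Event Grid, and schema-drift handling); the cloudFiles options are real Auto Loader options, but the paths and file format are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined in a Databricks notebook

daily = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")  # assumed file format
    # File notification mode: consume Azure Event Grid / queue events instead of
    # repeatedly listing a directory that holds millions of files.
    .option("cloudFiles.useNotifications", "true")
    # Schema inference state is persisted here across runs.
    .option("cloudFiles.schemaLocation",
            "abfss://metadata@storage1.dfs.core.windows.net/schemas/daily")
    # Schema drift: add new columns as they appear rather than failing the stream.
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("abfss://landing@storage1.dfs.core.windows.net/daily/")
)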
upvoted 1 times
...
Alongi
1 year, 2 months ago
Selected Answer: C
Auto Loader is correct
upvoted 3 times
...
Homer23
1 year, 2 months ago
Reference: https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/#incremental-ingestion-using-auto-loader-with-delta-live-tables
upvoted 1 times
...
Bill_Walker
1 year, 4 months ago
Selected Answer: C
Auto Loader seems more correct. Copy Into focuses on loading from storage to a Delta table
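For contrast, a rough sketch of COPY INTO (option A): it is a batch SQL command that loads files into an existing Delta table each time it runs, not a streaming source; the table and path names here are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# COPY INTO skips files it has already loaded, but it is a batch load into a Delta table, not a stream.
# bronze_events is assumed to be an existing Delta table.
spark.sql("""
    COPY INTO bronze_events
    FROM 'abfss://landing@storage1.dfs.core.windows.net/events/'
    FILEFORMAT = JSON
""")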
upvoted 1 times
...
Charley92
1 year, 4 months ago
Selected Answer: D
To configure storage1 as a structured streaming source that incrementally processes new files as they are uploaded, minimizes implementation and maintenance effort, minimizes the cost of processing millions of files, and supports schema inference and schema drift, you should use Apache Spark FileStreamSource.
upvoted 1 times
...
ellala
1 year, 8 months ago
Bing explains the following: The best option is C. Auto Loader. Auto Loader is a feature in Azure Databricks that uses a cloudFiles data source to incrementally and efficiently process new data files as they arrive in Azure Data Lake Storage Gen2. It supports schema inference and schema evolution (drift). It also minimizes implementation and maintenance effort, as it simplifies the ETL pipeline by reducing the complexity of identifying new files for processing. The other options do not meet the requirements because:
• A. COPY INTO does not incrementally process new files as they are uploaded, which is one of your requirements.
• B. Azure Data Factory does not natively support schema inference and schema drift. The incremental processing of new files would need to be manually implemented, which could increase implementation and maintenance effort.
• D. Apache Spark FileStreamSource requires manual setup and does not natively support schema inference or schema drift. It also may not minimize the cost of processing millions of files as efficiently as Auto Loader.
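To make the contrast with option D concrete, here is a sketch of the plain Spark file streaming source: it needs an explicit schema up front for streaming reads and has no built-in schema-drift handling (the schema and path below are made up):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

# The generic file source requires a user-supplied schema for streaming reads,
# and it discovers new files by listing the input directory on every trigger.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("json")
    .schema(schema)
    .load("abfss://landing@storage1.dfs.core.windows.net/events/")
)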
upvoted 4 times
...
kkk5566
1 year, 9 months ago
Selected Answer: C
Auto Loader
upvoted 1 times
...
Deeksha1234
1 year, 10 months ago
Selected Answer: C
C is correct
upvoted 1 times
...
vctrhugo
2 years ago
To configure the Azure Data Lake Storage Gen2 account (storage1) as a structured streaming source in the Azure Databricks workspace, while meeting the given requirements, you should include the following in the recommendation: C. Auto Loader.

Auto Loader is a feature provided by Azure Databricks that automatically discovers and processes new files as they are uploaded to a specified directory in Azure Data Lake Storage Gen2. It provides an efficient and cost-effective way to incrementally process new files without the need for manual intervention. Auto Loader also supports schema inference and schema drift, allowing you to handle changes in the file schema over time.

By using Auto Loader, you can minimize implementation and maintenance effort, as it takes care of monitoring the storage directory for new files and processing them in an optimized manner. It also helps to minimize the cost of processing millions of files, as it leverages the efficient processing capabilities of Databricks. Therefore, the correct answer is C. Auto Loader.
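As a sketch of how this keeps effort and cost low (names and paths are assumptions): the Auto Loader stream can run with an availableNow trigger, so a scheduled job picks up only the files that arrived since the last run and then shuts down, rather than keeping a cluster running continuously.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided by the Databricks notebook environment

new_files = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation",
            "abfss://metadata@storage1.dfs.core.windows.net/schemas/events")
    .load("abfss://landing@storage1.dfs.core.windows.net/events/")
)

(
    new_files.writeStream
    .option("checkpointLocation",
            "abfss://metadata@storage1.dfs.core.windows.net/checkpoints/events")
    # Process everything that is new since the last run, then stop, so no always-on cluster is needed.
    .trigger(availableNow=True)
    .toTable("bronze_events")  # hypothetical Delta target table
)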
upvoted 4 times
...
rocky48
2 years ago
Selected Answer: C
Auto Loader
upvoted 1 times
rocky48
1 year, 4 months ago
I recommend using Auto Loader. Here’s why:
• Incremental processing: Auto Loader in Azure Databricks allows you to process new files incrementally as they are uploaded to your storage account. It efficiently identifies and processes only the new data, reducing the need to reprocess entire datasets.
• Low implementation and maintenance effort: Auto Loader simplifies the setup process. You can configure it easily within your Databricks workspace, and it automatically handles file discovery, partitioning, and schema inference.
• Cost-effective: Auto Loader optimizes resource usage by processing only the necessary data. It avoids unnecessary scans of existing files, which helps minimize costs when dealing with millions of files.
• Schema inference and schema drift support: Auto Loader automatically infers the schema from the data and adapts to schema changes over time (schema drift). This flexibility ensures smooth processing even when the structure of incoming files evolves.
Therefore, choose C.
upvoted 1 times
...
...
nicololmen
2 years, 1 month ago
D according to ChatGPT
upvoted 1 times
...
AHUI
2 years, 2 months ago
Ans: B. ADF supports schema drift - https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-schema-drift
upvoted 1 times
frankanalysis
2 years, 1 month ago
Auto Loader is lower cost.
upvoted 2 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other