
Exam DP-201 topic 2 question 22 discussion

Actual exam question from Microsoft's DP-201
Question #: 22
Topic #: 2

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes an Azure Databricks notebook, and then inserts the data into the data warehouse.
Does this meet the goal?

  • A. Yes
  • B. No
Suggested Answer: B
Use a stored procedure, not an Azure Databricks notebook, to invoke the R script.
Reference:
https://docs.microsoft.com/en-US/azure/data-factory/transform-data
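To make the proposed solution concrete, here is a minimal PySpark sketch of how the Databricks notebook in that pipeline might ingest the previous day's increment from the staging zone. The storage account, container, and date-partitioned layout are assumptions rather than details given in the question.

    # Minimal sketch of the ingest step inside the proposed Databricks notebook.
    # The storage account, container, and path layout below are placeholders.
    from datetime import date, timedelta

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Assume the staging zone is partitioned by load date, e.g. .../staging/2024-01-15/
    load_date = (date.today() - timedelta(days=1)).isoformat()
    staging_path = f"abfss://staging@<storage-account>.dfs.core.windows.net/{load_date}/"

    incremental_df = spark.read.parquet(staging_path)

    # The R transformation would run next (for example in a SparkR or %r cell
    # of the same notebook); this sketch covers only the daily ingest step.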

Comments

Nieswurz
Highly Voted 4 years, 10 months ago
This should be the correct answer.
upvoted 27 times
andreeavi
4 years, 6 months ago
The first step is to ingest data.
upvoted 1 times
maynard13x8
4 years, 3 months ago
I think notebooks are only interactive. It should be a job cluster. Any opinions?
upvoted 2 times
Bhagya123456
3 years, 10 months ago
Now your comment is ambiguous. Do you mean the provided answer is correct, in which case 'No' is the answer, or that the provided solution is correct and will do the job, in which case 'Yes' would be the answer?
upvoted 4 times
bakamon
Most Recent 2 years, 1 month ago
Yes, this solution meets the goal. You can use an Azure Data Factory schedule trigger to execute a pipeline that copies the data to a staging table in the data warehouse, and then uses a stored procedure to execute the R script. This will allow you to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics on a daily basis.
upvoted 1 times
Ssv2030
3 years, 9 months ago
The answer should be No because: 1. We can't assume that the Azure Databricks notebook will run the R transformation script; it is not stated that the notebook will run the R script. 2. For incremental loads in ADF, I think a tumbling window trigger should be used. Can someone please confirm?
upvoted 1 times
MMM777
4 years, 1 month ago
Answer should be YES: ADF can trigger a Databricks notebook (not required to be user-driven): https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook
upvoted 4 times
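On MMM777's point that ADF can run a Databricks notebook without user interaction: in the pipeline this is a Databricks Notebook activity. Below is a rough sketch using the azure-mgmt-datafactory Python SDK; the resource, linked service, and notebook names are placeholders, and exact model signatures can vary between SDK versions.

    # Sketch: an ADF pipeline whose only activity runs a Databricks notebook.
    # All names are placeholders; this assumes a Databricks linked service
    # already exists in the factory.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        DatabricksNotebookActivity, LinkedServiceReference, PipelineResource,
    )

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Single activity that runs the transformation notebook on Databricks.
    notebook_activity = DatabricksNotebookActivity(
        name="TransformWithR",
        notebook_path="/Shared/transform_staging_with_r",
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="AzureDatabricksLS"),
    )

    adf.pipelines.create_or_update(
        "<resource-group>", "<data-factory>", "StagingToSynapseDaily",
        PipelineResource(activities=[notebook_activity]),
    )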
cadio30
4 years, 1 month ago
The answer is Yes. The R script is executed in the Azure Databricks notebook, and once the transformation is completed, the data is loaded into Azure Synapse. Reference: https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse
upvoted 1 times
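For reference, the load step described in cadio30's link looks roughly like this with the Azure Databricks Synapse connector. This is a minimal sketch: the server, database, table, tempDir, and credentials are placeholders, and transformed_df simply stands in for the output of the R transformation.

    # Sketch of writing the transformed data to a Synapse dedicated SQL pool
    # via the Azure Databricks Synapse connector. All connection details are
    # placeholders; credentials would normally come from a secret scope.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Placeholder: read back the output of the R transformation step.
    transformed_df = spark.read.parquet(
        "abfss://staging@<storage-account>.dfs.core.windows.net/transformed/")

    (transformed_df.write
        .format("com.databricks.spark.sqldw")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;"
                       "database=<dw>;user=<user>;password=<password>;encrypt=true")
        .option("forwardSparkAzureStorageCredentials", "true")
        .option("dbTable", "dbo.TransformedStaging")
        .option("tempDir", "abfss://tempdir@<storage-account>.dfs.core.windows.net/tmp")
        .mode("append")
        .save())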
AJMorgan591
4 years, 9 months ago
Should use a tumbling window trigger in ADF for incremental loading. https://docs.microsoft.com/en-us/azure/data-factory/solution-template-copy-new-files-lastmodifieddate
upvoted 2 times
BungyTex
4 years, 6 months ago
You don't have to; you can just use a regular schedule, no problem.
upvoted 1 times
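On the schedule trigger versus tumbling window trigger discussion above: a daily schedule trigger is straightforward to define. A rough sketch with the azure-mgmt-datafactory Python SDK follows; the subscription, resource group, factory, trigger, and pipeline names are placeholders, and exact model signatures can vary between SDK versions.

    # Sketch: creating a daily schedule trigger for an existing ADF pipeline.
    # All resource names and the start time are placeholders.
    from datetime import datetime, timezone

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineReference, ScheduleTrigger, ScheduleTriggerRecurrence,
        TriggerPipelineReference, TriggerResource,
    )

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Run the pipeline once per day at 02:00 UTC (the time is an assumption).
    recurrence = ScheduleTriggerRecurrence(
        frequency="Day",
        interval=1,
        start_time=datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc),
        time_zone="UTC",
    )

    trigger = ScheduleTrigger(
        recurrence=recurrence,
        pipelines=[TriggerPipelineReference(
            pipeline_reference=PipelineReference(
                type="PipelineReference",
                reference_name="StagingToSynapseDaily"))],
    )

    adf.triggers.create_or_update(
        "<resource-group>", "<data-factory>", "DailyTrigger",
        TriggerResource(properties=trigger),
    )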
avix
4 years, 10 months ago
I'm surprised, as I have run R in Azure Databricks.
upvoted 2 times
Nieswurz
4 years, 10 months ago
The solution template mentioned by Bob123456 does not fit because, per the description, the R script is to be run while the data is still located in the data lake. After the R-based transformation, the result is to be loaded into the DWH. This type of processing would need PolyBase to access the data lake, which is not mentioned here.
upvoted 3 times
apandey
4 years, 9 months ago
A Databricks notebook can use a mount to access the data lake. The notebook is the correct answer.
upvoted 2 times
Bob123456
4 years, 10 months ago
This is incorrect: https://docs.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-r-create-script?view=sql-server-ver15
upvoted 1 times