
Exam DP-201 topic 2 question 28 discussion

Actual exam question from Microsoft's DP-201
Question #: 28
Topic #: 2

You are designing an Azure Data Factory pipeline for processing data. The pipeline will process data that is stored in general-purpose standard Azure storage.
You need to ensure that the compute environment is created on-demand and removed when the process is completed.
Which type of activity should you recommend?

  • A. Databricks Python activity
  • B. Data Lake Analytics U-SQL activity
  • C. HDInsight Pig activity
  • D. Databricks Jar activity
Suggested Answer: C
The HDInsight Pig activity in a Data Factory pipeline executes Pig queries on your own or on-demand HDInsight cluster.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-hadoop-pig
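
For context, here is a rough sketch (based on the doc linked above) of how a Pig activity can reference an on-demand HDInsight linked service; the names and the script path are illustrative placeholders, not part of the question:

{
    "name": "SamplePigActivity",
    "type": "HDInsightPig",
    "linkedServiceName": {
        "referenceName": "HDInsightOnDemandLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scriptLinkedService": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "scriptPath": "adfscripts/pigscripts/sample.pig"
    }
}

The on-demand behaviour itself comes from the linked service the activity points to, not from the activity type; the linked service sketch further down shows the relevant timeToLive setting.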

Comments

GabiN
Highly Voted 5 years, 4 months ago
According to Microsoft documentation (https://docs.microsoft.com/en-us/azure/data-factory/transform-data), only four external transformation activities can be executed on-demand: the HDInsight MapReduce activity, HDInsight Hive activity, HDInsight Pig activity, and HDInsight Streaming activity. On-demand means that the computing environment is automatically created by the Data Factory service before a job is submitted to process data and removed when the job is completed. Therefore, the correct answer is C.
upvoted 53 times
...
methodidacte
Highly Voted 5 years, 5 months ago
I agree with answer C: "With on-demand HDInsight linked service, a HDInsight cluster is created every time a slice needs to be processed unless there is an existing live cluster (timeToLive) and is deleted when the processing is done." But why are the others wrong?
upvoted 7 times
...
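
To make the on-demand behaviour that GabiN and methodidacte describe concrete, here is a rough sketch of an on-demand HDInsight linked service along the lines of the compute linked services doc; subscription, tenant, and credential values are placeholders:

{
    "name": "HDInsightOnDemandLinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "clusterType": "hadoop",
            "clusterSize": 4,
            "timeToLive": "00:15:00",
            "hostSubscriptionId": "<subscription id>",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant id>",
            "clusterResourceGroup": "<resource group>",
            "version": "3.6",
            "osType": "Linux",
            "linkedServiceName": {
                "referenceName": "AzureStorageLinkedService",
                "type": "LinkedServiceReference"
            }
        }
    }
}

Data Factory creates the cluster when an activity that uses this linked service runs and deletes it once the timeToLive window after the last job expires.
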
H_S
Most Recent 4 years, 3 months ago
Not in the DP-201 exam any more.
upvoted 6 times
...
Deepu1987
4 years, 4 months ago
I would go with the HDInsight Pig activity rather than option A, as per the given condition in the question about the storage we're using; Databricks is ideally used with ADLS Gen2.
upvoted 1 times
...
syu31svc
4 years, 6 months ago
I would agree with the answer. From https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-compute-linked-services#:~:text=When%20the%20job%20is%20finished,cluster%20management%2C%20and%20bootstrapping%20actions.: "Data Factory automatically creates the compute environment before a job is submitted for processing data. When the job is finished, Data Factory removes the compute environment." And: "The Azure Storage linked service to be used by the on-demand cluster for storing and processing data. The HDInsight cluster is created in the same region as this storage account. Currently, you can't create an on-demand HDInsight cluster that uses Azure Data Lake Store as the storage. If you want to store the result data from HDInsight processing in Data Lake Store, use Copy Activity to copy the data from Blob storage to Data Lake Store."
upvoted 1 times
...
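
The Azure Storage linked service that the on-demand cluster uses (the "general-purpose standard Azure storage" from the question) might look roughly like this; the account name and key are placeholders:

{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account name>;AccountKey=<account key>"
        }
    }
}

As the quote above notes, the on-demand HDInsight cluster is created in the same region as this storage account.
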
GraceCyborg
4 years, 7 months ago
HDInsight is not in DP-201 anymore.
upvoted 2 times
...
Abhilvs
5 years ago
Azure Databricks also supports on-demand compute: when running from Azure Data Factory, the Databricks cluster gets created as an automated cluster and destroyed after completion. The question is ambiguous.
upvoted 2 times
...
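
For comparison, this is roughly what an Azure Databricks linked service configured to use a new job cluster (created per run and terminated afterwards) looks like; the workspace URL, token, runtime version, and node size are illustrative:

{
    "name": "AzureDatabricksLinkedService",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://eastus.azuredatabricks.net",
            "accessToken": {
                "type": "SecureString",
                "value": "<access token>"
            },
            "newClusterVersion": "4.2.x-scala2.11",
            "newClusterNumOfWorker": "2",
            "newClusterNodeType": "Standard_D3_v2"
        }
    }
}

This is the behaviour Abhilvs refers to: the job cluster is created for the run and removed afterwards, even though the docs list only the HDInsight activities under "on-demand" compute.
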
Runi
5 years ago
The HDInsight Pig activity in a Data Factory pipeline executes Pig queries on your own or on-demand Windows/Linux-based HDInsight cluster (see the Pig activity article for details). The same wording, "on your own or on-demand", is used for the MapReduce, Streaming, and Hive activities, and on-demand is defined as: "On-Demand: In this case, the computing environment is fully managed by Data Factory. It is automatically created by the Data Factory service before a job is submitted to process data and removed when the job is completed. You can configure and control granular settings of the on-demand compute environment for job execution, cluster management, and bootstrapping actions." However, the Python and Jar activities don't offer any on-demand option, so the answer is C.
upvoted 1 times
...
Leonido
5 years, 2 months ago
It's a strange question. Every one of them could meet the requirement.
upvoted 3 times
azurearch
5 years, 1 month ago
The Azure Databricks Python Activity in a Data Factory pipeline runs a Python file in your Azure Databricks cluster. This article builds on the data transformation activities article, which presents a general overview of data transformation and the supported transformation activities. Azure Databricks is a managed platform for running Apache Spark.
upvoted 1 times
...
...
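
A rough sketch of the Databricks Python activity that azurearch describes, which would run against a Databricks linked service like the one sketched earlier; the file path and parameter are illustrative:

{
    "name": "SampleDatabricksPythonActivity",
    "type": "DatabricksSparkPython",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "pythonFile": "dbfs:/scripts/sample.py",
        "parameters": [ "input1" ]
    }
}
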
Narender_Bhadrecha
5 years, 4 months ago
A is also a correct answer.
upvoted 2 times
...
mustaphaa
5 years, 5 months ago
A and D are correct too; you can use the automatically created cluster option in the linked service.
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other.