Exam DP-201 topic 2 question 11 discussion

Actual exam question from Microsoft's DP-201
Question #: 11
Topic #: 2

HOTSPOT -
The following code segment is used to create an Azure Databricks cluster.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Box 1: Yes -

Box 2: No -
autotermination_minutes: Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated.
If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.
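The threshold rule above can be sketched as a small validation helper (the function name is hypothetical; the 0-or-10-to-10000 rule is from the Clusters API documentation cited below):

```python
def is_valid_autotermination(minutes):
    """Check an autotermination_minutes value against the Databricks Clusters API rule.

    0 explicitly disables automatic termination; any other value must fall
    between 10 and 10000 minutes inclusive.
    """
    return minutes == 0 or 10 <= minutes <= 10000

# The cluster discussed in the comments sets autotermination_minutes to 90.
print(is_valid_autotermination(90))   # True
print(is_valid_autotermination(5))    # False: below the 10-minute minimum
print(is_valid_autotermination(0))    # True: termination explicitly disabled
```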

Box 3: Yes -
References:
https://docs.databricks.com/dev-tools/api/latest/clusters.html
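For reference, a High Concurrency cluster like the one under discussion can be described with a create-cluster payload along these lines. This is a minimal sketch assembled from the fields commenters quote below; the cluster name is hypothetical and the exact values in the exam's code segment may differ.

```python
import json

def build_cluster_spec():
    # Field names follow the Databricks Clusters API; values mirror the
    # configuration discussed in the comments, not the exam image itself.
    return {
        "cluster_name": "example-cluster",   # hypothetical name
        "spark_version": "7.4.x-scala2.12",
        "spark_conf": {
            # The "serverless" profile indicates High Concurrency cluster mode
            "spark.databricks.cluster.profile": "serverless",
            # High Concurrency supports only SQL, Python and R (no Scala)
            "spark.databricks.repl.allowedLanguages": "sql,python,r",
        },
        "node_type_id": "Standard_DS13_v2",
        "autoscale": {"min_workers": 2, "max_workers": 8},
        "autotermination_minutes": 90,       # terminate after 90 idle minutes
        "enable_elastic_disk": True,
    }

print(json.dumps(build_cluster_spec(), indent=2))
```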

Comments

rmk4ever
Highly Voted 4 years, 8 months ago
1. Yes. A cluster mode of 'High Concurrency' is selected, unlike all the others, which are 'Standard'. This results in a worker type of Standard_DS13_v2. ref: https://adatis.co.uk/databricks-cluster-sizing/
2. No. Recommended: New Job Cluster. When you run a job on a new cluster, the job is treated as a data engineering (job) workload subject to job workload pricing. When you run a job on an existing cluster, it is treated as a data analytics (all-purpose) workload subject to all-purpose workload pricing. ref: https://docs.microsoft.com/en-us/azure/databricks/jobs For a scheduled batch workload, launch a new cluster via a job. ref: https://docs.databricks.com/administration-guide/capacity-planning/cmbp.html#plan-capacity-and-control-cost
3. Yes. Delta Lake on Databricks allows you to configure Delta Lake based on your workload patterns. ref: https://docs.databricks.com/delta/index.html
upvoted 30 times
cadio30
4 years ago
This explanation is entirely correct. The first item references 'High Concurrency', which you can verify while creating an interactive cluster. For the second item, a new job cluster should be created for job runs, since an existing all-purpose cluster is billed at a different rate; refer to the URL provided. Lastly, Delta Lake is configurable in the mentioned cluster version. Reference: https://docs.microsoft.com/en-us/azure/databricks/jobs#cluster-config-tips
upvoted 4 times
cadio30
4 years ago
By the way, a hint for the first item: seeing 'serverless' in the config automatically indicates the 'High Concurrency' cluster mode.
upvoted 6 times
Leonido
Highly Voted 5 years, 1 month ago
My take on it: Yes to multiple users (fits High Concurrency, since there is no Scala support); Yes to efficiency (auto-termination and autoscale); Yes to the Delta store (elastic disk; not 100% sure about that).
upvoted 17 times
knightkkd
4 years, 7 months ago
Auto termination is not configured for High Concurrency clusters, so this cluster does not support high concurrency. The answer should be No, Yes, No. Refer to https://docs.databricks.com/clusters/clusters-manage.html#automatic-termination
upvoted 2 times
D_Duke
4 years, 7 months ago
Auto termination is not configured for high concurrency clusters BY DEFAULT, yet you can still enable and configure it.
upvoted 5 times
awitick
4 years, 4 months ago
exactly
upvoted 1 times
karma_wins
Most Recent 4 years, 1 month ago
It seems "serverless" corresponds to "High Concurrency", as per this blog post: https://databricks.com/blog/2017/06/07/databricks-serverless-next-generation-resource-management-for-apache-spark.html
upvoted 3 times
sdas1
4 years, 4 months ago
The answer is correct. I am able to create a High Concurrency cluster as per given json config.
upvoted 2 times
sdas1
4 years, 4 months ago
Cluster Mode: High Concurrency
Databricks Runtime Version: 7.4 (includes Apache Spark 3.0.1, Scala 2.12). This runtime version supports only Python 3.
Autopilot Options: Enable autoscaling; Terminate after 120 minutes of inactivity
Worker Type: Standard_DS13_v2 (56.0 GB Memory, 8 Cores, 2 DBU); Min Workers: 2; Max Workers: 8
Driver Type: Standard_DS13_v2 (56.0 GB Memory, 8 Cores, 2 DBU)
upvoted 2 times
sdas1
4 years, 4 months ago
{
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "cluster_name": "cluster2",
  "spark_version": "7.4.x-scala2.12",
  "spark_conf": {
    "spark.databricks.repl.allowedLanguages": "sql,python,r",
    "spark.databricks.cluster.profile": "serverless"
  },
  "node_type_id": "Standard_DS13_v2",
  "driver_node_type_id": "Standard_DS13_v2",
  "ssh_public_keys": [],
  "custom_tags": { "ResourceClass": "Serverless" },
  "spark_env_vars": { "PYSPARK_PYTHON": "/databricks/python3/bin/python3" },
  "autotermination_minutes": 120,
  "enable_elastic_disk": true,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0116-203628-tins636"
}
upvoted 1 times
sdas1
4 years, 4 months ago
As per the link below, High Concurrency clusters are configured not to terminate automatically. But while configuring High Concurrency, I am able to set autotermination_minutes=120. https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 2 times
zarga
4 years, 4 months ago
1. Yes.
2. No (use a job cluster to reduce cost rather than High Concurrency).
3. No (we can use Delta Lake starting from Spark 2.4.2 based on Scala 2.12.x; in this example the cluster definition is based on Scala 2.11).
upvoted 4 times
syu31svc
4 years, 6 months ago
Allowed languages are R, SQL, and Python -> High Concurrency cluster. Autoscaling is enabled, as seen from the min and max nodes -> minimise cost. Definitely no CREATE TABLE syntax -> no Delta Lake table. Yes, Yes, No.
upvoted 5 times
lingjun
4 years, 6 months ago
1. High Concurrency: "Yes", because of the following config:
"spark_conf": { "spark.databricks.cluster.profile": "serverless", "spark.databricks.repl.allowedLanguages": "sql,python,r" }
2. Minimise cost: "No", because there is no autoscale config such as:
"autoscale": { "min_workers": 2, "max_workers": 8 }
upvoted 1 times
lingjun
4 years, 6 months ago
Sorry, ignore the second point.
upvoted 1 times
Yaswant
4 years, 10 months ago
I think for part 2 of the question "No" is the right answer. Say we have three scheduled jobs, 180 minutes apart, that must run throughout the day. Since auto-termination is set to 90 minutes, the cluster remains active for 90 minutes after executing the first scheduled job, so we still pay for it. That, in turn, does not minimize cost.
upvoted 1 times
passnow
4 years, 10 months ago
Data lakes support all data types. A data lake holds big data from many sources in a raw, granular format. It can store structured, semi-structured, or unstructured data, which means data can be kept in a more flexible format that we can transform when we're ready to use it. I stick with the default answer.
upvoted 2 times
shaktiprasad88
4 years, 11 months ago
I think the answer is Yes, No, No. The given configuration is for an interactive cluster. My sample interactive cluster with Delta enabled:
{
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "cluster_name": "dev_work",
  "spark_version": "6.6.x-scala2.11",
  "spark_conf": { "spark.databricks.delta.preview.enabled": "true" },
  "node_type_id": "Standard_DS3_v2",
  "driver_node_type_id": "Standard_DS3_v2",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {},
  "autotermination_minutes": 120,
  "enable_elastic_disk": true,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0529-111838-patch496"
}
upvoted 3 times
brcdbrcd
4 years, 6 months ago
But it says: "The Databricks cluster supports the creation of a Delta Lake table." It is a Spark cluster, and it "supports" that if needed. So I would say Yes.
upvoted 1 times
dip17
4 years, 11 months ago
High Concurrency does not support auto-termination; auto-scaling minimizes the cost. So: No, Yes, Yes.
upvoted 4 times
alexvno
4 years, 11 months ago
First: True. "Optimized to run concurrent SQL, Python and R workloads." It doesn't support Scala. Previously known as SERVERLESS.
upvoted 1 times
AhmedReda
4 years, 11 months ago
This link shows that Standard is for a single user, so I think High Concurrency clusters are for concurrency: https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
"Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL."
1) No
2) Yes: autoscale is enabled, and auto-termination was decreased from the 120-minute default to 90
3) Yes
upvoted 6 times
essdeecee
4 years, 7 months ago
Standard_DS13_v2 is the default worker type for High Concurrency cluster mode; if I select High Concurrency, the Worker Type defaults to Standard_DS13_v2.
upvoted 2 times
Abhilvs
4 years, 11 months ago
Yes: Standard_DS13_v2 is the worker type for High Concurrency cluster mode. No: it's an interactive cluster. Yes: I'm not sure, but it seems to be the default setting when the SQL API is chosen.
upvoted 2 times
Nehuuu
5 years, 2 months ago
For part 2 of the question I have some confusion: in the Databricks config, auto-termination is set to 90 minutes, so there is a provision for automatically bringing the cluster down and minimizing cost. Had it been 0, auto-termination would be disabled. Any thoughts?
upvoted 2 times
avestabrzn
5 years, 2 months ago
I think it talks about running a job on a job cluster instead of an interactive cluster. Not sure..
upvoted 3 times
Yuri1101
5 years, 1 month ago
I think part 2 should be yes
upvoted 4 times
Mathster
5 years ago
To minimize cost, it should be set to the lowest value, 10. Since it is set to 90, the cluster can run idle for 90 minutes after the last scheduled job, which is not cost-efficient, so "No" is correct for this one. Yes/No/Yes seems to be the correct answer.
upvoted 21 times
andreeavi
4 years, 5 months ago
High Concurrency clusters are configured to not terminate automatically. https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 1 times
andreeavi
4 years, 5 months ago
Ignore it; it's not set by default.
upvoted 1 times