Exam DP-201 topic 2 question 11 discussion

Actual exam question from Microsoft's DP-201
Question #: 11
Topic #: 2

HOTSPOT -
The following code segment is used to create an Azure Databricks cluster.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Box 1: Yes -

Box 2: No -
autotermination_minutes: Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated.
If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.
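The threshold rule above can be sketched as a small validation helper (the function name is hypothetical; the 0-or-10-to-10000 rule is from the Clusters API documentation cited below):

```python
def is_valid_autotermination(minutes):
    """Check an autotermination_minutes value against the Databricks Clusters API rule.

    0 explicitly disables automatic termination; any other value must fall
    between 10 and 10000 minutes inclusive.
    """
    return minutes == 0 or 10 <= minutes <= 10000

# The cluster discussed in the comments sets autotermination_minutes to 90.
print(is_valid_autotermination(90))   # True
print(is_valid_autotermination(5))    # False: below the 10-minute minimum
print(is_valid_autotermination(0))    # True: termination explicitly disabled
```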

Box 3: Yes -
References:
https://docs.databricks.com/dev-tools/api/latest/clusters.html
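For reference, a High Concurrency cluster like the one under discussion can be described with a create-cluster payload along these lines. This is a minimal sketch assembled from the fields commenters quote below; the cluster name is hypothetical and the exact values in the exam's code segment may differ.

```python
import json

def build_cluster_spec():
    # Field names follow the Databricks Clusters API; values mirror the
    # configuration discussed in the comments, not the exam image itself.
    return {
        "cluster_name": "example-cluster",   # hypothetical name
        "spark_version": "7.4.x-scala2.12",
        "spark_conf": {
            # The "serverless" profile indicates High Concurrency cluster mode
            "spark.databricks.cluster.profile": "serverless",
            # High Concurrency supports only SQL, Python and R (no Scala)
            "spark.databricks.repl.allowedLanguages": "sql,python,r",
        },
        "node_type_id": "Standard_DS13_v2",
        "autoscale": {"min_workers": 2, "max_workers": 8},
        "autotermination_minutes": 90,       # terminate after 90 idle minutes
        "enable_elastic_disk": True,
    }

print(json.dumps(build_cluster_spec(), indent=2))
```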

Comments

rmk4ever
Highly Voted 4 years, 8 months ago
1. Yes. A cluster mode of 'High Concurrency' is selected, unlike all the others, which are 'Standard'. This results in a worker type of Standard_DS13_v2. ref: https://adatis.co.uk/databricks-cluster-sizing/
2. No. Recommended: New Job Cluster. When you run a job on a new cluster, the job is treated as a data engineering (job) workload subject to job workload pricing. When you run a job on an existing cluster, it is treated as a data analytics (all-purpose) workload subject to all-purpose workload pricing. ref: https://docs.microsoft.com/en-us/azure/databricks/jobs For a scheduled batch workload, launch a new cluster via a job. ref: https://docs.databricks.com/administration-guide/capacity-planning/cmbp.html#plan-capacity-and-control-cost
3. Yes. Delta Lake on Databricks allows you to configure Delta Lake based on your workload patterns. ref: https://docs.databricks.com/delta/index.html
upvoted 30 times
cadio30
4 years ago
This explanation is entirely correct. The first item references 'High Concurrency', which you can verify while creating an interactive cluster. For the second item, a new job cluster should be created for job runs, since an existing all-purpose cluster is billed at a different rate; refer to the URL provided. Lastly, Delta Lake is configurable in the mentioned cluster version. Reference: https://docs.microsoft.com/en-us/azure/databricks/jobs#cluster-config-tips
upvoted 4 times
cadio30
4 years ago
By the way, a hint for the first item: seeing 'serverless' in the config automatically indicates the 'High Concurrency' cluster mode.
upvoted 6 times
Leonido
Highly Voted 5 years, 1 month ago
My take on it: Yes to multiple users (fits High Concurrency, since there is no Scala support); Yes to efficiency (auto-termination and autoscale); Yes to the Delta store (elastic disk; not 100% sure about that).
upvoted 17 times
knightkkd
4 years, 7 months ago
Auto termination is not configured for High Concurrency clusters, so this cluster does not support high concurrency. The answer should be No, Yes, No. Refer to https://docs.databricks.com/clusters/clusters-manage.html#automatic-termination
upvoted 2 times
D_Duke
4 years, 7 months ago
Auto termination is not configured for high concurrency clusters BY DEFAULT, yet you can still enable and configure it.
upvoted 5 times
awitick
4 years, 4 months ago
exactly
upvoted 1 times
karma_wins
Most Recent 4 years, 1 month ago
It seems "serverless" corresponds to "High Concurrency", as per this blog post: https://databricks.com/blog/2017/06/07/databricks-serverless-next-generation-resource-management-for-apache-spark.html
upvoted 3 times
sdas1
4 years, 4 months ago
The answer is correct. I am able to create a High Concurrency cluster as per given json config.
upvoted 2 times
sdas1
4 years, 4 months ago
Cluster Mode: High Concurrency
Databricks Runtime Version: 7.4 (includes Apache Spark 3.0.1, Scala 2.12). This runtime version supports only Python 3.
Autopilot Options: Enable autoscaling; Terminate after 120 minutes of inactivity
Worker Type: Standard_DS13_v2 (56.0 GB Memory, 8 Cores, 2 DBU); Min Workers: 2; Max Workers: 8
Driver Type: Standard_DS13_v2 (56.0 GB Memory, 8 Cores, 2 DBU)
upvoted 2 times
sdas1
4 years, 4 months ago
{
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "cluster_name": "cluster2",
  "spark_version": "7.4.x-scala2.12",
  "spark_conf": {
    "spark.databricks.repl.allowedLanguages": "sql,python,r",
    "spark.databricks.cluster.profile": "serverless"
  },
  "node_type_id": "Standard_DS13_v2",
  "driver_node_type_id": "Standard_DS13_v2",
  "ssh_public_keys": [],
  "custom_tags": { "ResourceClass": "Serverless" },
  "spark_env_vars": { "PYSPARK_PYTHON": "/databricks/python3/bin/python3" },
  "autotermination_minutes": 120,
  "enable_elastic_disk": true,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0116-203628-tins636"
}
upvoted 1 times
sdas1
4 years, 4 months ago
As per the link below, High Concurrency clusters are configured not to terminate automatically. But while configuring High Concurrency, I am able to set autotermination_minutes=120. https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 2 times
zarga
4 years, 4 months ago
1. Yes.
2. No (use a job cluster to reduce cost rather than High Concurrency).
3. No (we can use Delta Lake starting from Spark 2.4.2 based on Scala 2.12.x; in this example the cluster definition is based on Scala 2.11).
upvoted 4 times
syu31svc
4 years, 6 months ago
Allowed languages are R, SQL, and Python -> High Concurrency cluster. Autoscaling is enabled, as seen from the min and max nodes -> minimise cost. Definitely no CREATE TABLE syntax -> no Delta Lake table. Yes, Yes, No.
upvoted 5 times
lingjun
4 years, 6 months ago
1. High Concurrency: "Yes", because of the following config:
"spark_conf": { "spark.databricks.cluster.profile": "serverless", "spark.databricks.repl.allowedLanguages": "sql,python,r" }
2. Minimise cost: "No", because there is no autoscale config such as:
"autoscale": { "min_workers": 2, "max_workers": 8 }
upvoted 1 times
lingjun
4 years, 6 months ago
Sorry, ignore the second point.
upvoted 1 times
Yaswant
4 years, 10 months ago
I think for part 2 of the question "No" is the right answer. Say we have three scheduled jobs, 180 minutes apart, that must run throughout the day. Since auto-termination is set to 90 minutes, the cluster remains active for 90 minutes after executing the first scheduled job, so we still pay for it. That, in turn, does not minimize cost.
upvoted 1 times
passnow
4 years, 10 months ago
Data lakes support all data types. A data lake holds big data from many sources in a raw, granular format. It can store structured, semi-structured, or unstructured data, which means data can be kept in a more flexible format that we can transform when we're ready to use it. I stick with the default answer.
upvoted 2 times
shaktiprasad88
4 years, 11 months ago
I think the answer is Yes, No, No. The given configuration is for an interactive cluster. My sample interactive cluster with Delta enabled:
{
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "cluster_name": "dev_work",
  "spark_version": "6.6.x-scala2.11",
  "spark_conf": { "spark.databricks.delta.preview.enabled": "true" },
  "node_type_id": "Standard_DS3_v2",
  "driver_node_type_id": "Standard_DS3_v2",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {},
  "autotermination_minutes": 120,
  "enable_elastic_disk": true,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0529-111838-patch496"
}
upvoted 3 times
brcdbrcd
4 years, 6 months ago
But it says: "The Databricks cluster supports the creation of a Delta Lake table." It is a Spark cluster, and it "supports" that if needed. So I would say Yes.
upvoted 1 times
dip17
4 years, 11 months ago
High Concurrency does not support auto-termination; auto-scaling minimizes the cost. So: No, Yes, Yes.
upvoted 4 times
alexvno
4 years, 11 months ago
First: True. "Optimized to run concurrent SQL, Python and R workloads." It doesn't support Scala. Previously known as SERVERLESS.
upvoted 1 times
AhmedReda
4 years, 11 months ago
This link shows that Standard is for a single user, so I think High Concurrency clusters are for concurrency: https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
"Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL."
1) No
2) Yes: autoscale is enabled, and auto-termination was decreased from the 120-minute default to 90
3) Yes
upvoted 6 times
essdeecee
4 years, 7 months ago
Standard_DS13_v2 is the default worker type for High Concurrency cluster mode; if I select High Concurrency, the Worker Type defaults to Standard_DS13_v2.
upvoted 2 times
Abhilvs
4 years, 11 months ago
Yes: Standard_DS13_v2 is the worker type for High Concurrency cluster mode. No: it's an interactive cluster. Yes: I'm not sure, but it seems to be the default setting when the SQL API is chosen.
upvoted 2 times
Nehuuu
5 years, 2 months ago
For part 2 of the question I have some confusion: in the Databricks config, auto-termination is set to 90 minutes, so there is a provision for automatically bringing the cluster down and minimizing cost. Had it been 0, auto-termination would be disabled. Any thoughts?
upvoted 2 times
avestabrzn
5 years, 2 months ago
I think it talks about running a job on a job cluster instead of an interactive cluster. Not sure..
upvoted 3 times
Yuri1101
5 years, 1 month ago
I think part 2 should be yes
upvoted 4 times
Mathster
5 years ago
To minimize cost, it should be set to the lowest value, 10. Since it is set to 90, the cluster can run idle for 90 minutes after the last scheduled job, which is not cost-efficient, so "No" is correct for this one. Yes/No/Yes seems to be the correct answer.
upvoted 21 times
andreeavi
4 years, 5 months ago
High Concurrency clusters are configured to not terminate automatically. https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
upvoted 1 times
andreeavi
4 years, 5 months ago
Ignore it; it's not set by default.
upvoted 1 times