HOTSPOT - The following code segment is used to create an Azure Databricks cluster. For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area:
Box 2: No - autotermination_minutes: Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.
1. Yes
A cluster mode of ‘High Concurrency’ is selected, unlike all the others which are ‘Standard’. This results in a worker type of Standard_DS13_v2.
ref: https://adatis.co.uk/databricks-cluster-sizing/
2. NO
recommended: New Job Cluster.
When you run a job on a new cluster, the job is treated as a data engineering (job) workload subject to the job workload pricing. When you run a job on an existing cluster, the job is treated as a data analytics (all-purpose) workload subject to all-purpose workload pricing.
ref: https://docs.microsoft.com/en-us/azure/databricks/jobs
Scheduled batch workload- Launch new cluster via job
ref: https://docs.databricks.com/administration-guide/capacity-planning/cmbp.html#plan-capacity-and-control-cost
3.YES
Delta Lake on Databricks allows you to configure Delta Lake based on your workload patterns.
ref: https://docs.databricks.com/delta/index.html
This explanation is entirely correct.
the first item is referencing 'high concurrency' and one could check this while creating an interactive cluster.
second item, a new job cluster should be created for job purposes as the existing all purpose cluster has different pricing. refer to the url provided at the bottom
lastly, delta lake is configurable in the mentioned cluster version
Reference: https://docs.microsoft.com/en-us/azure/databricks/jobs#cluster-config-tips
My take on it:
Yes to multiple users - fits to support high concurrency since no scala support
Yes to efficiency - autostop and autoscale
Yes to the delta store - elastic disk (not 100% sure about that)
Auto termination is not configured for high concurrency clusters. so this cluster does not support high concurrency. So the answer should be
No
Yes
No
refer
https://docs.databricks.com/clusters/clusters-manage.html#automatic-termination
it seems serverless corresponds to "high concurrency" as per this blogpost - https://databricks.com/blog/2017/06/07/databricks-serverless-next-generation-resource-management-for-apache-spark.html
As per below link, High Concurrency clusters are configured to not terminate automatically. But while configuring High Concurrency, I am able to set the autotermination_minutes=120
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
1. YES
2. NO (use job custer to reduce cost rather than high concurency)
3. NO (we can use Delta lake starting from spark 2.4.2 based on scala 2.12.x. In this example the cluster definition is based on scala 2.11)
allowed languages are R SQL and Python -> High concurrency cluster
autoscaling is enabled as seen by min and max nodes -> minimise cost definitely
no CREATE TABLE syntax -> no Delta Lake table
Yes Yes No
1. High Concurrency "Yes" because of following config:
"spark_conf": {
"spark.databricks.cluster.profile": "serverless",
"spark.databricks.repl.allowedLanguages": "sql,python,r"
},
2. minimise cost "No", because there is no auto scale config as below:
"autoscale": {
"min_workers": 2,
"max_workers": 8
},
I think for part 2 of question "NO" is the right answer. Let's say we have three scheduled jobs with a difference of 180 minutes each that had to be run throughout the day. Since we have set the auto-termination to 90 minutes the cluster after executing the first schedule job remains active for 90 minutes so we'll have to pay for it. Which in turn doesn't minimize cost.
Data Lakes Support All Data Types
A data lake holds big data from many sources in a raw, granular format. It can store structured, semi-structured, or unstructured data, which means data can be kept in a more flexible format so we can transform it when we’re ready to use . I stick with the default answer
But it says: The Databricks cluster supports the creation of a Delta Lake table.
It is a spark cluster and it "supports" if it is needed. So I would say Yes.
This link shows that standard for single user, so i think High concurrency clusters for concurrency : https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
Standard clusters
---------------------------
Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
1) No
2) Yes :autoscale enabled and auto-termination was decreased from 120 default to 90
3) Yes
Yes - Standard_DS13_V2 is cluster mode for High concurrency
No- It's an interactive cluster
Yes - I'm not sure, it seems like it is default setting when SQL API is chosen.
In part 2 of the question, I have a confusion, in the datbricks config, the auto termination is set to 90 mins, and hence there is a provision of automatically getting the cluster down and minimizing cost. Had it been 0, it would to be auto termination disabled.
Any thoughtS?
To minimize the cost, it shoud be set to the lower value = 10. Since it is set to 90, it means the cluster can run for nothing during the next 90 minutes after the last schedule job which is not cost-efficient so the answer "NO" is correct for this one.
YES/NO/YES seams to be the correct answer.
This section is not available anymore. Please use the main Exam Page.DP-201 Exam Questions
Log in to ExamTopics
Sign in:
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.
Upvoting a comment with a selected answer will also increase the vote count towards that answer by one.
So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.
rmk4ever
Highly Voted 4 years, 8 months agocadio30
4 years agocadio30
4 years agoLeonido
Highly Voted 5 years, 1 month agoknightkkd
4 years, 7 months agoD_Duke
4 years, 7 months agoawitick
4 years, 4 months agokarma_wins
Most Recent 4 years, 1 month agosdas1
4 years, 4 months agosdas1
4 years, 4 months agosdas1
4 years, 4 months agosdas1
4 years, 4 months agozarga
4 years, 4 months agosyu31svc
4 years, 6 months agolingjun
4 years, 6 months agolingjun
4 years, 6 months agoYaswant
4 years, 10 months agopassnow
4 years, 10 months agoshaktiprasad88
4 years, 11 months agobrcdbrcd
4 years, 6 months agodip17
4 years, 11 months agoalexvno
4 years, 11 months agoAhmedReda
4 years, 11 months agoessdeecee
4 years, 7 months agoAbhilvs
4 years, 11 months agoNehuuu
5 years, 2 months agoavestabrzn
5 years, 2 months agoYuri1101
5 years, 1 month agoMathster
5 years agoandreeavi
4 years, 5 months agoandreeavi
4 years, 5 months ago