
Exam DP-201 topic 3 question 28 discussion

Actual exam question from Microsoft's DP-201
Question #: 28
Topic #: 3

HOTSPOT -
You use Azure Data Lake Storage Gen2 to store data that data scientists and data engineers will query by using Azure Databricks interactive notebooks. The folders in Data Lake Storage will be secured, and users will have access only to the folders that relate to the projects on which they work.
You need to recommend which authentication methods to use for Databricks and Data Lake Storage to provide the users with the appropriate access. The solution must minimize administrative effort and development effort.
Which authentication method should you recommend for each Azure service? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Databricks: Personal access tokens
To authenticate to and access Databricks REST APIs, you use personal access tokens. Tokens are similar to passwords; treat them with care. Tokens expire and can be revoked.
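As the suggested answer notes, a personal access token is sent as a bearer token on Databricks REST API calls. A minimal sketch of building such a request (the workspace URL and token below are placeholders, and `clusters_list_request` is just an illustrative helper, not part of any SDK):

```python
import urllib.request

# Placeholder workspace URL and PAT -- substitute real values.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # keep this secret, like a password

def clusters_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build a GET request to the Clusters API, authenticated with a PAT."""
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )

req = clusters_list_request(WORKSPACE_URL, TOKEN)
# urllib.request.urlopen(req) would perform the call against a real workspace
```

Note that this token authenticates the caller to the Databricks control plane only; it says nothing about which Data Lake folders the user may read.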
Data Lake Storage: Azure Active Directory
Azure Data Lake Storage Gen2 uses Azure Active Directory for authentication.
References:
https://docs.azuredatabricks.net/dev-tools/api/latest/authentication.html
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lakes-store-authentication-using-azure-active-directory

Comments

remz
Highly Voted 5 years ago
Answer is correct.
https://docs.databricks.com/dev-tools/api/latest/authentication.html
https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2
https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2#adls2-aad-credentials
upvoted 30 times
...
dip17
Highly Voted 4 years, 11 months ago
To minimize the administrative effort, the best option would be to create a High Concurrency cluster with AD credential passthrough enabled, use RBAC to assign the Contributor role on the Databricks workspace to the AD users (data engineers and data analysts), and apply ACLs on the specific folders for those AD users. Active Directory authentication works perfectly for both.
upvoted 22 times
...
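The credential-passthrough setup described above can be sketched in notebook terms: with passthrough enabled on the cluster, the code itself contains no keys or tokens, and access succeeds or fails based on the signed-in user's own AAD identity and the folder ACLs. The storage account, container, and folder names below are hypothetical, and `abfss_path` is just an illustrative helper:

```python
def abfss_path(container: str, storage_account: str, folder: str) -> str:
    """Build an ABFSS URI for an ADLS Gen2 folder (Gen2 uses the dfs endpoint)."""
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{folder.strip('/')}"
    )

# Hypothetical project folder; no credentials appear anywhere in the notebook.
path = abfss_path("projects", "contosodatalake", "/project-a/raw/")

# On a passthrough-enabled cluster this read succeeds only if the signed-in
# user's AAD identity has ACL access to /project-a/raw:
# df = spark.read.parquet(path)
```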
muni53
Most Recent 3 years, 8 months ago
Both should be AD. Azure Databricks auto-authenticates with AD.
upvoted 1 times
...
Needium
4 years, 3 months ago
This looks a lot like Azure Active Directory in both boxes to me. First of all, I am authenticating to the Databricks UI itself to create and run notebooks, not to the REST API. I would rather use standard AAD to access Databricks and use the same AAD credentials to access the files in ADLS Gen2. Of course, I will implement ACLs in AAD to restrict each user to only the folders they should be able to access. Ref: https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough
upvoted 4 times
...
lastname
4 years, 5 months ago
All the answers above are wrong. The correct answers are:
1. Databricks: Azure Active Directory
2. Data Lake Storage: Azure Active Directory
For 1: there is no mention of connecting with the Databricks API; instead, the description says that users will query ADLS using interactive notebooks. For that they will have to log in to Databricks itself, which is done with their AD accounts.
For 2: shared access signatures and shared access keys do not use ACLs but RBAC, and are applied at the container or storage-account level, NOT at the directory or file level. I quote from https://docs.microsoft.com/nl-nl/azure/storage/blobs/data-lake-storage-access-control: "ACLs apply only to security principals in the same tenant, and they don't apply to users who use Shared Key or shared access signature (SAS) token authentication. That's because no identity is associated with the caller and therefore security principal permission-based authorization cannot be performed."
upvoted 13 times
zarga
4 years, 4 months ago
1. Databricks: Azure Active Directory (minimizes administrative effort)
2. Data Lake Storage: Azure Active Directory
upvoted 5 times
...
...
syu31svc
4 years, 6 months ago
I would say the answer is correct.
https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/authentication: "To authenticate to and access Databricks REST APIs, you can use Azure Databricks personal access tokens or Azure Active Directory (Azure AD) tokens."
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control: "Always use Azure AD security groups as the assigned principal in an ACL entry"
upvoted 1 times
lastname
4 years, 5 months ago
I see no mention of an API, it's logging in to Databricks and then querying ADLS.
upvoted 3 times
...
...
M0e
4 years, 7 months ago
Where in the question does it talk about accessing the REST API? Personal access tokens are used to access the Databricks REST API. For interactive notebooks, AAD is the way to authenticate the users!
upvoted 7 times
lastname
4 years, 5 months ago
Indeed.
upvoted 1 times
...
...
Ash666
4 years, 10 months ago
Databricks: Azure Key Vault (https://docs.microsoft.com/en-us/azure/databricks/security/secrets/example-secret-workflow)
ADLS Gen2: AAD
upvoted 4 times
...
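The Key Vault route mentioned above would look roughly like this in a notebook: a service principal's client secret is fetched from a Key Vault-backed secret scope (via `dbutils.secrets.get`, available only inside Databricks) and used to configure OAuth access to ADLS Gen2. The tenant ID, client ID, and names below are placeholders; the `fs.azure.*` keys are the standard Hadoop ABFS OAuth settings:

```python
# Placeholder IDs -- in Databricks the secret would come from a Key
# Vault-backed scope: dbutils.secrets.get(scope="kv", key="sp-secret")
TENANT_ID = "00000000-0000-0000-0000-000000000000"
CLIENT_ID = "11111111-1111-1111-1111-111111111111"

def oauth_conf(account: str, client_id: str,
               client_secret: str, tenant_id: str) -> dict:
    """Spark settings for service-principal (OAuth 2.0) access to ADLS Gen2."""
    sfx = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{sfx}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{sfx}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{sfx}": client_id,
        f"fs.azure.account.oauth2.client.secret.{sfx}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{sfx}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

conf = oauth_conf("contosodatalake", CLIENT_ID,
                  "<secret-from-key-vault>", TENANT_ID)
# In a notebook: for k, v in conf.items(): spark.conf.set(k, v)
```

Note the trade-off against credential passthrough: a service principal authenticates as one shared identity, so per-user folder restrictions must then be enforced some other way.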
azurearch
5 years ago
AAD can't do folder-level permissions in ADLS; it needs ACLs to do that.
upvoted 6 times
pawhit
5 years ago
Re: ADLS, the question concerns authentication only, not authorisation, so AAD for authentication and then RBAC roles for authorisation.
upvoted 6 times
...
...
Leonido
5 years, 1 month ago
It looks like bad wording in the question. The requirement is not to secure the notebook, but only the storage access. What I do in those cases: define access using Key Vault (so the user of that notebook won't see the credentials) and secure ADLS Gen2 with a service identity in AAD; that allows granular authorization and project scope.
upvoted 3 times
...
Luke97
5 years, 1 month ago
I think for ADLS Gen2 it should use SAS rather than AAD (RBAC). Shared Key is not quite suitable, as it makes the user effectively gain 'super-user' access, meaning full access to all operations on all resources, including setting the owner and changing ACLs.
upvoted 6 times
HCL1991
5 years, 1 month ago
I concur. I also think that AAD should be the authentication method for Databricks, since personal access tokens are used to access the Databricks REST API rather than interactive notebooks.
upvoted 4 times
...
Leonido
5 years, 1 month ago
Won't work: ADLS can only have account-level SAS, and you need at least container-level.
upvoted 4 times
...
...