HOTSPOT - Which Azure service and feature should you recommend using to manage the transient data for Data Lake Storage? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:
Suggested Answer:
Scenario: Stage inventory data in Azure Data Lake Storage Gen2 before loading the data into the analytical data store. Litware wants to remove transient data from Data Lake Storage once the data is no longer in use. Files that have a modified date that is older than 14 days must be removed.
Service: Azure Data Factory - Clean up files by using the built-in Delete activity in Azure Data Factory (ADF). The Delete activity can be part of your ETL workflow and deletes undesired files without writing code. You can use ADF to delete folders or files from Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, File System, FTP Server, sFTP Server, and Amazon S3. You can delete only the expired files rather than all the files in a folder. For example, you may want to delete only the files that were last modified more than 13 days ago.
Feature: Delete Activity
Reference: https://azure.microsoft.com/sv-se/blog/clean-up-files-by-built-in-delete-activity-in-azure-data-factory/
Design data processing solutions
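The selection rule in the scenario (remove files whose modified date is older than 14 days) can be sketched in plain Python. This is only a local illustration of the date filter, not the ADF Delete activity itself; the function name `select_expired`, the file paths, and the fixed "now" timestamp are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 14  # from the scenario: older than 14 days must be removed


def select_expired(files, now, retention_days=RETENTION_DAYS):
    """Return the paths whose last-modified time is strictly older than the cutoff.

    `files` is an iterable of (path, last_modified) pairs with timezone-aware datetimes.
    """
    cutoff = now - timedelta(days=retention_days)
    return [path for path, modified in files if modified < cutoff]


# Hypothetical staged files, relative to a fixed reference time.
now = datetime(2021, 3, 15, tzinfo=timezone.utc)
staged = [
    ("staging/inventory_old.csv", now - timedelta(days=20)),
    ("staging/inventory_edge.csv", now - timedelta(days=14)),  # exactly 14 days: kept
    ("staging/inventory_new.csv", now - timedelta(days=2)),
]

expired = select_expired(staged, now)
print(expired)
```

A file modified exactly 14 days ago is kept here because the comparison is strict; whether the boundary should be inclusive is a design choice the scenario does not spell out.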
The question asks to remove files older than 14 days, which I think ADF and the Delete activity cannot do, so the answer might be: (1) Azure Storage, (2) Lifecycle management rule.
From older comments, ADF + Delete and Azure Storage + Lifecycle management rule seem to offer similar functionality for removing files. However, there is a difference: lifecycle is defined based on the creation of the file, and the question says: "Files that have a modified date that is older than 14 days must be removed", i.e. the file removal is based on the modified date. As BungyTex confirmed below, ADF + Delete can achieve this objective, and the answer is correct.
The way I see this, if the inventory data is coming from a Microsoft SQL Server, it is being ingested by ADF and not by Azure Storage, and if ADF is used, then the Delete activity should be used. As per other comments, this is proven to work.
Azure Data Lake Storage lifecycle management is now generally available
https://azure.microsoft.com/en-us/updates/lifecycle-management-for-azure-data-lake-storage-is-now-generally-available/
The correct answer should be Azure Storage and lifecycle management, because ADLS Gen2 lets you delete any file. The only exception is: "If you use the Delete Blob API to delete a directory, that directory will be deleted only if it's empty. This means that you can't use the Blob API delete directories recursively." It also supports all lifecycle management operations except: "Lifecycle management policies with premium tier for Azure Data Lake Storage. You can't move data that's stored in the premium tier between hot, cool, and archive tiers. However, you can copy data from the premium tier to the hot access tier in a different account."
Ref https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues
Lifecycle management has been available in ADLS since July 31, 2020.
https://azure.microsoft.com/en-us/updates/lifecycle-management-for-azure-data-lake-storage-is-now-generally-available/
Both ADF with Delete and storage with lifecycle management will work. I literally built the latter this week. I think it is the best solution, as it is the cheapest and easiest: it doesn't cost anything to run, to build, or to maintain.
Lifecycle management policies (delete blob): Generally available in Premium, Generally available in Standard https://docs.microsoft.com/pl-pl/azure/storage/blobs/data-lake-storage-supported-blob-storage-features
https://azure.microsoft.com/en-us/updates/lifecycle-management-for-azure-data-lake-storage-is-now-generally-available/
Azure Storage and a lifecycle management rule are the answers.
Lifecycle management is now supported for general-purpose v2 accounts that have a hierarchical namespace.
With this, you can drop the Delete activity (a lower cost, even if it is negligible for a pipeline).
However, I would prefer to use the Delete activity in ADF to make sure that the files are deleted only after I load them into the database. That is better than automatic deletion through lifecycle management.
For me, the given answer is correct based on the requirement.
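For the lifecycle management alternative discussed in the comments, a rule that deletes blobs a number of days after their last modification might look roughly like the policy built below. This is a minimal sketch: the rule name and the `staging/inventory/` prefix are hypothetical, and the helper `build_delete_policy` exists only for illustration. The `daysAfterModificationGreaterThan` condition keys on the last-modified time of the blob.

```python
import json


def build_delete_policy(rule_name, prefix, days):
    """Build a lifecycle management policy with a single delete rule.

    The rule deletes block blobs under `prefix` once more than `days` days
    have passed since they were last modified.
    """
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    "actions": {
                        "baseBlob": {
                            "delete": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Hypothetical container/folder prefix for the staged files.
                        "prefixMatch": [prefix],
                    },
                },
            }
        ]
    }


policy = build_delete_policy("delete-transient-staging", "staging/inventory/", 14)
print(json.dumps(policy, indent=2))
```

Unlike a Delete activity placed after the load step in a pipeline, a rule like this runs on the storage account's own schedule, so it cannot check whether the data has already been loaded.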
AhmedReda (Highly Voted), 4 years, 11 months ago
Sai02, 4 years, 7 months ago
bansal_vikrant (Highly Voted), 5 years, 1 month ago
vrmei, 3 years, 11 months ago
Psycho, 4 years ago
hoangton (Most Recent), 3 years, 11 months ago
kn_shn, 3 years, 11 months ago
savin, 4 years ago
Dymize, 4 years ago
davita8, 4 years, 1 month ago
felmasri, 4 years, 2 months ago
Needium, 4 years, 3 months ago
lky17, 4 years, 3 months ago
NasRim, 4 years, 3 months ago
ThijsN, 4 years, 4 months ago
mohowzeh, 4 years, 4 months ago
memo43, 4 years ago
KasiaK, 4 years, 5 months ago
syu31svc, 4 years, 6 months ago
BungyTex, 4 years, 6 months ago
NikP, 4 years, 10 months ago