Exam DP-201 topic 1 question 42 discussion

Actual exam question from Microsoft's DP-201

Question #: 42
Topic #: 1

You are designing a log storage solution that will use Azure Blob storage containers.
CSV log files will be generated by a multi-tenant application. The log files will be generated for each customer at five-minute intervals. There will be more than 5,000 customers. Typically, the customers will query data generated on the day the data was created.
You need to recommend a naming convention for the virtual directories and files. The solution must minimize the time it takes for the customers to query the log files.
What naming convention should you recommend?

A. {year}/{month}/{day}/{hour}/{minute}/{CustomerID}.csv
B. {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv
C. {minute}/{hour}/{day}/{month}/{year}/{CustomerID}.csv
D. {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv
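
To make the options concrete, here is a minimal Python sketch (not taken from the exam or from Microsoft's documentation) that fills in templates A, B and D for one customer over a whole day and reports the longest prefix shared by that customer's blobs. The customer ID `cust-0042`, the zero-padded date parts and the helper names are assumptions made for illustration.

```python
import os
from datetime import datetime, timedelta

# Templates copied from answer options A, B and D.
TEMPLATES = {
    "A": "{year}/{month}/{day}/{hour}/{minute}/{CustomerID}.csv",
    "B": "{year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv",
    "D": "{CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv",
}


def blob_name(option, customer_id, ts):
    """Fill in one template. Zero-padding is an assumption, not part of the question."""
    return TEMPLATES[option].format(
        year=f"{ts.year:04d}",
        month=f"{ts.month:02d}",
        day=f"{ts.day:02d}",
        hour=f"{ts.hour:02d}",
        minute=f"{ts.minute:02d}",
        CustomerID=customer_id,
    )


def names_for_day(option, customer_id, year, month, day):
    """Every blob name one customer produces in a day at five-minute intervals."""
    start = datetime(year, month, day)
    intervals = 24 * 60 // 5  # 288 files per customer per day
    return [
        blob_name(option, customer_id, start + timedelta(minutes=5 * i))
        for i in range(intervals)
    ]


def shared_prefix(option, customer_id, year, month, day):
    """Longest string prefix common to all of the customer's blobs for that day."""
    return os.path.commonprefix(names_for_day(option, customer_id, year, month, day))


for option in TEMPLATES:
    print(option, shared_prefix(option, "cust-0042", 2021, 3, 11))
```

Under these assumptions, the shared prefix for A is only the date, so it would also match every other customer's files for that day, while B and D each yield a prefix that already ends in the customer ID.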

Suggested Answer: B
Reference:
https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs

by AlexD332 at March 11, 2021, 11:42 p.m.

Comments

Geo_Barros

Highly Voted 4 years, 2 months ago

In my opinion, option "D" would be the right one.

upvoted 42 times

cadio30

4 years ago

Referencing the link provided in the solution: the blob path there starts with the "profile name" and then continues with the date-time stamp. It makes sense that 'D' is the appropriate answer to this question.

upvoted 4 times

rahul_t

Highly Voted 4 years, 1 month ago

I think B is correct. We want to minimize the time it takes for customers to query log files. 'Typically, the customers will query data generated on the day the data was created'. So it makes sense to put the path for a particular day, i.e. {Year}/{Month}/{Day}, close to the start. Once we have reached a particular day, we will want to filter for a particular customer, so {Year}/{Month}/{Day}/{CustomerID}. Then we will want to drill down to hour and minute. The only other viable option is D. The reason I think {CustomerID} should NOT be at the beginning of the path is the case where a customer wants to query data related to multiple CustomerIDs on the same day.
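
The multi-customer case described here can be sketched in a few lines of Python. This is only an illustration: `list_blobs` is a hypothetical stand-in that filters an in-memory list by string prefix, and the customer IDs `c1` and `c2` and the sample names are made up.

```python
def list_blobs(names, prefix):
    """Hypothetical stand-in for a prefix listing: keep names starting with `prefix`."""
    return sorted(n for n in names if n.startswith(prefix))


customers = ["c1", "c2"]  # made-up customer IDs
days = ["11", "12"]

# One sample file per customer per day, laid out as in options B and D.
b_names = [f"2021/03/{d}/{c}/00/00.csv" for d in days for c in customers]
d_names = [f"{c}/2021/03/{d}/00/00.csv" for d in days for c in customers]

# All customers on one day: a single prefix under B ...
all_on_day_b = list_blobs(b_names, "2021/03/11/")
# ... but one prefix per customer under D.
all_on_day_d = [
    name for c in customers for name in list_blobs(d_names, f"{c}/2021/03/11/")
]

# One customer on one day: a single prefix under either layout.
one_customer_b = list_blobs(b_names, "2021/03/11/c1/")
one_customer_d = list_blobs(d_names, "c1/2021/03/11/")

print(all_on_day_b)
print(all_on_day_d)
print(one_customer_b, one_customer_d)
```

In this sketch the two layouts only differ for the cross-customer query; for a single customer on a single day, each layout needs exactly one prefix.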

upvoted 11 times

Marcus1612

Most Recent 3 years, 8 months ago

I think the key word is "multi-tenant". It appears to me that the logs for a single customer need to be under their own branch. D is the right answer.

upvoted 3 times

J4C7

3 years, 9 months ago

What is the correct answer? I'm confused between B and D.

upvoted 1 times

msn1712

3 years, 11 months ago

Why would A not be the correct answer? The link https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs says: The name of the blob follows the following naming convention: resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json

upvoted 1 times

Alekx42

3 years, 12 months ago

Since it is stated that this is a multi-tenant application, customers would not (and probably should not be able to) query data of other customers. This makes D the right answer. Moreover, while the question says that queries are typically made on the same day the data is created, this does not exclude the possibility of queries that range across multiple days or months. With solution B this becomes unpleasant, since you cannot just query year/month: that will return the data of all customers for that month. With solution D all queries are easier, since customerID/year/month immediately returns all the data for that customer for that month. Basically, while it is true that both B and D allow for rapid querying of data for a single customer for a single day, B is worse for all queries that want more than one day of data.
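
As a rough illustration of the month-wide case, the following Python sketch counts how many prefixes one customer would need to cover a calendar month under B and under D. The function name, the zero-padded date parts and the ID `cust-0042` are assumptions, not anything stated in the question.

```python
import calendar


def month_prefixes(option, customer_id, year, month):
    """Prefixes needed to cover one customer's data for one calendar month."""
    year_month = f"{year:04d}/{month:02d}"
    if option == "D":
        # The customer comes first, so the month is a single subtree.
        return [f"{customer_id}/{year_month}/"]
    if option == "B":
        # The day sits above the customer, so each day needs its own prefix.
        days_in_month = calendar.monthrange(year, month)[1]
        return [
            f"{year_month}/{day:02d}/{customer_id}/"
            for day in range(1, days_in_month + 1)
        ]
    raise ValueError(f"unsupported option: {option}")


print(len(month_prefixes("D", "cust-0042", 2021, 3)))
print(len(month_prefixes("B", "cust-0042", 2021, 3)))
```

For a single day both layouts need one prefix; the difference only appears once the range spans more than a day.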

upvoted 4 times

tes

3 years, 11 months ago

"this does not exclude the possibility of making queries": that is an additional assumption made by the person answering the question.

upvoted 1 times

BigMF

3 years, 12 months ago

All of these options are poor in my opinion, so it is hard to choose a “best” option. If it were me, I’d go with this: {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv. This allows a customer to go directly to their folder and drill down quickly to the day they need. It also has the added benefit of the files being named intelligently and not just a “single bit of info”.csv. It also allows for easier maintenance down the road when customers leave, by letting you archive or delete their data simply by archiving or deleting their folder. All that being said, I would go with D because I don’t think it is any slower for a customer to search for their data following that path than any of the others, and in fact it is probably quicker. Also, it would provide easier maintenance down the road.
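
A minimal Python sketch of the naming scheme proposed in this comment, assuming zero-padded date parts and a made-up customer ID; the function name is hypothetical.

```python
from datetime import datetime


def proposed_blob_name(customer_id, ts):
    """Build {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv."""
    date_dir = ts.strftime("%Y/%m/%d")   # virtual directory for the day
    stamp = ts.strftime("%Y%m%d%H%M")    # self-describing file name suffix
    return f"{customer_id}/{date_dir}/{customer_id}_{stamp}.csv"


print(proposed_blob_name("cust-0042", datetime(2021, 3, 11, 23, 40)))
```

With this layout the file name alone still identifies the customer and the timestamp if a file is copied out of its folder.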

upvoted 1 times

Mandar77

4 years ago

I think answer B is correct. This is how you would want to restrict access. The question says customers will access log information on the same day. If you organize the containers as year/month/day/customer/hour/minute, every customer goes to the day folder for that year and month and then to their own folder to get the logs for the day. If you organize them as customer/year/month/day/hour/minute, every customer has to traverse a long search path to reach the day and get the logs. With option B, the search path is optimal given the requirement.

upvoted 3 times

BigMF

3 years, 12 months ago

This logic is flawed because the customer still has to traverse a long search path when they drill down into the folder structure. You either traverse it to begin with or later in the drill down.

upvoted 1 times

tanza

4 years ago

I think the answer is A.

upvoted 4 times

Apox

4 years, 1 month ago

I am certain that B is wrong. Why should Customer ID be put randomly in between the data formats? I think D is the right answer and the reason is that each "/" takes you to a new directory (folder). As a hierarchy it would make the most sense to have a folder per customer, and then sort by date/time. Source: "Blob Path Format" Section here: https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs#blob-path-format

upvoted 4 times

KRV

4 years ago

On the face of it your argument holds. However, if you read the question carefully, it says: 1. "customers will query data generated on the day the data was created", which means the path should start with year-to-day granularity; 2. "log files will be generated for each customer at five-minute intervals". That leaves two options: organize by CustomerID/hour/minute or by hour/minute/CustomerID. Since nothing is explicitly stated, it is safe to assume that queries will be customer-centric first and then narrowed to a point in time within that customer, and hence answer A happens to be logically more correct in the context of the question! {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv

upvoted 4 times

maynard13x8

4 years, 2 months ago

The suggested answer is correct. D is wrong because you duplicate the year and month folders. It is also a worse option because customers query the data of the day, so once you set the name you already have all the data you are interested in.

upvoted 2 times

Kevin89

4 years, 2 months ago

The name of the blob follows the following naming convention: resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json. So it should actually be answer A.

upvoted 3 times

Nik71

4 years, 2 months ago

Confused between A and B. After reviewing https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs, I wonder why we would avoid A here.

upvoted 1 times

Neha14n

4 years, 2 months ago

Typically, the customers will query data generated on the day the data was created. This line makes clear that queries will be specific to a date, not to a customer. Otherwise D would be the correct answer.

upvoted 3 times

DongDuong

4 years, 2 months ago

Agreed, in this case B is more suitable.

upvoted 1 times

AlexD332

4 years, 2 months ago

Still not clear, since queries should be optimized for customers, and they won't request data that isn't theirs.

upvoted 4 times
