Exam DP-201 topic 1 question 42 discussion

Actual exam question from Microsoft's DP-201
Question #: 42
Topic #: 1

You are designing a log storage solution that will use Azure Blob storage containers.
CSV log files will be generated by a multi-tenant application. The log files will be generated for each customer at five-minute intervals. There will be more than 5,000 customers. Typically, the customers will query data generated on the day the data was created.
You need to recommend a naming convention for the virtual directories and files. The solution must minimize the time it takes for the customers to query the log files.
What naming convention should you recommend?

  • A. {year}/{month}/{day}/{hour}/{minute}/{CustomerID}.csv
  • B. {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv
  • C. {minute}/{hour}/{day}/{month}/{year}/{CustomerID}.csv
  • D. {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv
Suggested Answer: B
Reference:
https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs

Comments

Geo_Barros
Highly Voted 4 years, 2 months ago
In my opinion, option "D" would be the right one.
upvoted 42 times
cadio30
4 years ago
Referencing the link provided in the solution: the blob path there starts with the "profile name" and then proceeds with the date/time stamp. It makes sense that 'D' is the appropriate answer to this question.
upvoted 4 times
...
...
rahul_t
Highly Voted 4 years, 1 month ago
I think B is correct. We want to minimize the time it takes for customers to query log files. 'Typically, the customers will query data generated on the day the data was created'. So it makes sense to include the path for a particular day, i.e. {Year}/{Month}/{Day}, close to the start. Once we have reached a particular day, we will want to filter for a particular customer, so {Year}/{Month}/{Day}/{CustomerID}. Then we will want to drill down to hour and minute. The only other viable option is D. The reason I think {CustomerID} should NOT be at the beginning of the path is the case where a customer wants to query data related to multiple CustomerIDs on the same day.
upvoted 11 times
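To make the B versus D comparison concrete, here is a minimal Python sketch of the two layouts and the name prefix a customer would use for a same-day lookup. The function names, the `cust0042` ID, and the zero-padded date parts are illustrative assumptions, not part of the question. The sketch only builds strings; it does not call any Azure API.

```python
from datetime import datetime


def blob_name_b(customer_id: str, ts: datetime) -> str:
    """Option B layout: {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv"""
    # Zero-padded date parts are an assumption; the question does not specify padding.
    return f"{ts:%Y/%m/%d}/{customer_id}/{ts:%H}/{ts:%M}.csv"


def blob_name_d(customer_id: str, ts: datetime) -> str:
    """Option D layout: {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv"""
    return f"{customer_id}/{ts:%Y/%m/%d}/{ts:%H}/{ts:%M}.csv"


def same_day_prefix_b(customer_id: str, ts: datetime) -> str:
    """Prefix covering one customer's files for one day under option B."""
    return f"{ts:%Y/%m/%d}/{customer_id}/"


def same_day_prefix_d(customer_id: str, ts: datetime) -> str:
    """Prefix covering one customer's files for one day under option D."""
    return f"{customer_id}/{ts:%Y/%m/%d}/"


if __name__ == "__main__":
    ts = datetime(2021, 3, 7, 9, 5)  # hypothetical log timestamp
    print(blob_name_b("cust0042", ts), "<-", same_day_prefix_b("cust0042", ts))
    print(blob_name_d("cust0042", ts), "<-", same_day_prefix_d("cust0042", ts))
```

In both layouts a single prefix identifies one customer's files for one day, so the same-day lookup alone does not separate B from D.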
...
Marcus1612
Most Recent 3 years, 8 months ago
I think the key word is "multi-tenant". It appears to me that the logs for a single customer need to be under their own branch. D is the right answer.
upvoted 3 times
...
J4C7
3 years, 9 months ago
What is the correct answer? I'm confused between B and D.
upvoted 1 times
...
msn1712
3 years, 11 months ago
Why can't A be the correct answer? The link, https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs, states: The name of the blob follows the following naming convention: resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json
upvoted 1 times
...
Alekx42
3 years, 12 months ago
Since it is stated that this is a multi-tenant application, customers would not (and probably should not be able to) query data of other customers. This makes D the right answer. Moreover, while it is said that typically the queries are done on the same day the data is created, this does not exclude the possibility of making queries that range across multiple days or months. With solution B this becomes unpleasant, since you cannot just query year/month: that will return data of all customers for that month. With solution D all queries are easier, since customerID/year/month immediately returns all the data for that customer for that month. Basically, while it is true that both B and D allow rapid querying of data for a single customer for a single day, B is worse for all queries that want data of more than one day.
upvoted 4 times
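The multi-day point above can be sketched in Python as a count of the prefixes needed for one customer's month of logs under each layout. This is an illustration under assumptions: the helper names, the `cust0042` ID, and the zero-padding are hypothetical, and the prefix match is simulated with `str.startswith` on a list of names rather than a real storage call.

```python
import calendar


def month_prefixes_b(customer_id: str, year: int, month: int) -> list[str]:
    """Option B: the customer sits below the day, so one prefix per day is needed."""
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        f"{year:04d}/{month:02d}/{day:02d}/{customer_id}/"
        for day in range(1, days_in_month + 1)
    ]


def month_prefixes_d(customer_id: str, year: int, month: int) -> list[str]:
    """Option D: the customer is the root, so a single prefix covers the month."""
    return [f"{customer_id}/{year:04d}/{month:02d}/"]


def match(names: list[str], prefixes: list[str]) -> list[str]:
    """Simulated prefix listing: keep names that start with any of the prefixes."""
    return [n for n in names if any(n.startswith(p) for p in prefixes)]


if __name__ == "__main__":
    print(len(month_prefixes_b("cust0042", 2021, 2)), "prefixes under B")
    print(len(month_prefixes_d("cust0042", 2021, 2)), "prefix under D")
```

Under these assumptions, a month-long query for one customer needs one prefix per day with B and a single prefix with D, which is the asymmetry the comment describes.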
tes
3 years, 11 months ago
"this does not exclude the possibility of making queries": that is an additional assumption made by the person answering the question.
upvoted 1 times
...
...
BigMF
3 years, 12 months ago
All of these options are poor in my opinion and therefore hard to choose a “best” option. If it were me, I’d go with this: {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv. This allows a customer to go directly to their folder and drill down quickly to the day they need. It also has the added benefit of the files being named intelligently and not just a “single bit of info”.csv. It also allows for easier maintenance down the road when customers leave by allowing you to easily archive or delete their data simply by archiving or deleting their folder. All that being said, I would go with D because I don’t think it is any slower for a customer to search for their data following that path than any of the others and in fact probably quicker. Also, it would provide easier maintenance down the road.
upvoted 1 times
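The alternative convention proposed above, {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv, could be generated along these lines. This is a rough Python sketch: the function names, the `cust0042` ID, and the zero-padded fields are assumptions made for illustration, and nothing here touches a storage account.

```python
from datetime import datetime


def blob_name_self_describing(customer_id: str, ts: datetime) -> str:
    """Per-customer, per-day folder with a file name that repeats the full context."""
    folder = f"{customer_id}/{ts:%Y/%m/%d}"
    # The file name carries customer and timestamp, so it stays meaningful
    # even when the file is copied out of its folder.
    file_name = f"{customer_id}_{ts:%Y%m%d%H%M}.csv"
    return f"{folder}/{file_name}"


def customer_root_prefix(customer_id: str) -> str:
    """Prefix covering everything for one customer, e.g. for archiving or deletion."""
    return f"{customer_id}/"


if __name__ == "__main__":
    ts = datetime(2021, 3, 7, 9, 5)  # hypothetical log timestamp
    print(blob_name_self_describing("cust0042", ts))
    print(customer_root_prefix("cust0042"))
```

The single root prefix per customer is what makes the maintenance case in the comment simple: everything belonging to a departing customer shares one prefix.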
...
Mandar77
4 years ago
I think answer B is correct. This is how you would want to restrict access. The question says customers will access log information on the same day. So if you organize the containers as year - month - day - customer - hour - minute, every customer goes to the day folder for that year and month and then to their own container to get the logs for the day. If you organize the containers as customer - year - month - day - hour - minute, every customer has to traverse a long search path to reach the day and get the logs. With option B, the search path would be optimal given the requirement.
upvoted 3 times
BigMF
3 years, 12 months ago
This logic is flawed because the customer still has to traverse a long search path when they drill down into the folder structure. You either traverse it to begin with or later in the drill down.
upvoted 1 times
...
...
tanza
4 years ago
I think answer is A
upvoted 4 times
...
Apox
4 years, 1 month ago
I am certain that B is wrong. Why should the Customer ID be put arbitrarily in between the date components? I think D is the right answer, and the reason is that each "/" takes you to a new directory (folder). As a hierarchy, it would make the most sense to have a folder per customer and then sort by date/time. Source: "Blob Path Format" section here: https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs#blob-path-format
upvoted 4 times
KRV
4 years ago
On the whole your argument holds, but if you read the question carefully it says: 1. "customers will query data generated on the day the data was created", which means the path should start with year-to-day granularity; 2. "log files will be generated for each customer at five-minute intervals", which leaves two options: organize by CustomerID/hour/minute or by hour/minute/CustomerID. Since nothing is explicitly mentioned, it is safe to assume that queries will be customer-centric first and then narrow to a point in time within that customer, so answer B is logically more correct in the context of the question: {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv
upvoted 4 times
...
...
maynard13x8
4 years, 2 months ago
The answer is correct. D is wrong because you duplicate the year and month folders. It is also a worse option because consumers query data of the day, so when you set the name, you already have all the data you are interested in.
upvoted 2 times
...
Kevin89
4 years, 2 months ago
The name of the blob follows the following naming convention: resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json, so it should actually be answer A.
upvoted 3 times
...
Nik71
4 years, 2 months ago
Confused between A and B. After reviewing https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs, I wonder why we avoid A here.
upvoted 1 times
...
Neha14n
4 years, 2 months ago
"Typically, the customers will query data generated on the day the data was created." This line makes it clear that the query will be specific to the date, not the customer. Otherwise D would be the correct answer.
upvoted 3 times
DongDuong
4 years, 2 months ago
agree, in this case B is more suitable
upvoted 1 times
...
...
AlexD332
4 years, 2 months ago
Still not clear, as the query should be optimized for customers: they won't request data that isn't theirs.
upvoted 4 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted