exam questions

Exam DA-100 All Questions

View all questions & answers for the DA-100 exam

Exam DA-100 topic 4 question 2 discussion

Actual exam question from Microsoft's DA-100
Question #: 2
Topic #: 4
[All DA-100 Questions]

DRAG DROP -
You have a query named Customer that imports CSV files from a data lake. The query contains 50,000 rows as shown in the exhibit. (Click the Exhibit tab.)

Each file contains deltas of any new or modified rows from each load to the data lake. Multiple files can have the same customer ID.
You need to keep only the last modified row for each customer ID.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:

Show Suggested Answer Hide Answer
Suggested Answer:

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
VR1
Highly Voted 4 years, 8 months ago
I think:- 1. Duplicate the query 2. Group the query to find the max modified date 3. Merge the query on inner join
upvoted 252 times
Maue
4 years, 1 month ago
Be careful, cos Person X can have repeated modified date. So a inner join will remain the duplicate (wrongly)... That’s why the “remove duplicated ID” is present. In other words, Inner join will return duplicate if the same customer ID have the same modified date 2 or more times
upvoted 1 times
Billybob0604
3 years, 3 months ago
No, modified date is datetime format. using the max you will always get a unique customer id
upvoted 2 times
...
Massy
4 years, 1 month ago
if you group by ID you doesn't have duplicate id anymore
upvoted 16 times
...
not2smart
2 years, 7 months ago
Maue is correct that duplicates of Customer + Modified Date would not work with the inner join. The Datetime format appears not to use the time part, so duplicates could exist theoretically and even Datetime when fully used would not guarantee uniqueness. However the name of the files in combination with different Modified Dates seems to be unique per file and also does not contain dates before the date of the name of the file, so suggests enough uniqueness. Even so, removing duplicates in the Customer ID column would not solve this. It would have to be removing duplicates in the combined Customer Id + Date Modified Date columns, which is not an option. My assumption here would be that the question does not expect this to occur. I would thus go with the answer given by VR1.
upvoted 1 times
...
...
Zakriya
4 years, 1 month ago
correct
upvoted 4 times
...
TechDiva
3 years, 3 months ago
Explaination by Lhouss:- 1) Duplicate Customer query 2) Group by CustId by Max ModifiedDate (only 2 columns to keep) 3) Merge two queries on CustId and ModifiedDate inner join (to retreive other customer informations related to latest Date)
upvoted 3 times
...
...
JMona
Highly Voted 4 years, 8 months ago
I thin duplicate query group data on customer id and max date merge two on inner join
upvoted 55 times
vinaikumar
4 years, 8 months ago
Its correct
upvoted 6 times
...
...
QWERTYman
Most Recent 2 years, 10 months ago
Answer: 1. Duplicate 2. Group max date 3. Merge inner join
upvoted 1 times
...
TechDiva
3 years, 3 months ago
I got it on my exam 03/17/2022. 90-95% questions were from here. 61 questions in 100 min. All 3 case studies that appeared in exam, where from here.
upvoted 2 times
Bitan47
3 years, 1 month ago
To every question you have replied the same
upvoted 12 times
...
...
PatrickStr
3 years, 6 months ago
on exam 12/10/2021
upvoted 1 times
...
aguilartu1
3 years, 7 months ago
on exam - Nov 11, 2021. my answers: Duplicate the customer query Group the CustomerGrouped Merge 2 query using a inner join
upvoted 4 times
...
kcwood94
3 years, 8 months ago
On exam 10/14/21
upvoted 1 times
...
pashi266
3 years, 9 months ago
I think the answer should be 1. Duplicate the query 2. Filter the query on Modified date is latest 3. Merge 2 queries on Customer ID and Modified date by inner join
upvoted 2 times
...
Harsh_87
3 years, 12 months ago
Tested and confused what the correct answer to this 1. Duplicate the Query 2. Group the Query to find the max modified date 3. Merge with inner Join Here is the best part when I expand the columns I start seeing duplicates 4. Remove the duplicates ?
upvoted 2 times
...
bberries
4 years ago
It says "You need to keep only the last modified row for each customer ID." The WHOLE row. If you 'group by' WITHOUT then performing an inner join merge, you are only left with 2 columns, but they require the whole row. Therefore you need to: 1. Duplicate the Query 2. Group the query by Customer ID, outputting max date 3. THEN merge using inner join
upvoted 1 times
...
TessieB
4 years, 1 month ago
I don't understand why people think you need any kind of join here. Because the question clearly states: "You need to keep only the last modified row for each customer ID." Performing an inner join or a left outer join will result in a table that will still have multiple rows per CustomerID. And the whole idea is that you only keep the last modified row for each CustomerID right?
upvoted 2 times
MLCL
4 years ago
An inner join achieves the goald of unique CustomerID rows with the latest modified rows, a left outer join leaves you with multiple rows per customerID.
upvoted 3 times
...
...
cesar_datamachine
4 years, 1 month ago
I have some doubts reading the others answer but my choice: 1.- Duplicate Query 2.- Group the query to find the max modified date per Customer ID 3.- Inner join
upvoted 3 times
...
Cherishworth
4 years, 2 months ago
I used a dataset with the same structure to test. The result is as same as VR1: 1. Duplicate the query 2. Group the query by CustomerID and MAX Modified Date 3. Merge the queries based on the grouped one, join kind is "Inner". This question is a bit tricky that, one option is based on the original query to left join. That's wrong. But if you based on the new grouped query to left join, will get the same outcomes as an inner join. I realized the wording was trying to confuse us after reading it carefully.
upvoted 6 times
...
MicahD
4 years, 3 months ago
Better answer? 1. Sort by customer ID ascending and modification date descending. 2. Remove duplicates based on customer ID.
upvoted 2 times
vivekmani2021
4 years, 2 months ago
This alone dosnt work in power query , You need to Add a Table.Buffer() between the two steps to preserve the sort order before the duplicates are dropped . So inner join is the correct answer here
upvoted 1 times
...
...
kalyhot
4 years, 3 months ago
Tested: (not need remove duplicate because Inner join) 1-duplicate query 2-group by max on date column 3-merge - inner join
upvoted 5 times
...
egyhuj
4 years, 4 months ago
Vr1 is correct
upvoted 1 times
...
arahan599
4 years, 4 months ago
I have tried this on power bi desktop. 1. Duplicate Query 2. GroupBy CustomerID to find max modified date 3. Remove Duplicates
upvoted 5 times
holySinner
4 years, 1 month ago
I have tried too and it worked
upvoted 1 times
TessieB
4 years, 1 month ago
But if you GroupBy CustomerID, then why do you need to remove duplicates? Because after GroupBy you don't have any duplicates anymore right?
upvoted 3 times
...
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...