Exam DA-100 All Questions

View all questions & answers for the DA-100 exam

Exam DA-100 topic 4 question 2 discussion

Actual exam question from Microsoft's DA-100

Question #: 2
Topic #: 4

DRAG DROP -
You have a query named Customer that imports CSV files from a data lake. The query contains 50,000 rows as shown in the exhibit. (Click the Exhibit tab.)

Each file contains deltas of any new or modified rows from each load to the data lake. Multiple files can have the same customer ID.
You need to keep only the last modified row for each customer ID.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:

Show Suggested Answer

Suggested Answer:

by Evert23 at Oct. 2, 2020, 4 p.m.

Comments

Submit Cancel

VR1

Highly Voted 4 years, 8 months ago

I think:- 1. Duplicate the query 2. Group the query to find the max modified date 3. Merge the query on inner join

upvoted 252 times

Maue

4 years, 1 month ago

Be careful, cos Person X can have repeated modified date. So a inner join will remain the duplicate (wrongly)... That’s why the “remove duplicated ID” is present. In other words, Inner join will return duplicate if the same customer ID have the same modified date 2 or more times

upvoted 1 times

Billybob0604

3 years, 3 months ago

No, modified date is datetime format. using the max you will always get a unique customer id

upvoted 2 times

...

Massy

4 years, 1 month ago

if you group by ID you doesn't have duplicate id anymore

upvoted 16 times

...

not2smart

2 years, 7 months ago

Maue is correct that duplicates of Customer + Modified Date would not work with the inner join. The Datetime format appears not to use the time part, so duplicates could exist theoretically and even Datetime when fully used would not guarantee uniqueness. However the name of the files in combination with different Modified Dates seems to be unique per file and also does not contain dates before the date of the name of the file, so suggests enough uniqueness. Even so, removing duplicates in the Customer ID column would not solve this. It would have to be removing duplicates in the combined Customer Id + Date Modified Date columns, which is not an option. My assumption here would be that the question does not expect this to occur. I would thus go with the answer given by VR1.

upvoted 1 times

...

Zakriya

4 years, 1 month ago

correct

upvoted 4 times

...

TechDiva

3 years, 3 months ago

Explaination by Lhouss:- 1) Duplicate Customer query 2) Group by CustId by Max ModifiedDate (only 2 columns to keep) 3) Merge two queries on CustId and ModifiedDate inner join (to retreive other customer informations related to latest Date)

upvoted 3 times

...

JMona

Highly Voted 4 years, 8 months ago

I thin duplicate query group data on customer id and max date merge two on inner join

upvoted 55 times

vinaikumar

4 years, 8 months ago

Its correct

upvoted 6 times

...

QWERTYman

Most Recent 2 years, 10 months ago

Answer: 1. Duplicate 2. Group max date 3. Merge inner join

upvoted 1 times

...

TechDiva

3 years, 3 months ago

I got it on my exam 03/17/2022. 90-95% questions were from here. 61 questions in 100 min. All 3 case studies that appeared in exam, where from here.

upvoted 2 times

Bitan47

3 years, 1 month ago

To every question you have replied the same

upvoted 12 times

...

PatrickStr

3 years, 6 months ago

on exam 12/10/2021

upvoted 1 times

...

aguilartu1

3 years, 7 months ago

on exam - Nov 11, 2021. my answers: Duplicate the customer query Group the CustomerGrouped Merge 2 query using a inner join

upvoted 4 times

...

kcwood94

3 years, 8 months ago

On exam 10/14/21

upvoted 1 times

...

pashi266

3 years, 9 months ago

I think the answer should be 1. Duplicate the query 2. Filter the query on Modified date is latest 3. Merge 2 queries on Customer ID and Modified date by inner join

upvoted 2 times

...

Harsh_87

3 years, 12 months ago

Tested and confused what the correct answer to this 1. Duplicate the Query 2. Group the Query to find the max modified date 3. Merge with inner Join Here is the best part when I expand the columns I start seeing duplicates 4. Remove the duplicates ?

upvoted 2 times

...

bberries

4 years ago

It says "You need to keep only the last modified row for each customer ID." The WHOLE row. If you 'group by' WITHOUT then performing an inner join merge, you are only left with 2 columns, but they require the whole row. Therefore you need to: 1. Duplicate the Query 2. Group the query by Customer ID, outputting max date 3. THEN merge using inner join

upvoted 1 times

...

TessieB

4 years, 1 month ago

I don't understand why people think you need any kind of join here. Because the question clearly states: "You need to keep only the last modified row for each customer ID." Performing an inner join or a left outer join will result in a table that will still have multiple rows per CustomerID. And the whole idea is that you only keep the last modified row for each CustomerID right?

upvoted 2 times

MLCL

4 years ago

An inner join achieves the goald of unique CustomerID rows with the latest modified rows, a left outer join leaves you with multiple rows per customerID.

upvoted 3 times

...

cesar_datamachine

4 years, 1 month ago

I have some doubts reading the others answer but my choice: 1.- Duplicate Query 2.- Group the query to find the max modified date per Customer ID 3.- Inner join

upvoted 3 times

...

Cherishworth

4 years, 2 months ago

I used a dataset with the same structure to test. The result is as same as VR1: 1. Duplicate the query 2. Group the query by CustomerID and MAX Modified Date 3. Merge the queries based on the grouped one, join kind is "Inner". This question is a bit tricky that, one option is based on the original query to left join. That's wrong. But if you based on the new grouped query to left join, will get the same outcomes as an inner join. I realized the wording was trying to confuse us after reading it carefully.

upvoted 6 times

...

MicahD

4 years, 3 months ago

Better answer? 1. Sort by customer ID ascending and modification date descending. 2. Remove duplicates based on customer ID.

upvoted 2 times

vivekmani2021

4 years, 2 months ago

This alone dosnt work in power query , You need to Add a Table.Buffer() between the two steps to preserve the sort order before the duplicates are dropped . So inner join is the correct answer here

upvoted 1 times

...