Exam DP-100 topic 2 question 41 discussion

Actual exam question from Microsoft's DP-100
Question #: 41
Topic #: 2

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Calculate the column median value and use the median value as the replacement for any missing value in the column.
Does the solution meet the goal?

  • A. Yes
  • B. No
Suggested Answer: A
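
For illustration, here is a minimal Python sketch of the proposed solution, using only the standard library. The toy data and the `impute_median` helper are assumptions made for this example and are not part of the exam material; `None` stands in for a missing value.

```python
from statistics import median


def impute_median(columns):
    """Return a copy of `columns` with each None replaced by that column's median.

    `columns` maps a column name to a list of numbers, with None marking
    a missing value (a hypothetical layout chosen for this sketch).
    """
    filled = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        col_median = median(observed)  # raises StatisticsError if nothing is observed
        filled[name] = [col_median if v is None else v for v in values]
    return filled


# Hypothetical numerical dataset with gaps in both columns.
data = {
    "age": [25, None, 31, 40, None],
    "income": [52000, 48000, None, 61000, 75000],
}
cleaned = impute_median(data)

# The number of columns and the number of rows are the same before and after.
same_columns = len(cleaned) == len(data)
same_rows = all(len(cleaned[k]) == len(data[k]) for k in data)
print(cleaned)
print(same_columns, same_rows)
```

Only the missing cells change; no column or row is added or dropped, which is the sense in which dimensionality is preserved.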

Comments

Aleem
Highly Voted 3 years, 12 months ago
"You need to analyze a full dataset" just means you can't drop the rows or the columns. Replacing missing data with the median may increase the cardinality but dimensionality is only increased by adding new feature columns. Median replacement is a valid method in this case. The answer should be "Yes".
upvoted 34 times
TheYazan
3 years, 1 month ago
I guess because they mentioned "appropriate" which is vague but it might be the case
upvoted 2 times
amarques
Highly Voted 5 years, 9 months ago
In the reference you mentioned, it's not explicit that you can only use MICE. Is there any reason why we cannot use the median? Thank you.
upvoted 14 times
NaishinMatiri
4 years ago
I think the answer should ONLY be the Mean, since there are missing values in many columns and MICE uses other columns to calculate the value. Trying to calculate values where the other columns also have missing values could alter the result.
upvoted 4 times
andre999
3 years, 10 months ago
You can also use Median for numerical values, as stated in the exercise.
upvoted 3 times
jefimija
Most Recent 6 months, 3 weeks ago
If values are missing in the entire column, how do you calculate the median?
upvoted 1 times
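
To make this edge case concrete: the question only says that several columns contain missing values, not that any column is entirely empty, but if one were, there would be no observed values to take a median of. The sketch below shows how Python's standard `statistics.median` behaves in that situation; the `column_median_or_none` helper is a hypothetical name used only for this example.

```python
from statistics import StatisticsError, median


def column_median_or_none(values):
    """Median of the observed (non-None) values, or None if there are none."""
    observed = [v for v in values if v is not None]
    try:
        return median(observed)
    except StatisticsError:
        # Every entry is missing, so there is no median to impute with.
        return None


partly_missing = column_median_or_none([3, None, 7])
fully_missing = column_median_or_none([None, None, None])
print(partly_missing)
print(fully_missing)
```

A column with no observed values cannot be repaired by median replacement at all; it would need a different strategy, such as a constant fill or an imputer that draws on other columns.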
haby
1 year, 4 months ago
B for sure. Median is only available if the columns/rows are numeric or ordinal; if they are categorical or even strings, we have to use Mode or other methods.
upvoted 1 times
Anjiiiiiii
11 months, 1 week ago
The question says it's a numerical dataset.
upvoted 2 times
Ahmed_Gehad
1 year, 9 months ago
Selected Answer: A
The solution meets the goal because it uses an appropriate operation to clean the missing values without affecting the dimensionality of the feature set. The median value is a good choice for imputing missing values because it is not affected by outliers. Additionally, the median value does not affect the dimensionality of the feature set because it is a single value.
upvoted 1 times
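
The robustness point can be checked with a few lines of Python. The numbers below are made up for illustration; the sketch compares how the mean and the median of a column react when a single extreme value is appended.

```python
from statistics import mean, median

# Hypothetical column values, then the same values plus one extreme outlier.
values = [10, 12, 11, 13, 12]
with_outlier = values + [1000]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)
print(mean_after, median_after)
```

In this example the mean jumps by more than an order of magnitude while the median stays at 12, so a median fill value is far less sensitive to a single extreme observation than a mean fill value.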
snegnik
1 year, 11 months ago
I think the answer is B. Median replacement "analyzes" only one column, but the MICE method uses all columns to impute each one. We could also use the Probabilistic PCA method.
upvoted 2 times
MarinaMijailovic
1 year, 11 months ago
If columns are numeric then yes, otherwise no. The problem is we don't have that information.
upvoted 1 times
swatidorge1010
1 year, 11 months ago
Selected Answer: A
Please change answer or justify
upvoted 1 times
swatidorge1010
1 year, 11 months ago
Selected Answer: B
Answer: B, as it's not mentioned that all columns have numeric values.
upvoted 2 times
snegnik
1 year, 11 months ago
It is mentioned "You are analyzing a numerical dataset"
upvoted 2 times
ajay0011
2 years, 1 month ago
Selected Answer: A
Yes is the answer, please change it
upvoted 1 times
Edriv
2 years, 4 months ago
Agree A
upvoted 1 times
jlopezfelizzola
2 years, 7 months ago
Selected Answer: A
Should be A
upvoted 3 times
FU_User
2 years, 11 months ago
Selected Answer: A
It is correct as it doesn't add or drop columns. Also "You need to analyze a full dataset" can either mean that the algorithm should take the full existing dataset into account when replacing values (which this one does) or that no existing data should be deleted (median does not delete anything)
upvoted 3 times
VEDPRASAD
3 years, 1 month ago
Selected Answer: A
In an existing project we used the median with a clustered dataset, so median works.
upvoted 2 times
David_Tadeu
3 years, 1 month ago
Selected Answer: A
Applying the mentioned method to the following dataset
1 | 5 | - | - | 7 | 3 |
- | 0 | 2 | 2 | 7 | 4 |
2 | 6 | 9 | - | 2 | - |
3 | - | - | - | 7 | - |
would lead to
1 | 4 | 4 | 4 | 7 | 3 |
2 | 0 | 2 | 2 | 7 | 4 |
2 | 6 | 9 | 4 | 2 | 4 |
3 | 5 | 5 | 5 | 7 | 5 |
Hence
- missing data cleaned
- dimensionality preserved
upvoted 4 times
David_Tadeu
3 years, 1 month ago
oops the entry (1,2) is meant to be a 5 in the dataset below
upvoted 1 times
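
The worked example above can be reproduced with a short Python sketch. It rests on an assumption about the layout: the 24 posted values are read as four groups of six, and each group is imputed with its own median. That reading is not stated in the post, but it matches the posted result once entry (1,2) is taken as 5, per the correction.

```python
from statistics import median

# Each inner list is one group of six posted values; None marks a "-" entry.
dataset = [
    [1, 5, None, None, 7, 3],
    [None, 0, 2, 2, 7, 4],
    [2, 6, 9, None, 2, None],
    [3, None, None, None, 7, None],
]

filled = []
for group in dataset:
    observed = [v for v in group if v is not None]
    group_median = median(observed)
    filled.append([group_median if v is None else v for v in group])

for group in filled:
    print(group)

# The grid keeps its shape: four groups of six values each.
shape_before = (len(dataset), len(dataset[0]))
shape_after = (len(filled), len(filled[0]))
print(shape_before == shape_after)
```

The fill values come out as 4, 2, 4 and 5 for the four groups, and the grid has the same shape before and after.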
synapse
3 years, 1 month ago
Selected Answer: A
Copying a good answer: "You need to analyze a full dataset" just means you can't drop the rows or the columns. Replacing missing data with the median may increase the cardinality but dimensionality is only increased by adding new feature columns. Median replacement is a valid method in this case. The answer should be "Yes"
upvoted 4 times
mikosann
3 years, 7 months ago
I think the answer is correct. Filling missing values with the mean/median is a widely used method. It works on each column separately and independently, and it only replaces the missing values; it doesn't add any new columns or rows to the dataset, which means it doesn't affect the dimensions. Taking the mean/median of the column and replacing missing values is one of the beginner data science topics. MICE can be a better method, but this doesn't mean the answer is wrong.
upvoted 4 times
mikosann
3 years, 7 months ago
I mean the solution is correct. Not the answer.
upvoted 3 times