Exam DY0-001 topic 1 question 45 discussion

Actual exam question from CompTIA's DY0-001

Question #: 45
Topic #: 1

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

A. Interpolated data
B. Extrapolated data
C. In-sample data
D. Out-of-sample data

Suggested Answer: D

by SuntzuLegacy at May 19, 2025, 9:40 a.m.

Comments

SuntzuLegacy

1 month, 1 week ago

Selected Answer: B

The sample provided includes only two of the 17 professions (data scientist and data analyst) and only two of the 12 locations. The model would therefore be applying what it has learned to professions and locations that do not appear in the training portion of the data. Predicting for categories that are completely unobserved (or that have too little data) is a classic case of extrapolating rather than merely interpolating. Hence, the most likely concern is B. Extrapolated data.

upvoted 1 time
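
The reasoning in the comment above can be checked mechanically: list the category values that occur in the full data set but never in the portion the model is trained on. The following is a minimal sketch in Python, not part of the exam material. The rows, the column names, and the helper `unseen_categories` are all hypothetical, since the actual data set is not shown in the question.

```python
def unseen_categories(train_rows, all_rows, column):
    """Return values of `column` present in all_rows but absent from train_rows."""
    seen = {row[column] for row in train_rows}
    return sorted({row[column] for row in all_rows} - seen)


# Hypothetical rows for illustration only; the real data is not available.
full = [
    {"profession": "data scientist", "location": "A", "vote": "tiger"},
    {"profession": "data analyst", "location": "B", "vote": "lion"},
    {"profession": "teacher", "location": "C", "vote": "lion"},
    {"profession": "nurse", "location": "A", "vote": "tiger"},
]
train = full[:2]  # the portion the model is assumed to learn from

missing_professions = unseen_categories(train, full, "profession")
missing_locations = unseen_categories(train, full, "location")
print(missing_professions)  # ['nurse', 'teacher']
print(missing_locations)    # ['C']
```

Any non-empty result means the model would have to predict for categories it never observed, which is the extrapolation concern the comment describes.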

...