Exam DP-100 topic 2 question 45 discussion

Actual exam question from Microsoft's DP-100

Question #: 45
Topic #: 2

HOTSPOT -
You are evaluating a Python NumPy array that contains six data points defined as follows: data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
How should you complete the code segment? To answer, select the appropriate code segment in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:

Box 1: k-fold -

Box 2: 3 -
The K-Folds cross-validator provides train/test indices to split data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default).
The parameter n_splits (int, default=5 since scikit-learn 0.22; previously 3) is the number of folds. Must be at least 2.
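
To illustrate the "consecutive folds (without shuffling by default)" behaviour described above, here is a minimal sketch on the question's six data points. The helper name kfold_pairs is hypothetical and exists only for this illustration; KFold and its split method are the scikit-learn calls.

```python
import numpy as np
from sklearn.model_selection import KFold


def kfold_pairs(data, n_splits, **kwargs):
    """Hypothetical helper: return (train_values, test_values) for each fold."""
    kf = KFold(n_splits=n_splits, **kwargs)
    # split() yields index arrays; index into data to get the actual values
    return [(data[tr].tolist(), data[te].tolist()) for tr, te in kf.split(data)]


data = np.array([10, 20, 30, 40, 50, 60])
pairs = kfold_pairs(data, 3)  # shuffle=False by default
for train, test in pairs:
    print("train:", train, "test:", test)
# train: [30, 40, 50, 60] test: [10, 20]
# train: [10, 20, 50, 60] test: [30, 40]
# train: [10, 20, 30, 40] test: [50, 60]
```

Without shuffling, the test folds are simply consecutive slices, which differs from the output required in the question. This suggests the exam's code segment also enables shuffling.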

Box 3: data -
Example:
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
Reference:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
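
Putting the three boxes together (KFold, 3, data), the completed code segment might look like the sketch below. The shuffle=True and random_state=1 arguments are assumptions taken from the discussion comments, not from the suggested answer, so the exact fold membership is worth checking on your own install. The variable name folds is only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.array([10, 20, 30, 40, 50, 60])

# Box 1: KFold, Box 2: 3 splits; shuffle/random_state are assumed values
kfold = KFold(n_splits=3, shuffle=True, random_state=1)

folds = []
# Box 3: data is passed to split(), which yields index arrays per fold
for train_idx, test_idx in kfold.split(data):
    folds.append((data[train_idx], data[test_idx]))
    print(f"train: {data[train_idx]}, test: {data[test_idx]}")
```

Whatever the seed, each of the three folds holds out two of the six points, and every point appears in exactly one test set.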

by podval at July 12, 2020, 6:45 p.m.

Comments

podval

Highly Voted 4 years, 4 months ago

Proper syntax: from sklearn.model_selection import KFold

upvoted 23 times

David_Tadeu

2 years, 7 months ago

If the actual question is written with 'k-fold' instead of 'Kfold', that's just stupid.

upvoted 4 times

...

ljljljlj

Highly Voted 3 years, 4 months ago

On exam 2021/7/10

upvoted 5 times

...

Matt2000

Most Recent 10 months ago

from sklearn.model_selection import KFold
import numpy as np
data = np.array([10, 20, 30, 40, 50, 60])
k_fold = KFold(n_splits=3, shuffle=True, random_state=1)
# split() yields index arrays, so index into data to print the values
for train, test in k_fold.split(data):
    print(f'train: {data[train]}, test: {data[test]}')

upvoted 2 times

...

Matt2000

10 months, 2 weeks ago

"-" should be read as "="

upvoted 1 times

...

Hisayuki

1 year, 1 month ago

You're going to create three sets of train and test data with shuffling, so n_splits should be 3 in KFold.
- train: [10 40 50 60], test: [20 30]
- train: [20 30 40 60], test: [10 50]
- train: [10 20 30 50], test: [40 60]

upvoted 3 times

...

ning

2 years, 6 months ago

Might be a typo, but overall is correct

upvoted 2 times

...