Exam DP-100 topic 3 question 7 discussion

Actual exam question from Microsoft's DP-100

Question #: 7
Topic #: 3

You create a datastore named training_data that references a blob container in an Azure Storage account. The blob container contains a folder named csv_files in which multiple comma-separated values (CSV) files are stored.
You have a script named train.py in a local folder named ./script that you plan to run as an experiment using an estimator. The script includes the following code to read data from the csv_files folder:

You have the following script.

You need to configure the estimator for the experiment so that the script can read the data from a data reference named data_ref that references the csv_files folder in the training_data datastore.
Which code should you use to configure the estimator?
A.

B.

C.

D.

E.

Suggested Answer: B
Besides passing the dataset through the inputs parameter of the estimator, you can also pass it through script_params and retrieve the data path (mount point) in your training script via arguments. This keeps your training script independent of azureml-sdk. In other words, you can use the same training script for local debugging and for remote training on any cloud platform.
Example:
from azureml.train.sklearn import SKLearn

# mnist_ds, script_folder, compute_target, env and experiment are assumed to be defined earlier
script_params = {
    # mount the dataset on the remote compute and pass the mounted path as an argument to the training script
    '--data-folder': mnist_ds.as_named_input('mnist').as_mount(),
    '--regularization': 0.5
}
est = SKLearn(source_directory=script_folder,
              script_params=script_params,
              compute_target=compute_target,
              environment_definition=env,
              entry_script='train_mnist.py')
# Run the experiment
run = experiment.submit(est)
run.wait_for_completion(show_output=True)
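The receiving end of this pattern is the training script, whose code appears on this page only as a missing image. The following is therefore a hypothetical sketch of a train.py that reads the mounted csv_files folder from a --data-folder argument. The argument name, the load_rows helper and the CSV handling are assumptions for illustration. What it is meant to show is that the script needs only argparse and the standard library, which is what keeps it independent of azureml-sdk.

```python
# train.py (hypothetical sketch, not the exam's actual script)
# Depends only on the standard library, so it runs the same way locally and on remote compute.
import argparse
import csv
import glob
import os


def load_rows(data_folder):
    """Read every *.csv file in data_folder; return the first header and all data rows."""
    paths = sorted(glob.glob(os.path.join(data_folder, "*.csv")))
    header, rows = None, []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)  # consume each file's header line
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            rows.extend(reader)  # remaining lines are data rows
    return header, rows


def main(argv=None):
    parser = argparse.ArgumentParser()
    # Assumed argument name: on remote compute the mount object passed in
    # script_params is replaced by the mounted folder path at run time.
    parser.add_argument("--data-folder", type=str, required=True)
    args = parser.parse_args(argv)
    header, rows = load_rows(args.data_folder)
    print(f"Loaded {len(rows)} rows from {args.data_folder}")
    return header, rows


if __name__ == "__main__":
    main()
```

Locally you could run it as `python train.py --data-folder ./csv_files`. Remotely, the same code should work unchanged, because the script only ever sees a folder path.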
Incorrect Answers:
A: Pandas DataFrame not used.
Reference:
https://docs.microsoft.com/es-es/azure/machine-learning/how-to-train-with-datasets

by iuolu at May 1, 2021, 2:36 p.m.

Comments

chaudha4

Highly Voted 3 years, 1 month ago

The Estimator is deprecated. Use a ScriptRunConfig object with your own defined environment instead. Hopefully we won't see this question going forward! https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py

upvoted 13 times

scipio

3 years ago

You're right, but if you replace the estimator with a ScriptRunConfig, the question still holds: how to pass a Dataset (mount vs. download, by argument, etc.) is still relevant.

upvoted 5 times

vv_bb

Most Recent 6 months, 3 weeks ago

Even though the Estimator is deprecated in favor of ScriptRunConfig (search for "Migrating from Estimators to ScriptRunConfig"), I tried to work out the correct answer for the question as it is defined here.
1) The Estimator class accepts both "script_params" and "arguments" parameters; see https://learn.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py
2) So how do we decide which of them is valid in our case? The answer is here (note that "arguments" for PythonScriptStep plays the same role as "script_params" for Estimator): https://learn.microsoft.com/en-us/azure/machine-learning/how-to-move-data-in-out-of-pipelines?view=azureml-api-1#access-datasets-within-your-script
Because our script uses an argument parser, we have to pass the dataset through "script_params".

upvoted 3 times

iai

1 year ago

Shouldn't it be D? With a local compute_target, I'm not sure as_mount will work; as_download would be safer.

upvoted 2 times

danishanis

1 year, 3 months ago

Answer is B. I entered the question into ChatGPT as-is, and it gave an answer in which the 'script_params' argument is configured to read data from 'data_ref' (with data_ref.as_mount() providing the path to the 'csv_files' folder in the datastore).

upvoted 2 times

jpalaci22

1 year, 3 months ago

Seen on the exam 20Feb2023

upvoted 3 times

Edriv

1 year, 5 months ago

Could it be A, C, or E? What do you think?

upvoted 1 times

ning

2 years ago

B should be correct!

upvoted 3 times

TheYazan

2 years, 2 months ago

On the exam in March 2022

upvoted 4 times

[Removed]

2 years, 3 months ago

On 20Feb2022

upvoted 4 times

kisskeo

2 years, 8 months ago

On Exam 01 Oct 2021

upvoted 3 times

ljljljlj

2 years, 11 months ago

On exam 2021/7/10

upvoted 3 times

sarahmoin

2 years, 12 months ago

What is the correct answer? Why isn't it D?

upvoted 1 times

vhx

2 years, 11 months ago

as_download copies the files to a temporary location on the compute where the script runs, while as_mount streams the files directly from their source.

upvoted 3 times

iai

1 year ago

Note, however, that the compute target is local. Will mounting work there?

upvoted 1 times

iuolu

3 years, 1 month ago

Has nobody checked this question? The answer should be A, which uses to_pandas_dataframe() for the tabular files instead.

upvoted 2 times

chaudha4

3 years, 1 month ago

No, you are wrong. There are several problems with A: 1) The parameter is passed as a named input, which is wrong because the script does not access it as a named input. 2) You convert to a DataFrame in the script, not when you pass the data. So A is definitely not the correct answer.

upvoted 10 times
