Exam DP-100 topic 3 question 48 discussion

Actual exam question from Microsoft's DP-100
Question #: 48
Topic #: 3

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains the following files:
✑ /data/2018/Q1.csv
✑ /data/2018/Q2.csv
✑ /data/2018/Q3.csv
✑ /data/2018/Q4.csv
✑ /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,I
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1
You run the following code:

You need to create a dataset named training_data and load the data from all files into a single data frame by using the following code:

Solution: Run the following code:

Does the solution meet the goal?

  • A. Yes
  • B. No
Suggested Answer: B
Use two file paths.
Use Dataset.Tabular.from_delimited_files instead of Dataset.File.from_files, as the data isn't cleansed.
Note:
A FileDataset references single or multiple files in your datastores or public URLs. If your data is already cleansed, and ready to use in training experiments, you can download or mount the files to your compute as a FileDataset object.
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar data preparation and training libraries without having to leave your notebook. You can create a TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
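The FileDataset versus TabularDataset distinction in the note can be sketched with the v1 SDK (azureml-core). This is a minimal sketch under assumptions, not the code from the missing question image: the datastore name "my_datastore", the helper build_paths, and the function load_training_frame are hypothetical, and the function only works where azureml-core is installed and a workspace config.json is available.

```python
from typing import Any, List, Tuple


def build_paths(datastore: Any, patterns: List[str]) -> List[Tuple[Any, str]]:
    """Pair each relative glob pattern with the datastore that holds it."""
    return [(datastore, pattern) for pattern in patterns]


def load_training_frame(datastore_name: str = "my_datastore"):
    """Create a TabularDataset over both years and return one pandas DataFrame.

    Assumes azureml-core (SDK v1) and a workspace config.json; the datastore
    name is a placeholder, not a value taken from the question.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from azureml.core import Dataset, Datastore, Workspace

    ws = Workspace.from_config()
    data_store = Datastore.get(ws, datastore_name)

    # Two file paths, one per year, as the suggested answer recommends.
    paths = build_paths(data_store, ["data/2018/*.csv", "data/2019/*.csv"])

    # Dataset.File.from_files(path=paths) would return a FileDataset, which
    # only references the files (download or mount) and cannot be turned
    # into a DataFrame directly. A TabularDataset parses the CSV content.
    training_data = Dataset.Tabular.from_delimited_files(path=paths)
    return training_data.to_pandas_dataframe()
```

Calling load_training_frame() inside a configured workspace should return the rows of all five files in one DataFrame; build_paths is kept separate only so the path list can be inspected without a workspace.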

Comments

[Removed]
Highly Voted 1 year, 5 months ago
Based on the comments, I think that at some stage the solution/image for this question (48) was swapped with the solution/image of the previous question (47), leading to confusion for new readers.
If the solution has "Dataset.File.from_files(paths)", then the answer is B, No.
If the solution has "Dataset.Tabular.from_delimited_files(paths)", then the answer is A, Yes.
Reference:
https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py
https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.dataset(class)?view=azure-ml-py
upvoted 16 times
...
slash_nyk
Highly Voted 2 years, 11 months ago
Yes. It works
upvoted 8 times
...
james2033
Most Recent 8 months ago
This question is out of date and obsolete. The import should be "from azure.ai.ml import ...", not "from azureml.core import Dataset".
Reference: https://github.com/Azure/azure-sdk-for-python/tree/azure-ai-ml_1.11.1/sdk/ml/azure-ai-ml#authenticate-the-client
upvoted 1 times
...
fhlos
11 months, 3 weeks ago
Selected Answer: B
No - ChatGPT
No, the solution does not meet the goal. The code provided to create the dataset and load the data into a single DataFrame is incorrect. To create a dataset named training_data and load the data from all files into a single DataFrame, you need to modify the code as follows:
    from azureml.core import Dataset
    paths = [(data_store, 'data/2018/*.csv'), (data_store, 'data/2019/*.csv')]
    training_data = Dataset.Tabular.from_delimited_files(paths)
    data_frame = training_data.to_pandas_dataframe()
Explanation:
The paths variable is updated to specify the paths of all files to be included in the dataset. In this case, it includes all CSV files in the /data/2018 and /data/2019 directories.
The Dataset.Tabular.from_delimited_files() method is used to create the dataset training_data by providing the paths variable.
The to_pandas_dataframe() method is called on the training_data dataset to load the data from all files into a single pandas DataFrame.
By making these changes, the code will create the desired dataset and load the data from all files into a single DataFrame.
upvoted 1 times
...
abhishekm94
1 year ago
As per the documentation, the correct answer is Yes. Links: https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py and https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py
upvoted 1 times
...
Crusader2k13
1 year, 6 months ago
It is clearly No and the answer is correct! You can't create a pandas dataframe from Dataset.File.from_files(), only from a Tabular dataset! See: https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py FileDataset has no to_pandas_dataframe() method. See: https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py
upvoted 1 times
...
nick234987
2 years, 8 months ago
it should be YES. Check this link: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-delimited-files-path--validate-true--include-path-false--infer-column-types-true--set-column-types-none--separator------header-true--partition-format-none--support-multi-line-false--empty-as-string-false--encoding--utf8--
upvoted 2 times
...
skrjha20
2 years, 8 months ago
It should be Yes
    # create tabular dataset from all csv files in the directory
    tabular_dataset_3 = Dataset.Tabular.from_delimited_files(path=(datastore,'weather/**/*.csv'))
upvoted 3 times
...
Marcello83
2 years, 9 months ago
Tried in aml. It works...
upvoted 5 times
...
YipingRuan
2 years, 11 months ago
    # create tabular dataset from all csv files in the directory
    tabular_dataset_3 = Dataset.Tabular.from_delimited_files(path=(datastore,'weather/**/*.csv'))
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py
upvoted 4 times
...
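Several comments rely on glob patterns such as 'weather/**/*.csv' or 'data/2018/*.csv' to pick up every file. The sketch below is a local stand-in using only the Python standard library, not the Azure ML SDK: it recreates the five files from the question in a temporary directory and shows which files each pattern matches and how many rows end up in the combined table. The helper names are hypothetical, and Python's glob semantics are assumed to be close to, but not guaranteed identical to, datastore path globbing.

```python
import csv
import glob
import os
import tempfile

# Sample content and file layout copied from the question.
SAMPLE = "id,f1,f2,I\n1,1,2,0\n2,1,1,1\n3,2,1,0\n4,2,2,1\n"
FILES = [
    "data/2018/Q1.csv",
    "data/2018/Q2.csv",
    "data/2018/Q3.csv",
    "data/2018/Q4.csv",
    "data/2019/Q1.csv",
]


def make_sample_tree(root):
    """Write the five sample CSV files under root."""
    for rel in FILES:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w", newline="") as fh:
            fh.write(SAMPLE)


def match(root, pattern):
    """Return the sorted relative paths under root that match a glob pattern."""
    hits = glob.glob(os.path.join(root, *pattern.split("/")), recursive=True)
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in hits)


def load_rows(root, pattern):
    """Parse every matching CSV file and concatenate the rows into one list."""
    rows = []
    for rel in match(root, pattern):
        with open(os.path.join(root, *rel.split("/")), newline="") as fh:
            rows.extend(csv.DictReader(fh))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    make_sample_tree(tmp)
    only_2018 = match(tmp, "data/2018/*.csv")   # a single-year pattern
    all_files = match(tmp, "data/**/*.csv")     # recursive pattern, both years
    all_rows = load_rows(tmp, "data/**/*.csv")

print(len(only_2018), len(all_files), len(all_rows))  # prints: 4 5 20
```

A single-year pattern misses /data/2019/Q1.csv, which is why the suggested answer calls for two file paths (or one recursive pattern) when all files must land in the same data frame.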
trickerk
2 years, 11 months ago
Answer should be Yes.
upvoted 4 times
...