Exam DP-100 topic 3 question 49 discussion

Actual exam question from Microsoft's DP-100
Question #: 49
Topic #: 3

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains the following files:
✑ /data/2018/Q1.csv
✑ /data/2018/Q2.csv
✑ /data/2018/Q3.csv
✑ /data/2018/Q4.csv
✑ /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,I
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1
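The row format above can be parsed with the standard csv module; a minimal sketch (header and rows copied from the sample, where the last column looks like a 0/1 label):

```python
import csv
import io

# Sample rows in the format shown in the question
sample = """id,f1,f2,I
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1
"""

reader = csv.DictReader(io.StringIO(sample))
rows = [{k: int(v) for k, v in row.items()} for row in reader]
print(len(rows), rows[0])  # 4 {'id': 1, 'f1': 1, 'f2': 2, 'I': 0}
```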
You run the following code:

You need to create a dataset named training_data and load the data from all files into a single data frame by using the following code:

Solution: Run the following code:

Does the solution meet the goal?

  • A. Yes
  • B. No
Suggested Answer: A
Use two file paths.
Use Dataset.Tabular.from_delimited_files because the data isn't cleansed.
Note:
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. It lets you materialize the data into a pandas or Spark DataFrame, so you can work with familiar data-preparation and training libraries without leaving your notebook. You can create a TabularDataset object from .csv, .tsv, .parquet, and .jsonl files, and from SQL query results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
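The effect the solution relies on — loading all five files into a single DataFrame via two wildcard paths — can be sketched locally with glob and pandas. This is a local analogue of what Dataset.Tabular.from_delimited_files(...).to_pandas_dataframe() produces, not the azureml API itself; the file layout and schema are taken from the question.

```python
import glob
import os
import tempfile

import pandas as pd

# Recreate the question's datastore layout in a temp directory
# (assumption: every file uses the same id,f1,f2,I schema shown above)
root = tempfile.mkdtemp()
for rel in ["data/2018/Q1.csv", "data/2018/Q2.csv", "data/2018/Q3.csv",
            "data/2018/Q4.csv", "data/2019/Q1.csv"]:
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("id,f1,f2,I\n1,1,2,0\n2,1,1,1\n")

# Two patterns, mirroring the two datastore paths in the suggested solution
patterns = ["data/2018/*.csv", "data/2019/*.csv"]
files = sorted(p for pat in patterns for p in glob.glob(os.path.join(root, pat)))

# Concatenate every matched file into one frame, as to_pandas_dataframe() does
frame = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
print(len(files), len(frame))  # 5 files, 10 rows
```

With two rows per file, the five matched files yield one 10-row frame, which is the "single data frame" the question asks for.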

Comments

PakE
Highly Voted 3 years, 2 months ago
The correct answer should be Yes. I have tested that the code works.
upvoted 18 times
...
brendal89
Highly Voted 3 years, 2 months ago
I think the answer might be 'yes'. See this similar example for parquet files:

    datastore_path = [(dstore, dset_name + '/*/*/data.parquet')]
    dataset = Dataset.Tabular.from_parquet_files(
        path=datastore_path,
        partition_format=dset_name + '/{partition_time:yyyy/MM}/data.parquet')

The partition_format argument appears to be optional. Reference: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.ipynb
upvoted 11 times
...
james2033
Most Recent 8 months ago
This question is out of date. Current code should use from azure.ai.ml import ... rather than from azureml.core import Dataset. Reference: https://github.com/Azure/azure-sdk-for-python/tree/azure-ai-ml_1.11.1/sdk/ml/azure-ai-ml#authenticate-the-client
upvoted 1 times
...
fhlos
11 months, 3 weeks ago
Selected Answer: A
YES - per ChatGPT: the solution meets the goal. The code creates a dataset named training_data and loads the data from all files into a single DataFrame.

    from azureml.core import Dataset
    paths = [(data_store, 'data/2018/*.csv'), (data_store, 'data/2019/*.csv')]
    training_data = Dataset.Tabular.from_delimited_files(paths)
    data_frame = training_data.to_pandas_dataframe()

Explanation: paths lists all CSV files in the /data/2018 and /data/2019 directories of the registered datastore data_store. Dataset.Tabular.from_delimited_files() creates the training_data dataset from those paths, and to_pandas_dataframe() loads the data from all files into a single pandas DataFrame.
upvoted 1 times
...
therealola
2 years ago
On exam 18-06-22
upvoted 2 times
...
azurelearner666
2 years, 2 months ago
Selected Answer: A
A. Yes.

    from azureml.core import Dataset
    # Get the default datastore
    default_ds = ws.get_default_datastore()
    # Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))
    # Display the first 20 rows as a Pandas dataframe
    tab_data_set.take(20).to_pandas_dataframe()
upvoted 1 times
...
Thornehead
2 years, 2 months ago
Read the question again. The answer is not yes, because the data has missing values in it. The data has to be processed first before it is used for training.
upvoted 1 times
...
nick234987
2 years, 8 months ago
It is yes, no doubt
upvoted 5 times
...
slash_nyk
2 years, 11 months ago
Hi all. The code works and the answer should be yes. Pay attention to the output: print the dataset and you will notice the difference in source.
upvoted 1 times
...
surfing
3 years ago
The answer is Yes.

    from azureml.core import Dataset
    # Get the default datastore
    default_ds = ws.get_default_datastore()
    # Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))
    # Display the first 20 rows as a Pandas dataframe
    tab_data_set.take(20).to_pandas_dataframe()
upvoted 3 times
...
rsamant
3 years ago
Answer is yes. Tested.
upvoted 5 times
...
scipio
3 years, 1 month ago
I think the problem is the * for the directory. Something like this would be the correct way:

    paths = [(data_store, 'data/2018/*.csv'), (data_store, 'data/2019/*.csv')]
upvoted 2 times
treadst0ne
2 years, 12 months ago
I had the same concern, but after testing it, it is possible to create a Tabular dataset passing "parent_folder/*/*.csv" as a path. So yes, answer should be A.
upvoted 2 times
...
...
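The thread above asks whether one nested wildcard ("parent_folder/*/*.csv") selects the same files as two explicit per-year paths. With ordinary filesystem glob semantics (a local analogue of the datastore path matching, not the azureml implementation) the two selections can be compared directly:

```python
import tempfile
from pathlib import Path

# Recreate the question's file layout in a temp directory
root = Path(tempfile.mkdtemp())
for rel in ["data/2018/Q1.csv", "data/2018/Q2.csv", "data/2018/Q3.csv",
            "data/2018/Q4.csv", "data/2019/Q1.csv"]:
    p = root / rel
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text("id,f1,f2,I\n")

# Two explicit per-year patterns vs. one nested wildcard
per_year = sorted(root.glob("data/2018/*.csv")) + sorted(root.glob("data/2019/*.csv"))
nested = sorted(root.glob("data/*/*.csv"))
print(len(per_year), len(nested))  # both select all 5 files
```

Under these semantics the nested wildcard covers the same five files, which matches treadst0ne's test result above.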
ali25
3 years, 2 months ago
    from azureml.core import Workspace, Datastore, Dataset

    datastore_name = 'your datastore name'
    # get existing workspace
    workspace = Workspace.from_config()
    # retrieve an existing datastore in the workspace by name
    datastore = Datastore.get(workspace, datastore_name)
    # create a TabularDataset from 3 file paths in datastore
    datastore_paths = [(datastore, 'weather/2018/11.csv'),
                       (datastore, 'weather/2018/12.csv'),
                       (datastore, 'weather/2019/*.csv')]
    weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other