Exam AWS Certified Machine Learning - Specialty topic 1 question 56 discussion

A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist is using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format and is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.
What should the Specialist do to optimize the data for training on SageMaker?

  • A. Use the SageMaker batch transform feature to transform the training data into a DataFrame.
  • B. Use AWS Glue to compress the data into the Apache Parquet format.
  • C. Transform the dataset into the RecordIO protobuf format.
  • D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data.
Suggested Answer: C

Comments

rsimham
Highly Voted 2 years, 9 months ago
C is okay
upvoted 19 times
...
stamarpadar
Highly Voted 2 years, 8 months ago
Answer is C. Most Amazon SageMaker algorithms work best when you use the optimized protobuf recordIO format for the training data. https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html
upvoted 16 times
...
Mickey321
Most Recent 10 months ago
Selected Answer: C
option C
upvoted 1 time
...
AjoseO
1 year, 4 months ago
Selected Answer: C
The Specialist should transform the dataset into the RecordIO protobuf format. This format is optimized for use with SageMaker and has been shown to improve the speed and efficiency of training algorithms. Using the RecordIO protobuf format is a best practice for preparing data for use with Amazon SageMaker, and it is specifically recommended for use with the built-in algorithms.
upvoted 1 time
...
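To make "RecordIO protobuf" a bit more concrete, here is a minimal sketch of just the RecordIO framing layer, using only the Python standard library. It rests on assumptions that are not stated in this thread: that each record is written as a 4-byte magic number (0xCED7230A), a 4-byte payload length, the payload, and zero padding up to a 4-byte boundary, all little-endian. The names write_record and read_records are hypothetical helpers for illustration. In real SageMaker training data the payload would be a serialized protobuf Record message; opaque bytes stand in for it here. In practice you would most likely not write this by hand and would instead use the helper in the SageMaker Python SDK (sagemaker.amazon.common.write_numpy_to_dense_tensor) to convert a numpy array directly.

```python
import io
import struct

# Assumption: magic number that prefixes every RecordIO record.
RECORDIO_MAGIC = 0xCED7230A


def write_record(stream, payload: bytes) -> None:
    """Write one framed record: magic, length, payload, zero padding."""
    stream.write(struct.pack("<I", RECORDIO_MAGIC))
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)
    # Pad with zero bytes so the next record starts on a 4-byte boundary.
    stream.write(b"\x00" * (-len(payload) % 4))


def read_records(stream):
    """Yield the payload of each framed record until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if not header:
            return
        if len(header) < 8:
            raise ValueError("truncated RecordIO header")
        magic, length = struct.unpack("<II", header)
        if magic != RECORDIO_MAGIC:
            raise ValueError("bad RecordIO magic number")
        payload = stream.read(length)
        stream.read(-length % 4)  # skip the padding bytes
        yield payload


# Round trip: placeholder payloads stand in for serialized protobuf records.
payloads = [b"row-1", b"row-22"]
buf = io.BytesIO()
for p in payloads:
    write_record(buf, p)
framed_size = len(buf.getvalue())
buf.seek(0)
decoded = list(read_records(buf))
```

The point of the framing is that a reader can pull records one at a time from a stream without parsing text, which is the property the answers above lean on when they recommend option C over feeding CSV-derived arrays.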
Jeremy1
1 year, 7 months ago
Selected Answer: C
I would assume the issue is the transformation. It can be nasty slow between pandas / csv / numpy. Go to protobuf.
upvoted 1 time
...
C10ud9
2 years, 7 months ago
C is the best
upvoted 5 times
...
PRC
2 years, 8 months ago
Agree with C
upvoted 6 times
...