Exam AWS Certified Machine Learning - Specialty topic 1 question 261 discussion

A social media company wants to develop a machine learning (ML) model to detect inappropriate or offensive content in images. The company has collected a large dataset of labeled images and plans to use the built-in Amazon SageMaker image classification algorithm to train the model. The company also intends to use SageMaker pipe mode to speed up the training.

The company splits the dataset into training, validation, and testing datasets. The company stores the training and validation images in folders that are named Training and Validation, respectively. The folders contain subfolders that correspond to the names of the dataset classes. The company resizes the images to the same size and generates two input manifest files named training.lst and validation.lst, for the training dataset and the validation dataset, respectively. Finally, the company creates two separate Amazon S3 buckets for uploads of the training dataset and the validation dataset.

Which additional data preparation steps should the company take before uploading the files to Amazon S3?

  • A. Generate two Apache Parquet files, training.parquet and validation.parquet, by reading the images into a Pandas data frame and storing the data frame as a Parquet file. Upload the Parquet files to the training S3 bucket.
  • B. Compress the training and validation directories by using the Snappy compression library. Upload the manifest and compressed files to the training S3 bucket.
  • C. Compress the training and validation directories by using the gzip compression library. Upload the manifest and compressed files to the training S3 bucket.
  • D. Generate two RecordIO files, training.rec and validation.rec, from the manifest files by using the im2rec Apache MXNet utility tool. Upload the RecordIO files to the training S3 bucket.
Suggested Answer: D

Comments

AIWave
8 months, 1 week ago
Selected Answer: D
Amazon SageMaker's built-in image classification algorithm supports input data in RecordIO format for training. RecordIO is a binary file format that efficiently stores images and labels in a compact format, making it suitable for training deep learning models with large datasets. The im2rec utility tool provided by Apache MXNet can be used to generate RecordIO files from the manifest files (training.lst and validation.lst) containing image paths and labels. Using RecordIO files allows for efficient streaming of data during training, especially when combined with SageMaker's pipe mode, which can speed up the training process by reducing disk I/O.
upvoted 4 times
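
For readers unfamiliar with the manifest files mentioned above, here is a minimal sketch of how a .lst file of the kind im2rec reads could be produced from class subfolders. It assumes the usual tab-separated layout (image index, class label index, relative image path). The function name `build_lst_lines` and the alphabetical label assignment are illustrative assumptions, not something stated in the question.

```python
import os


def build_lst_lines(root, exts=(".jpg", ".jpeg", ".png")):
    """Return .lst lines of the form: index<TAB>label<TAB>relative/path.

    Assumption: each immediate subfolder of `root` is one class, and
    labels are assigned 0..N-1 in alphabetical order of folder name.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    lines = []
    index = 0
    for label, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(root, cls))):
            # Skip anything that is not an image file.
            if name.lower().endswith(exts):
                lines.append(f"{index}\t{label}\t{cls}/{name}")
                index += 1
    return lines
```

Writing `"\n".join(build_lst_lines("Training"))` to training.lst would give a manifest of the shape described in the question; the same call on the Validation folder would give validation.lst.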
loict
1 year, 2 months ago
Selected Answer: D
A. NO - SageMaker requires RecordIO input
B. NO - SageMaker requires RecordIO input
C. NO - SageMaker requires RecordIO input
D. YES - SageMaker requires RecordIO input
upvoted 2 times
Mickey321
1 year, 3 months ago
Selected Answer: D
If they want to use the RecordIO content type for training in pipe mode, they should generate two RecordIO files, training.rec and validation.rec, from the manifest files by using the im2rec Apache MXNet utility tool. They should upload the RecordIO files to the training S3 bucket. This corresponds to option D in the question.
upvoted 1 times
awsarchitect5
1 year, 3 months ago
Selected Answer: D
https://aws.amazon.com/blogs/machine-learning/classify-your-own-images-using-amazon-sagemaker/
upvoted 1 times
ADVIT
1 year, 4 months ago
Selected Answer: D
It's D. https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.html
upvoted 1 times
xinyingw
1 year, 4 months ago
Should be D. The company wants to use the Amazon SageMaker image classification algorithm to train the model. SageMaker's image classification algorithm requires the data to be in the RecordIO format. Therefore, the company should use the im2rec utility tool, which is part of the Apache MXNet framework, to generate RecordIO files from the manifest files. The company should generate two RecordIO files, one for the training dataset (training.rec) and one for the validation dataset (validation.rec). These RecordIO files will contain the image data along with their corresponding labels.
upvoted 4 times
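
As a rough illustration of the im2rec step described in option D, the sketch below only assembles the command line; it does not run it. It assumes a local copy of MXNet's `im2rec.py` script (the path is an assumption) and relies on the script's positional `prefix` and `root` arguments, where a prefix of `training` makes it read training.lst and write training.rec. The helper name `im2rec_command` is hypothetical.

```python
import shlex


def im2rec_command(prefix, image_root, script="im2rec.py", num_thread=4):
    """Build the argument list for one im2rec run.

    Assumptions: `script` points at a local copy of MXNet's im2rec.py,
    and <prefix>.lst already exists next to where the command is run.
    """
    return ["python", script, prefix, image_root, "--num-thread", str(num_thread)]


# One run per dataset, matching the two manifest files in the question.
for prefix, folder in [("training", "Training"), ("validation", "Validation")]:
    print(shlex.join(im2rec_command(prefix, folder)))
```

Each printed command could then be passed to `subprocess.run` in an environment where MXNet and its image dependencies are installed.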