Exam AWS Certified Machine Learning - Specialty topic 1 question 261 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 261
Topic #: 1

A social media company wants to develop a machine learning (ML) model to detect inappropriate or offensive content in images. The company has collected a large dataset of labeled images and plans to use the built-in Amazon SageMaker image classification algorithm to train the model. The company also intends to use SageMaker pipe mode to speed up the training.

The company splits the dataset into training, validation, and testing datasets. The company stores the training and validation images in folders that are named Training and Validation, respectively. The folders contain subfolders that correspond to the names of the dataset classes. The company resizes the images to the same size and generates two input manifest files named training.lst and validation.lst, for the training dataset and the validation dataset, respectively. Finally, the company creates two separate Amazon S3 buckets for uploads of the training dataset and the validation dataset.

Which additional data preparation steps should the company take before uploading the files to Amazon S3?

A. Generate two Apache Parquet files, training.parquet and validation.parquet, by reading the images into a Pandas data frame and storing the data frame as a Parquet file. Upload the Parquet files to the training S3 bucket.
B. Compress the training and validation directories by using the Snappy compression library. Upload the manifest and compressed files to the training S3 bucket.
C. Compress the training and validation directories by using the gzip compression library. Upload the manifest and compressed files to the training S3 bucket.
D. Generate two RecordIO files, training.rec and validation.rec, from the manifest files by using the im2rec Apache MXNet utility tool. Upload the RecordIO files to the training S3 bucket.

Suggested Answer: D

by xinyingw at June 23, 2023, 4:14 p.m.

Comments

AIWave

10 months, 1 week ago

Selected Answer: D

Amazon SageMaker's built-in image classification algorithm supports input data in RecordIO format for training. RecordIO is a binary file format that efficiently stores images and labels in a compact format, making it suitable for training deep learning models with large datasets. The im2rec utility tool provided by Apache MXNet can be used to generate RecordIO files from the manifest files (training.lst and validation.lst) containing image paths and labels. Using RecordIO files allows for efficient streaming of data during training, especially when combined with SageMaker's pipe mode, which can speed up the training process by reducing disk I/O.
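For anyone unsure what the manifest files look like, here is a rough sketch of how a .lst file could be built from class subfolders. It assumes the tab-separated layout that im2rec reads (index, numeric label, relative image path). The function name build_lst_lines and the class folder names "safe" and "offensive" are made up for illustration and are not from the question.

```python
import os
import tempfile

# Assumption: only these extensions count as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def build_lst_lines(root_dir):
    """Return .lst lines (index<TAB>label<TAB>relative_path) for class subfolders."""
    # Each subfolder is one class; sorting makes the label numbering stable.
    classes = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    lines = []
    index = 0
    for label, class_name in enumerate(classes):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                # Paths are kept relative to root_dir, with forward slashes.
                lines.append(f"{index}\t{label}\t{class_name}/{file_name}")
                index += 1
    return lines


if __name__ == "__main__":
    # Tiny demo on a throwaway directory with hypothetical class names.
    with tempfile.TemporaryDirectory() as root:
        for class_name, file_name in (("safe", "a.jpg"), ("offensive", "b.jpg")):
            os.makedirs(os.path.join(root, class_name), exist_ok=True)
            open(os.path.join(root, class_name, file_name), "wb").close()
        print("\n".join(build_lst_lines(root)))
```

The resulting lines would be written to training.lst or validation.lst, and im2rec would then pack the listed images into the matching .rec file.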

upvoted 4 times

...

loict

1 year, 4 months ago

Selected Answer: D

A. NO - SageMaker requires RecordIO input
B. NO - SageMaker requires RecordIO input
C. NO - SageMaker requires RecordIO input
D. YES - SageMaker requires RecordIO input

upvoted 2 times

...

Mickey321

1 year, 5 months ago

Selected Answer: D

If they want to use the RecordIO content type for training in pipe mode, they should generate two RecordIO files, training.rec and validation.rec, from the manifest files by using the im2rec Apache MXNet utility tool. They should upload the RecordIO files to the training S3 bucket. This corresponds to option D in the question.
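To make the pipe mode part concrete, here is a minimal sketch of what the input channels for such a training job might look like. The dictionaries follow the shape of InputDataConfig in the SageMaker CreateTrainingJob API, but nothing is sent to AWS here. The bucket name example-training-bucket and the helper recordio_pipe_channel are assumptions for illustration only.

```python
RECORDIO_CONTENT_TYPE = "application/x-recordio"


def recordio_pipe_channel(channel_name, s3_uri):
    """Build one input channel that streams a RecordIO file in pipe mode."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": RECORDIO_CONTENT_TYPE,
        "InputMode": "Pipe",
    }


# Hypothetical bucket name; the file names come from option D.
input_data_config = [
    recordio_pipe_channel("train", "s3://example-training-bucket/training.rec"),
    recordio_pipe_channel("validation", "s3://example-training-bucket/validation.rec"),
]

for channel in input_data_config:
    print(channel["ChannelName"], channel["InputMode"], channel["ContentType"])
```

Both channels point at the same bucket, which matches the "upload the RecordIO files to the training S3 bucket" wording in option D.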

upvoted 1 times

...

awsarchitect5

1 year, 5 months ago

Selected Answer: D

https://aws.amazon.com/blogs/machine-learning/classify-your-own-images-using-amazon-sagemaker/

upvoted 1 times

...

ADVIT

1 year, 6 months ago

Selected Answer: D

It's D. https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.html

upvoted 1 times

...

xinyingw

1 year, 6 months ago

Should be D. The company wants to use the Amazon SageMaker image classification algorithm to train the model in pipe mode. For pipe mode, the image classification algorithm expects the data to be in the RecordIO format. Therefore, the company should use the im2rec utility tool, which is part of the Apache MXNet framework, to generate RecordIO files from the manifest files. The company should generate two RecordIO files, one for the training dataset (training.rec) and one for the validation dataset (validation.rec). These RecordIO files will contain the image data along with their corresponding labels.

upvoted 4 times

...