Exam AWS Certified Machine Learning - Specialty topic 1 question 194 discussion

A data scientist is working on a model to predict a company's required inventory stock levels. All historical data is stored in .csv files in the company's data lake on Amazon S3. The dataset consists of approximately 500 GB of data. The data scientist wants to use SQL to explore the data before training the model. The company wants to minimize costs.

Which option meets these requirements with the LEAST operational overhead?

  • A. Create an Amazon EMR cluster. Create external tables in the Apache Hive metastore, referencing the data that is stored in the S3 bucket. Explore the data from the Hive console.
  • B. Use AWS Glue to crawl the S3 bucket and create tables in the AWS Glue Data Catalog. Use Amazon Athena to explore the data.
  • C. Create an Amazon Redshift cluster. Use the COPY command to ingest the data from Amazon S3. Explore the data from the Amazon Redshift query editor GUI.
  • D. Create an Amazon Redshift cluster. Create external tables in an external schema, referencing the S3 bucket that contains the data. Explore the data from the Amazon Redshift query editor GUI.
Suggested Answer: B
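For readers who want to see what option B looks like in practice, here is a minimal sketch using boto3. It builds the request parameters for a Glue crawler and an Athena query, then submits them. The crawler name, IAM role ARN, database name, bucket paths, and table name are all hypothetical placeholders, not values from the question. The table name a crawler actually creates depends on the S3 folder layout, so treat `inventory_history` as an assumption.

```python
import time


def build_crawler_request(name, role_arn, database, s3_path):
    """Keyword arguments for glue.create_crawler (S3 target only)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def build_query_request(sql, database, output_location):
    """Keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def crawl_then_query(crawler_req, query_req, poll_seconds=10):
    """Create and run the crawler, wait for it to finish, then start the query.

    Requires AWS credentials and boto3; imported lazily so the builders
    above can be used and tested without the SDK installed.
    """
    import boto3

    glue = boto3.client("glue")
    athena = boto3.client("athena")

    glue.create_crawler(**crawler_req)
    glue.start_crawler(Name=crawler_req["Name"])
    # The crawler returns to the READY state once the run has finished.
    while glue.get_crawler(Name=crawler_req["Name"])["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)

    response = athena.start_query_execution(**query_req)
    return response["QueryExecutionId"]


# Hypothetical values for illustration only.
crawler_req = build_crawler_request(
    name="inventory-csv-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="inventory_lake",
    s3_path="s3://example-data-lake/inventory/",
)
query_req = build_query_request(
    sql="SELECT * FROM inventory_history LIMIT 10",
    database="inventory_lake",
    output_location="s3://example-athena-results/",
)
```

Calling `crawl_then_query(crawler_req, query_req)` would run the whole flow against a real account. No cluster is created at any point, which is the operational-overhead argument for B.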

Comments

dunhill
Highly Voted 1 year, 5 months ago
I think the answer is B. The others are quite expensive and complicated.
upvoted 6 times
...
loict
Most Recent 8 months, 1 week ago
Selected Answer: B
A. NO - B is easier.
B. YES - works natively against S3.
C. NO - no need to import S3 data into Redshift when Presto/Athena lets you query it directly.
D. NO - Redshift is overkill.
upvoted 1 times
...
Mickey321
8 months, 3 weeks ago
Selected Answer: B
AWS Glue
upvoted 1 times
...
kaike_reis
9 months ago
Selected Answer: B
We want to use SQL to explore the 500 GB dataset stored in S3. We also want to minimize costs and have the least headache with operations. Options A, C, and D all require provisioning and managing a cluster, hence: headache. The correct alternative is B.
upvoted 1 times
...
ADVIT
10 months, 1 week ago
Selected Answer: B
B as "LEAST operational overhead"
upvoted 1 times
...
ZSun
1 year ago
The advantage of D is that Redshift, as a data warehouse, can handle large datasets (>1 TB) and complex, frequent queries. In this example we have a 500 GB dataset and infrequent queries (I consider this a one-time, ad hoc exploration, just to verify the data before training), so Athena would be a much better option.
upvoted 2 times
...
oso0348
1 year, 1 month ago
Selected Answer: B
The option that meets these requirements with the LEAST operational overhead is option B: Use AWS Glue to crawl the S3 bucket and create tables in the AWS Glue Data Catalog. Use Amazon Athena to explore the data. AWS Glue is a fully managed ETL service that can automatically discover and catalog metadata about data stored in various data stores, including Amazon S3. By using AWS Glue to crawl the S3 bucket, the data scientist can easily create tables in the AWS Glue Data Catalog, without needing to create or manage any infrastructure. Amazon Athena is an interactive query service that allows querying data stored in Amazon S3 using SQL. By using Amazon Athena, the data scientist can easily explore the data using SQL, without needing to set up any infrastructure.
upvoted 3 times
...
Jerry84
1 year, 4 months ago
Selected Answer: B
Both Glue and Athena are serverless, hence cost-effective.
upvoted 3 times
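A rough way to check the cost claim is to estimate what Athena charges for scanning the data. The sketch below assumes a flat rate of 5 USD per TB scanned and 1 TB = 1000 GB. Both are assumptions for illustration, so check current regional pricing before relying on the numbers. The cluster hourly rate is a placeholder, not a real AWS price.

```python
def athena_scan_cost_usd(gb_scanned, usd_per_tb=5.0, gb_per_tb=1000.0):
    """Estimated Athena cost for scanning gb_scanned at a flat per-TB rate.

    usd_per_tb and gb_per_tb are assumptions, not authoritative pricing.
    """
    return gb_scanned / gb_per_tb * usd_per_tb


def cluster_cost_usd(hourly_rate_usd, hours, nodes=1):
    """Cost of keeping a provisioned cluster running, billed per node-hour."""
    return hourly_rate_usd * hours * nodes


# One query that scans the entire 500 GB dataset.
full_scan = athena_scan_cost_usd(500)

# Ten such full scans during an exploration session.
ten_full_scans = 10 * full_scan

# Placeholder: a 2-node cluster at a hypothetical 0.25 USD per node-hour for 24 hours.
cluster_day = cluster_cost_usd(hourly_rate_usd=0.25, hours=24, nodes=2)

print(f"one full scan:  {full_scan:.2f} USD")
print(f"ten full scans: {ten_full_scans:.2f} USD")
print(f"cluster, 1 day: {cluster_day:.2f} USD (placeholder rate)")
```

Under these assumptions, a full scan of the dataset costs a few dollars, and nothing is billed when no queries run. A cluster accrues cost for every hour it stays up, whether or not it is queried.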
...
Peeking
1 year, 5 months ago
Selected Answer: B
B is fully managed, unlike the other options.
upvoted 4 times
...
akhjhjk
1 year, 5 months ago
Selected Answer: B
It seems to be B
upvoted 4 times
...