Exam AWS Certified Machine Learning - Specialty topic 1 question 221 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 221
Topic #: 1


A company wants to segment a large group of customers into subgroups based on shared characteristics. The company’s data scientist is planning to use the Amazon SageMaker built-in k-means clustering algorithm for this task. The data scientist needs to determine the optimal number of subgroups (k) to use.

Which data visualization approach will MOST accurately determine the optimal value of k?

A. Calculate the principal component analysis (PCA) components. Run the k-means clustering algorithm for a range of k by using only the first two PCA components. For each value of k, create a scatter plot with a different color for each cluster. The optimal value of k is the value where the clusters start to look reasonably separated.
B. Calculate the principal component analysis (PCA) components. Create a line plot of the number of components against the explained variance. The optimal value of k is the number of PCA components after which the curve starts decreasing in a linear fashion.
C. Create a t-distributed stochastic neighbor embedding (t-SNE) plot for a range of perplexity values. The optimal value of k is the value of perplexity, where the clusters start to look reasonably separated.
D. Run the k-means clustering algorithm for a range of k. For each value of k, calculate the sum of squared errors (SSE). Plot a line chart of the SSE for each value of k. The optimal value of k is the point after which the curve starts decreasing in a linear fashion.


Suggested Answer: D

by Amit11011996 at Feb. 6, 2023, 6:27 p.m.


Comments


Amit11011996

Highly Voted 1 year, 4 months ago

Selected Answer: D

The answer is D.

upvoted 5 times


Mickey321

Most Recent 10 months, 1 week ago

Selected Answer: D

Option D uses the elbow method, a widely used heuristic for choosing k in k-means clustering. It plots the sum of squared errors (SSE) for different values of k and looks for the point after which the SSE decreases only slowly, in a roughly linear fashion. This point is called the elbow, and it indicates that adding more clusters no longer improves the model significantly.

upvoted 4 times


oso0348

1 year, 3 months ago

Selected Answer: D

The sum of squared errors measures the variation within each cluster.
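
Written out, the quantity meant here is usually defined as below. The notation is an assumption added for illustration and does not come from the thread: $C_j$ is the set of points assigned to cluster $j$, $\mu_j$ is that cluster's centroid, and $x_i$ is a single data point.

$$
\mathrm{SSE}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

Each inner sum is the spread of one cluster around its own centroid, so a smaller total means tighter clusters.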

upvoted 3 times


AjoseO

1 year, 4 months ago

Selected Answer: D

D. Run the k-means clustering algorithm for a range of k. For each value of k, calculate the sum of squared errors (SSE). Plot a line chart of the SSE for each value of k. The optimal value of k is the point after which the curve starts decreasing in a linear fashion. The SSE measures the total variation within the clusters, and the optimal value of k is typically the point where the SSE stops dropping sharply and begins to level off. Plotting the SSE against the number of clusters (k) lets the data scientist identify the optimal number of clusters from where the SSE curve starts decreasing linearly.
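
A minimal sketch of the procedure in option D, in plain Python with no SageMaker calls. Everything here is an assumption made for illustration: the toy data set of three tight clusters, the helper names (`farthest_point_init`, `kmeans_sse`, `elbow_by_second_difference`), the deterministic farthest-point seeding used in place of random initialization, and the largest-second-difference rule standing in for reading the elbow off a line chart by eye.

```python
from itertools import product


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not SageMaker's method):
    start at points[0], then repeatedly add the point that is farthest
    from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means and return the final sum of squared errors."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


def elbow_by_second_difference(sse_by_k):
    """Hypothetical stand-in for eyeballing the chart: return the interior k
    with the largest second difference of the SSE curve.
    Assumes the keys of sse_by_k are consecutive integers."""
    interior = sorted(sse_by_k)[1:-1]
    return max(
        interior,
        key=lambda k: sse_by_k[k - 1] - 2 * sse_by_k[k] + sse_by_k[k + 1],
    )


# Toy data (illustrative only): three tight 3x3 grids of points.
offsets = (-0.5, 0.0, 0.5)
points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for dx, dy in product(offsets, offsets)
]

# Run k-means for a range of k and record the SSE for each value.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k in sorted(sse_by_k):
    print(f"k={k}  SSE={sse_by_k[k]:.2f}")

best_k = elbow_by_second_difference(sse_by_k)
print("elbow at k =", best_k)
```

On this toy data the SSE falls steeply up to k = 3 and changes little afterwards, so the heuristic returns 3. On real customer data the elbow is often less sharp, which is why the question frames it as a visual judgment on the line chart of `sse_by_k`.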

upvoted 4 times


drcok87

1 year, 4 months ago

D. See https://towardsdatascience.com/explain-ml-in-a-simple-way-k-means-clustering-e925d019743b

upvoted 2 times
