Exam AWS Certified Machine Learning - Specialty topic 1 question 221 discussion

A company wants to segment a large group of customers into subgroups based on shared characteristics. The company’s data scientist is planning to use the Amazon SageMaker built-in k-means clustering algorithm for this task. The data scientist needs to determine the optimal number of subgroups (k) to use.

Which data visualization approach will MOST accurately determine the optimal value of k?

  • A. Calculate the principal component analysis (PCA) components. Run the k-means clustering algorithm for a range of k by using only the first two PCA components. For each value of k, create a scatter plot with a different color for each cluster. The optimal value of k is the value where the clusters start to look reasonably separated.
  • B. Calculate the principal component analysis (PCA) components. Create a line plot of the number of components against the explained variance. The optimal value of k is the number of PCA components after which the curve starts decreasing in a linear fashion.
  • C. Create a t-distributed stochastic neighbor embedding (t-SNE) plot for a range of perplexity values. The optimal value of k is the value of perplexity, where the clusters start to look reasonably separated.
  • D. Run the k-means clustering algorithm for a range of k. For each value of k, calculate the sum of squared errors (SSE). Plot a line chart of the SSE for each value of k. The optimal value of k is the point after which the curve starts decreasing in a linear fashion.
Suggested Answer: D

Comments

Amit11011996
Highly Voted 1 year, 3 months ago
Selected Answer: D
The Answer is D
upvoted 5 times
...
Mickey321
Most Recent 8 months, 3 weeks ago
Selected Answer: D
Option D describes the elbow method, a widely used technique for choosing the value of k in k-means clustering. It plots the sum of squared errors (SSE) for different values of k and looks for the point after which the SSE decreases only in a roughly linear fashion. This point is called the elbow; beyond it, adding more clusters does not improve the model significantly.
upvoted 4 times
...
oso0348
1 year, 2 months ago
Selected Answer: D
The sum of squared errors shows the variation within each cluster.
upvoted 3 times
...
AjoseO
1 year, 2 months ago
Selected Answer: D
D. Run the k-means clustering algorithm for a range of k. For each value of k, calculate the sum of squared errors (SSE). Plot a line chart of the SSE for each value of k. The optimal value of k is the point after which the curve starts decreasing in a linear fashion. The SSE measures the total within-cluster variation, and the optimal value of k is typically the point where the SSE stops decreasing sharply and begins to level off. Plotting the SSE against the number of clusters (k) lets the data scientist identify the optimal number of clusters from the point where the curve starts decreasing linearly.
upvoted 4 times
...
drcok87
1 year, 3 months ago
D. See https://towardsdatascience.com/explain-ml-in-a-simple-way-k-means-clustering-e925d019743b
upvoted 2 times
...
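The elbow method described in option D and in the comments above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the Amazon SageMaker built-in algorithm: the synthetic data, the function names (`sq_dist`, `init_centers`, `kmeans_sse`), and the deterministic farthest-point seeding are all assumptions made to keep the example small and reproducible.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centers(points, k):
    # Assumption: deterministic farthest-point seeding, chosen for
    # reproducibility. Real implementations usually use random or
    # k-means++ seeding with several restarts.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers

def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the sum of squared errors (SSE)."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical data: three well-separated 2-D customer groups.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 10.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in true_centers
    for _ in range(30)
]

# SSE for each candidate k; these are the values to put on the line chart.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(f"k={k}  SSE={sse:.1f}")
```

On data like this, the SSE should fall steeply up to k=3 and only slightly afterwards, which is the elbow. Feeding the keys and values of `sse_by_k` to any line-plotting tool gives the chart that option D describes.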