Exam AWS Certified Machine Learning - Specialty topic 1 question 64 discussion

A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset.
Which tool should be used to improve the validation accuracy?

  • A. Amazon Comprehend syntax analysis and entity detection
  • B. Amazon SageMaker BlazingText cbow mode
  • C. Natural Language Toolkit (NLTK) stemming and stop word removal
  • D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer
Suggested Answer: D


Comments
tap123
Highly Voted 3 years, 1 month ago
D is correct. Amazon Comprehend syntax analysis =/= Amazon Comprehend sentiment analysis. You need to read choices very carefully.
upvoted 35 times
mawsman
3 years, 1 month ago
We're looking only to improve the validation accuracy, and Comprehend syntax analysis would help with that because the word set is rich and the sentiment-carrying words infrequent. We're not looking to replace the sentiment analysis tool with Comprehend.
upvoted 4 times
...
...
DonaldCMLIN
Highly Voted 3 years, 1 month ago
AWS Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. Amazon Comprehend provides keyphrase extraction, sentiment analysis, entity recognition, topic modeling, and language detection APIs so you can easily integrate natural language processing into your applications. https://aws.amazon.com/comprehend/features/?nc1=h_ls Just going through Amazon Comprehend is much easier than the other options, so the more convenient answer is A.
upvoted 23 times
ComPah
3 years, 1 month ago
Agree. Also, the keyword is TOOL; the rest are frameworks.
upvoted 2 times
...
...
VR10
Most Recent 8 months, 3 weeks ago
Selected Answer: A
Both Amazon Comprehend and the TF-IDF with a classifier solution are valid. If ease of use and pre-trained capabilities are high priorities, Comprehend is a solid option. If customization and dataset-specific nuances are crucial, building a custom model with TF-IDF may be needed. Since Comprehend is a tool, I am going with A.
upvoted 1 times
...
phdykd
10 months ago
D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer Here's why: TF-IDF Vectorizer: This tool from Scikit-learn is effective in handling issues of rich vocabularies and low frequency words. TF-IDF down-weights words that appear frequently across documents (thus might be less informative) and gives more weight to words that appear less frequently but might be more indicative of the sentiment. This approach can enhance the model's ability to focus on more relevant features, potentially improving validation accuracy.
upvoted 4 times
...
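The down-weighting phdykd describes can be sketched in a few lines of pure Python. This is a simplified version of the idea behind scikit-learn's TfidfVectorizer (which additionally applies smoothing and normalization); the example documents are made up for illustration:

```python
import math

def tf_idf(docs):
    """docs: list of token lists. Returns per-document {term: tf-idf} maps."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)   # term frequency within this doc
            idf = math.log(n / df[term])      # inverse document frequency
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [
    "the movie was great".split(),
    "the movie was terrible".split(),
    "the plot was thin".split(),
]
w = tf_idf(docs)
# "the" appears in every document, so its idf (and hence weight) is 0;
# "great" appears in only one document, so it carries positive weight there.
```

Rare, informative words end up with higher weights than ubiquitous ones, which is exactly the property that helps with a rich vocabulary and low average word frequency.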
geoan13
12 months ago
I think C is correct. Stemming involves reducing words to their root or base form, and stop word removal involves removing common words (e.g., "the," "and," "is") that may not contribute much to sentiment analysis. By using NLTK for stemming and stop word removal, you can simplify the vocabulary and potentially improve the model's ability to capture sentiment from the remaining meaningful words. A - syntax analysis and entity recognition won't solve the scenario. B - BlazingText is for word embeddings. D - TF-IDF is about capturing the importance of words in a document collection based on the frequency of a word in a document.
upvoted 4 times
...
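The effect geoan13 describes can be illustrated without NLTK itself. Below is a deliberately crude stand-in for NLTK's stop word list and PorterStemmer (the real stemmer handles far more suffix rules; the word lists here are made up for illustration):

```python
# Toy stop word list and suffix stripper, standing in for
# nltk.corpus.stopwords and nltk.stem.PorterStemmer.
STOP_WORDS = {"the", "and", "is", "a", "an", "was"}
SUFFIXES = ("ing", "ed", "ly", "s")

def crude_stem(word):
    # Strip the first matching suffix, keeping at least a short stem.
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def normalize(tokens):
    # Drop stop words, then collapse inflected forms onto a shared stem.
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = "the acting was amazingly moving and the plot moved slowly".split()
print(normalize(tokens))
# → ['act', 'amazing', 'mov', 'plot', 'mov', 'slow']
```

Note how "moving" and "moved" collapse onto the same stem "mov": the vocabulary shrinks and the surviving terms occur more often, which is precisely the rich-vocabulary / low-frequency problem the question describes.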
Selected Answer: D
D is correct, guys.
upvoted 1 times
...
wendaz
1 year ago
Amazon Comprehend's syntax analysis and entity detection are more about understanding the structure of sentences and identifying entities within the text rather than tackling the problem of a rich vocabulary with low average frequency of words. TF-IDF vectorization is a technique that can help reduce the impact of common, low-information words in the dataset while emphasizing the importance of more informative, less frequent words. This could potentially improve the validation accuracy by addressing the identified problem.
upvoted 1 times
...
loict
1 year, 1 month ago
Selected Answer: A
A. YES - he works on an application and not a model; Amazon Comprehend is the ready-to-use tool he wants, with TF-IDF built in. B. NO - word2vec will be challenged with low-frequency terms; GloVe and FastText are better for that. C. NO - the vocabulary is rich, so stemming and stop word removal will not address the core issue. D. NO - right approach, but that is not "a tool".
upvoted 1 times
...
Mickey321
1 year, 2 months ago
Selected Answer: D
Option D. This approach can help in reducing the impact of words that occur frequently in the dataset and increasing the impact of words that occur less frequently. This can help in improving the accuracy of the model.
upvoted 2 times
...
ashii007
1 year, 2 months ago
The answer is B. BlazingText can handle OOV words as explained below. https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html
upvoted 2 times
...
jyrajan69
1 year, 3 months ago
This is an AWS exam, so why would you choose anything other than A or B? And based on the link, it looks like B is most likely.
upvoted 2 times
...
kaike_reis
1 year, 3 months ago
Selected Answer: D
The passage “low average frequency of words” points directly to the use of TF-IDF. Letter A deviates from what the question proposes and is discarded. Letter B proposes a radical change in my POV. Letter C does not solve the passage mentioned at the beginning. Letter D is correct.
upvoted 2 times
...
GOSD
1 year, 6 months ago
The Amazon SageMaker BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, etc. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.
upvoted 1 times
...
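For readers weighing option B: BlazingText's supervised (text classification) mode expects plain-text training data with one example per line, each prefixed by `__label__<tag>`. A minimal sketch of preparing that format (the sentences and labels here are made up for illustration):

```python
# Convert labeled sentiment examples into BlazingText's supervised
# input format: "__label__<tag> token token ...", one example per line.
examples = [
    ("positive", "What a great movie"),
    ("negative", "The plot was thin and slow"),
]

lines = [
    "__label__{} {}".format(label, text.lower())
    for label, text in examples
]
print("\n".join(lines))
# First line: "__label__positive what a great movie"
```

The formatted lines would then be uploaded to S3 as the training channel for a BlazingText training job.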
vassof95
1 year, 6 months ago
Selected Answer: D
I would say since the buzzword "low average frequency" comes up, the safe choice would be the TF-IDF vectorizer. I go for D.
upvoted 2 times
...
ParkXD
1 year, 7 months ago
Selected Answer: D
The Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer is a widely used tool to mitigate the high dimensionality of text data. Option A, Amazon Comprehend syntax analysis and entity detection, can help in extracting useful features from the text, but it does not address the issue of high dimensionality. Option B, Amazon SageMaker BlazingText cbow mode, is a tool for training word embeddings, which can help to represent words in a lower-dimensional space. However, it does not directly address the issue of high dimensionality and low frequency of words. Option C, Natural Language Toolkit (NLTK) stemming and stop word removal, can reduce the dimensionality of the feature space, but it does not address the issue of low-frequency words that are important for sentiment analysis.
upvoted 5 times
...
cpal012
1 year, 8 months ago
Selected Answer: C
Emphasis is on the rich words - so stemming can help reduce these to more common words. BlazingText in cbow mode doesn't seem relevant; it's about predicting a word given its context. And TF-IDF I'm not sure would do anything except highlight the problem you are already having?
upvoted 1 times
...
bakarys
1 year, 8 months ago
Selected Answer: D
D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer would be the best tool to use in this scenario. The TF-IDF vectorizer gives less weight to words that appear frequently across documents, and allows the more informative, less frequent words to have a greater impact on the sentiment analysis. This can help to improve the validation accuracy of the model.
upvoted 5 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other