Exam AWS Certified Machine Learning - Specialty topic 1 question 64 discussion

A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset.
Which tool should be used to improve the validation accuracy?

  • A. Amazon Comprehend syntax analysis and entity detection
  • B. Amazon SageMaker BlazingText cbow mode
  • C. Natural Language Toolkit (NLTK) stemming and stop word removal
  • D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer
Suggested Answer: D


Comments
tap123
Highly Voted 3 years, 1 month ago
D is correct. Amazon Comprehend syntax analysis =/= Amazon Comprehend sentiment analysis. You need to read choices very carefully.
upvoted 35 times
mawsman
3 years, 1 month ago
We're looking only to improve the validation accuracy, and Comprehend syntax analysis would help with that because the word set is rich and the sentiment-carrying words infrequent. We're not looking to replace the sentiment analysis tool with Comprehend.
upvoted 4 times
...
...
DonaldCMLIN
Highly Voted 3 years, 1 month ago
AWS Comprehend is a natural language processing (NLP) service that uses machine learning to discover insights from text. Amazon Comprehend provides keyphrase extraction, sentiment analysis, entity recognition, topic modeling, and language detection APIs so you can easily integrate natural language processing into your applications. https://aws.amazon.com/comprehend/features/?nc1=h_ls Just going through Amazon Comprehend is much easier than the other options, so the more convenient answer is A.
upvoted 23 times
ComPah
3 years, 1 month ago
Agree. Also, the keyword is TOOL; the rest are frameworks.
upvoted 2 times
...
...
VR10
Most Recent 8 months, 3 weeks ago
Selected Answer: A
Both Amazon Comprehend and the TF-IDF with a classifier solution are valid. If ease of use and pre-trained capabilities are high priorities, Comprehend is a solid option. If customization and dataset-specific nuances are crucial, building a custom model with TF-IDF may be needed. Since Comprehend is a tool, I am going with A.
upvoted 1 times
...
phdykd
10 months ago
D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer Here's why: TF-IDF Vectorizer: This tool from Scikit-learn is effective in handling issues of rich vocabularies and low frequency words. TF-IDF down-weights words that appear frequently across documents (thus might be less informative) and gives more weight to words that appear less frequently but might be more indicative of the sentiment. This approach can enhance the model's ability to focus on more relevant features, potentially improving validation accuracy.
upvoted 4 times
...
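The down-weighting phdykd describes can be sketched in a few lines of pure Python. This is a simplified version of the idea behind scikit-learn's TfidfVectorizer (which additionally applies smoothing and normalization); the example documents are made up for illustration:

```python
import math

def tf_idf(docs):
    """docs: list of token lists. Returns per-document {term: tf-idf} maps."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)   # term frequency within this doc
            idf = math.log(n / df[term])      # inverse document frequency
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [
    "the movie was great".split(),
    "the movie was terrible".split(),
    "the plot was thin".split(),
]
w = tf_idf(docs)
# "the" appears in every document, so its idf (and hence weight) is 0;
# "great" appears in only one document, so it carries positive weight there.
```

Rare, informative words end up with higher weights than ubiquitous ones, which is exactly the property that helps with a rich vocabulary and low average word frequency.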
geoan13
12 months ago
I think C is correct. Stemming involves reducing words to their root or base form, and stop word removal involves removing common words (e.g., "the," "and," "is") that may not contribute much to sentiment analysis. By using NLTK for stemming and stop word removal, you can simplify the vocabulary and potentially improve the model's ability to capture sentiment from the remaining meaningful words. A - syntax analysis and entity recognition won't solve the scenario. B - BlazingText is for word embeddings. D - TF-IDF is about capturing the importance of words in a document collection based on the frequency of a word in a document.
upvoted 4 times
...
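The effect geoan13 describes can be illustrated without NLTK itself. Below is a deliberately crude stand-in for NLTK's stop word list and PorterStemmer (the real stemmer handles far more suffix rules; the word lists here are made up for illustration):

```python
# Toy stop word list and suffix stripper, standing in for
# nltk.corpus.stopwords and nltk.stem.PorterStemmer.
STOP_WORDS = {"the", "and", "is", "a", "an", "was"}
SUFFIXES = ("ing", "ed", "ly", "s")

def crude_stem(word):
    # Strip the first matching suffix, keeping at least a short stem.
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def normalize(tokens):
    # Drop stop words, then collapse inflected forms onto a shared stem.
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = "the acting was amazingly moving and the plot moved slowly".split()
print(normalize(tokens))
# → ['act', 'amazing', 'mov', 'plot', 'mov', 'slow']
```

Note how "moving" and "moved" collapse onto the same stem "mov": the vocabulary shrinks and the surviving terms occur more often, which is precisely the rich-vocabulary / low-frequency problem the question describes.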
Selected Answer: D
D is correct, guys.
upvoted 1 times
...
wendaz
1 year ago
Amazon Comprehend's syntax analysis and entity detection are more about understanding the structure of sentences and identifying entities within the text rather than tackling the problem of a rich vocabulary with low average frequency of words. TF-IDF vectorization is a technique that can help reduce the impact of common, low-information words in the dataset while emphasizing the importance of more informative, less frequent words. This could potentially improve the validation accuracy by addressing the identified problem.
upvoted 1 times
...
loict
1 year, 1 month ago
Selected Answer: A
A. YES - he works on an application and not a model; Amazon Comprehend is the ready-to-use tool he wants, with TF-IDF built in. B. NO - word2vec will be challenged with low-frequency terms; GloVe and FastText are better for that. C. NO - the vocabulary is rich, so stemming and stop word removal will not address the core issue. D. NO - right approach, but that is not "a tool".
upvoted 1 times
...
Mickey321
1 year, 2 months ago
Selected Answer: D
Option D. This approach can help in reducing the impact of words that occur frequently in the dataset and increasing the impact of words that occur less frequently. This can help in improving the accuracy of the model.
upvoted 2 times
...
ashii007
1 year, 2 months ago
The answer is B. BlazingText can handle OOV words as explained below. https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html
upvoted 2 times
...
jyrajan69
1 year, 3 months ago
This is an AWS exam, so why would you choose anything other than A or B? And based on the link, it looks like B is most likely.
upvoted 2 times
...
kaike_reis
1 year, 3 months ago
Selected Answer: D
The passage “low average frequency of words” points directly to the use of TF-IDF. Letter A deviates from what the question proposes and is discarded. Letter B proposes a radical change in my POV. Letter C does not solve the passage mentioned at the beginning. Letter D is correct.
upvoted 2 times
...
GOSD
1 year, 6 months ago
The Amazon SageMaker BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, etc. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.
upvoted 1 times
...
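For readers weighing option B: BlazingText's supervised (text classification) mode expects plain-text training data with one example per line, each prefixed by `__label__<tag>`. A minimal sketch of preparing that format (the sentences and labels here are made up for illustration):

```python
# Convert labeled sentiment examples into BlazingText's supervised
# input format: "__label__<tag> token token ...", one example per line.
examples = [
    ("positive", "What a great movie"),
    ("negative", "The plot was thin and slow"),
]

lines = [
    "__label__{} {}".format(label, text.lower())
    for label, text in examples
]
print("\n".join(lines))
# First line: "__label__positive what a great movie"
```

The formatted lines would then be uploaded to S3 as the training channel for a BlazingText training job.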
vassof95
1 year, 6 months ago
Selected Answer: D
I would say since the buzzword "low average frequency" comes up, the safe choice would be the TF-IDF vectorizer. I go for D.
upvoted 2 times
...
ParkXD
1 year, 7 months ago
Selected Answer: D
The Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer is a widely used tool to mitigate the high dimensionality of text data. Option A, Amazon Comprehend syntax analysis and entity detection, can help in extracting useful features from the text, but it does not address the issue of high dimensionality. Option B, Amazon SageMaker BlazingText cbow mode, is a tool for training word embeddings, which can help to represent words in a lower-dimensional space. However, it does not directly address the issue of high dimensionality and low frequency of words. Option C, Natural Language Toolkit (NLTK) stemming and stop word removal, can reduce the dimensionality of the feature space, but it does not address the issue of low-frequency words that are important for sentiment analysis.
upvoted 5 times
...
cpal012
1 year, 8 months ago
Selected Answer: C
Emphasis is on the rich words - so stemming can help reduce these to more common words. BlazingText in cbow mode doesn't seem relevant; it's about predicting a word given its context. And TF-IDF I'm not sure would do anything except highlight the problem you are already having?
upvoted 1 times
...
bakarys
1 year, 8 months ago
Selected Answer: D
D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer would be the best tool to use in this scenario. The TF-IDF vectorizer gives less weight to words that appear frequently across documents, and allows the more informative, less frequent words to have a greater impact on the sentiment analysis. This can help to improve the validation accuracy of the model.
upvoted 5 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other