Exam AWS Certified AI Practitioner AIF-C01 topic 1 question 106 discussion

Exam question from Amazon's AWS Certified AI Practitioner AIF-C01

Question #: 106
Topic #: 1

A company is introducing a mobile app that helps users learn foreign languages. The app makes text more coherent by calling a large language model (LLM). The company collected a diverse dataset of text and supplemented the dataset with examples of more readable versions. The company wants the LLM output to resemble the provided examples.

Which metric should the company use to assess whether the LLM meets these requirements?

A. Value of the loss function
B. Semantic robustness
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score
D. Latency of the text generation

Suggested Answer: C

by 26b8fe1 at Dec. 26, 2024, 3:16 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Jessiii

4 months, 1 week ago

Selected Answer: C

The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score is widely used to measure the similarity between generated text and a set of reference texts. Since the company wants the LLM's output to resemble the provided readable examples, ROUGE is the most appropriate metric. ROUGE compares the LLM-generated text with the human-provided reference texts by counting overlapping n-grams and reporting that overlap as precision, recall, and F1 score. A high score means the output closely matches the readable examples, which is exactly what the company wants to check.
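To make the n-gram overlap idea concrete, here is a minimal sketch of ROUGE-N in plain Python. It assumes simple lowercase whitespace tokenization and a single reference, and the function names (`ngrams`, `rouge_n`) are made up for illustration; a real evaluation would normally use an established ROUGE implementation rather than this.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N: clipped n-gram overlap as precision, recall, F1.

    Assumption: lowercase whitespace tokenization and one reference text.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())

    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    llm_output = "the cat sat on the mat"
    readable_example = "the cat is on the mat"
    print(rouge_n(llm_output, readable_example, n=1))
    print(rouge_n(llm_output, readable_example, n=2))
```

In this toy pair, 5 of the 6 unigrams and 3 of the 5 bigrams are shared, so ROUGE-1 recall is 5/6 and ROUGE-2 recall is 3/5.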

upvoted 3 times

may2021_r

5 months, 3 weeks ago

Selected Answer: C

The correct answer is C. ROUGE score measures how well generated text matches reference examples.

upvoted 1 times

aws_Tamilan

5 months, 3 weeks ago

Selected Answer: C

Since the company wants the LLM output to resemble the provided examples in terms of coherence and readability, the ROUGE score is the best metric for this evaluation.

upvoted 1 times

26b8fe1

5 months, 4 weeks ago

Selected Answer: C

The most suitable metric to assess whether the LLM output resembles the provided examples of more readable text is C, the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score. The ROUGE score is commonly used for evaluating the quality of text summarization and machine-generated text by comparing it to a set of reference texts. It measures how well the generated text matches the provided examples in terms of content and coherence. Specifically, ROUGE scores focus on the overlap of n-grams, word sequences, and word pairs between the generated text and the reference texts, making it ideal for this use case.
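The "word sequences" part most likely refers to ROUGE-L, which scores the longest common subsequence (LCS) of words shared by the generated text and the reference. A rough sketch under the same simplifying assumptions (lowercase whitespace tokens, one reference, hypothetical function names `lcs_length` and `rouge_l`) could look like this:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Illustrative ROUGE-L: LCS length as precision, recall, and F1.

    Assumption: lowercase whitespace tokenization and one reference text.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)

    precision = lcs / len(cand) if cand else 0.0
    recall = lcs / len(ref) if ref else 0.0
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge_l("the cat sat on the mat", "the cat is on the mat"))
```

Unlike ROUGE-N, the matched words do not have to be adjacent, only in the same order, so this variant rewards output that follows the reference's overall word order.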

upvoted 1 times
