Exam AWS Certified Machine Learning Engineer - Associate MLA-C01 topic 1 question 107 discussion

Exam question from Amazon's AWS Certified Machine Learning Engineer - Associate MLA-C01

Question #: 107
Topic #: 1

A company needs to extract entities from a PDF document to build a classifier model.

Which solution will extract and store the entities in the LEAST amount of time?

A. Use Amazon Comprehend to extract the entities. Store the output in Amazon S3.
B. Use an open source AI optical character recognition (OCR) tool on Amazon SageMaker to extract the entities. Store the output in Amazon S3.
C. Use Amazon Textract to extract the entities. Use Amazon Comprehend to convert the entities to text. Store the output in Amazon S3.
D. Use Amazon Textract integrated with Amazon Augmented AI (Amazon A2I) to extract the entities. Store the output in Amazon S3.

Suggested Answer: A

by chris_spencer at March 11, 2025, 3:31 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Rams2025

Highly Voted 4 months, 2 weeks ago

Selected Answer: A

https://aws.amazon.com/blogs/aws/now-process-pdfs-word-documents-and-images-with-amazon-comprehend-for-idp/ Amazon Comprehend has a feature for intelligent document processing (IDP). It lets you classify and extract entities from PDF documents, Microsoft Word files, and images directly in Amazon Comprehend, without extracting the text first.
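For anyone who wants to see roughly what option A looks like in code, here is a minimal sketch, not a definitive implementation. It assumes boto3-style `comprehend` and `s3` clients are passed in, and it passes an `EndpointArn` because, as far as I understand the API, the `Bytes` input to `detect_entities` is meant for a custom entity recognizer endpoint. The function name, bucket, key, and ARN are all hypothetical.

```python
import json


def extract_entities_from_pdf(comprehend, s3, pdf_bytes, endpoint_arn, bucket, key):
    """Send raw PDF bytes to Comprehend and store the entities in S3 as JSON.

    `comprehend` and `s3` are assumed to be boto3 clients, for example
    boto3.client("comprehend") and boto3.client("s3").
    `endpoint_arn` is a hypothetical custom entity recognizer endpoint.
    """
    response = comprehend.detect_entities(
        Bytes=pdf_bytes,
        EndpointArn=endpoint_arn,
        # Let the service decide how to read the document; fall back to
        # plain text detection when it needs to run text extraction.
        DocumentReaderConfig={
            "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
            "DocumentReadMode": "SERVICE_DEFAULT",
        },
    )
    # Keep only the fields a downstream classifier is likely to need.
    entities = [
        {"Text": e.get("Text"), "Type": e.get("Type"), "Score": e.get("Score")}
        for e in response.get("Entities", [])
    ]
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(entities).encode("utf-8"))
    return entities
```

The point of the sketch is that there is a single service call before the write to S3, which is why people argue this is the fastest option.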

upvoted 5 times

67495ef

Most Recent 1 month ago

Selected Answer: C

Correct answer is C. Comprehend can't extract directly from a PDF because it works on text. So you need to use Textract to extract the text from the PDF first and then use Comprehend.

upvoted 1 time

postbox4me

1 month, 3 weeks ago

Selected Answer: A

Amazon Comprehend is a fully managed NLP service that can directly extract named entities (like people, places, organizations, etc.). It is the fastest option with the least development effort if the text is already in digital format.

upvoted 2 times

liliu1

2 months, 3 weeks ago

Selected Answer: A

Amazon Comprehend can extract directly from PDF.

upvoted 4 times

snna4

3 months, 3 weeks ago

Selected Answer: C

C.

- Amazon Textract is specifically designed to quickly extract text, forms, and tables from PDF documents.
- Amazon Comprehend can then process the extracted text to identify entities (like names, locations, dates, etc.).
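The two-step flow described in this comment could be sketched roughly as below. This is only an illustration under assumptions: the clients are boto3-style objects passed in by the caller, the helper names are hypothetical, and the synchronous `detect_document_text` call is, as far as I know, limited to single-page documents (multi-page PDFs would need the asynchronous `start_document_text_detection` API instead).

```python
import json


def pdf_to_text(textract, pdf_bytes):
    """Step 1: use Textract to pull the text lines out of the document."""
    response = textract.detect_document_text(Document={"Bytes": pdf_bytes})
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    return "\n".join(lines)


def text_to_entities(comprehend, text, language_code="en"):
    """Step 2: use Comprehend to detect entities in the extracted text."""
    response = comprehend.detect_entities(Text=text, LanguageCode=language_code)
    return [
        {"Text": e.get("Text"), "Type": e.get("Type"), "Score": e.get("Score")}
        for e in response.get("Entities", [])
    ]


def run_pipeline(textract, comprehend, s3, pdf_bytes, bucket, key):
    """Run both steps in sequence and store the entities in S3 as JSON.

    The clients are assumed to be boto3.client("textract"),
    boto3.client("comprehend"), and boto3.client("s3").
    """
    text = pdf_to_text(textract, pdf_bytes)
    entities = text_to_entities(comprehend, text)
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(entities).encode("utf-8"))
    return entities
```

Note that the second call cannot start until the first one returns, so the two service latencies add up. That is the overhead the people voting A are pointing at.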

upvoted 1 time

AgboolaKun

4 months ago

Selected Answer: A

The best solution is to use Amazon Comprehend to extract entities and store the output in Amazon S3 because it provides direct entity extraction from text documents using pre-trained models without additional processing steps. Using Amazon Textract followed by Amazon Comprehend (Option C) would create unnecessary processing overhead since it requires two sequential services to run, increasing both the processing time and cost of the solution.

upvoted 4 times

chris_spencer

5 months, 1 week ago

Selected Answer: C

Agree with C. Normally Amazon Comprehend is sufficient if the PDF contains only text. Since the question does not mention the exact contents of the PDF files, it would be safer to use Amazon Textract to extract the text and then have Amazon Comprehend do the entity extraction.

upvoted 1 time
