C.
* Amazon Textract is specifically designed to quickly extract text, forms, and tables from PDF documents.
* Amazon Comprehend can then process the extracted text to identify entities (such as names, locations, and dates).
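For anyone who wants to see the two-step flow from the bullets above in code, here is a minimal sketch, not a definitive implementation. It assumes a single-page document passed as bytes (multi-page PDFs would need Textract's asynchronous API instead), English text, and clients created with `boto3.client("textract")`, `boto3.client("comprehend")`, and `boto3.client("s3")`. The function names, bucket, and key are hypothetical.

```python
import json


def extract_text(textract, document_bytes):
    """Return the document text, one detected LINE block per line."""
    response = textract.detect_document_text(Document={"Bytes": document_bytes})
    lines = [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"  # skip PAGE and WORD blocks
    ]
    return "\n".join(lines)


def extract_entities(comprehend, text, language_code="en"):
    """Return the entities Comprehend's pre-trained model finds in the text."""
    response = comprehend.detect_entities(Text=text, LanguageCode=language_code)
    return [
        {"Type": entity["Type"], "Text": entity["Text"], "Score": entity["Score"]}
        for entity in response["Entities"]
    ]


def process_document(textract, comprehend, s3, document_bytes, bucket, key):
    """Textract -> Comprehend -> S3. Bucket and key are hypothetical names."""
    text = extract_text(textract, document_bytes)
    entities = extract_entities(comprehend, text)
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(entities).encode("utf-8"),
    )
    return entities
```

Passing the clients in as arguments keeps the functions easy to exercise with stubs. In a real account you would call something like `process_document(boto3.client("textract"), boto3.client("comprehend"), boto3.client("s3"), pdf_bytes, "my-output-bucket", "entities/doc1.json")`.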
The best solution is to use Amazon Comprehend to extract entities and store the output in Amazon S3 because it provides direct entity extraction from text documents using pre-trained models without additional processing steps.
Using Amazon Textract followed by Amazon Comprehend (Option C) would create unnecessary processing overhead since it requires two sequential services to run, increasing both the processing time and cost of the solution.
https://aws.amazon.com/blogs/aws/now-process-pdfs-word-documents-and-images-with-amazon-comprehend-for-idp/
The linked post describes an Amazon Comprehend feature for intelligent document processing (IDP). This feature allows you to classify and extract entities from PDF documents, Microsoft Word files, and images directly in Amazon Comprehend, without needing to extract the text first.
Agree with C.
Amazon Comprehend alone is normally sufficient if the PDF contains only text. Because the question does not specify the exact contents of the PDF files, it would be safer to use Amazon Textract to extract the text and then have Amazon Comprehend do the entity extraction.
snna4 (3 days, 16 hours ago)
AgboolaKun (1 week, 4 days ago)
Rams2025 (4 weeks ago)
chris_spencer (1 month, 2 weeks ago)