Exam AWS Certified Data Analytics - Specialty topic 1 question 52 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 52
Topic #: 1


A company has 1 million scanned documents stored as image files in Amazon S3. The documents contain typewritten application forms with information including the applicant first name, applicant last name, application date, application type, and application text. The company has developed a machine learning algorithm to extract the metadata values from the scanned documents. The company wants to allow internal data analysts to analyze and find applications using the applicant name, application date, or application text. The original images should also be downloadable. Cost control is secondary to query performance.
Which solution organizes the images and metadata to drive insights while meeting the requirements?

A. For each image, use object tags to add the metadata. Use Amazon S3 Select to retrieve the files based on the applicant name and application date.
B. Index the metadata and the Amazon S3 location of the image file in Amazon OpenSearch Service (Amazon Elasticsearch Service). Allow the data analysts to use OpenSearch Dashboards (Kibana) to submit queries to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
C. Store the metadata and the Amazon S3 location of the image file in an Amazon Redshift table. Allow the data analysts to run ad-hoc queries on the table.
D. Store the metadata and the Amazon S3 location of the image files in an Apache Parquet file in Amazon S3, and define a table in the AWS Glue Data Catalog. Allow data analysts to use Amazon Athena to submit custom queries.


Suggested Answer: B

by rb39 at April 22, 2022, 1:31 p.m.


Comments


rb39

Highly Voted 3 years, 4 months ago

Selected Answer: B

OpenSearch, to search across all of the text.

upvoted 12 times


skb0071

Most Recent 1 year, 8 months ago

The answer is C: store the metadata in Redshift. The metadata is extracted by the company's own ML program.

upvoted 1 times


Debi_mishra

2 years, 2 months ago

B is the correct answer. Keywords to look for in the question: "performance" and "search using text". D could be correct only if there were no text-based search requirement.

upvoted 1 times


pk349

2 years, 3 months ago

B: I passed the test

upvoted 1 times


akashm99101001com

2 years, 5 months ago

Selected Answer: B

Option A is incorrect because S3 Select queries the contents of an object (CSV, JSON, or Parquet), not its object tags, so it cannot retrieve files by a tagged applicant name or application date. Object tags are also limited to 10 per object with short values, so they cannot hold the application text.

Option B is correct because Amazon OpenSearch Service (Amazon Elasticsearch Service) can index the metadata together with the Amazon S3 location of the image file, and it supports full-text search on the application text. Data analysts can use OpenSearch Dashboards (Kibana) to submit queries to the cluster.

Option C is incorrect because Amazon Redshift is a data warehouse optimized for querying structured data, not for free-text search across the application text.

Option D is incorrect because Amazon Athena scans the Parquet data on every query and has no full-text index, so text search would be slower than in OpenSearch.
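To make option B concrete, here is a minimal sketch of what one indexed record and one analyst query might look like. The field names (`applicant_first_name`, `s3_location`, and so on), the bucket name, and both helper functions are assumptions made for illustration; none of them come from the question. The query body follows the standard OpenSearch query DSL (`bool`, `multi_match`, `match`, `range`) and is only built as a dictionary here, not sent to a cluster.

```python
import json
from datetime import date


def build_document(first_name, last_name, app_date, app_type, app_text, s3_location):
    """Return one metadata record to index (hypothetical field names)."""
    return {
        "applicant_first_name": first_name,
        "applicant_last_name": last_name,
        "application_date": app_date.isoformat(),
        "application_type": app_type,
        "application_text": app_text,
        # Pointer back to the original scanned image, kept for download.
        "s3_location": s3_location,
    }


def build_query(name=None, text=None, date_from=None, date_to=None):
    """Build a query DSL body for name, free-text, and date-range search."""
    must = []
    filters = []
    if name:
        # Match the name against both the first-name and last-name fields.
        must.append({
            "multi_match": {
                "query": name,
                "fields": ["applicant_first_name", "applicant_last_name"],
            }
        })
    if text:
        # Full-text match on the body of the application.
        must.append({"match": {"application_text": text}})
    if date_from or date_to:
        bounds = {}
        if date_from:
            bounds["gte"] = date_from.isoformat()
        if date_to:
            bounds["lte"] = date_to.isoformat()
        filters.append({"range": {"application_date": bounds}})
    return {"query": {"bool": {"must": must, "filter": filters}}}


doc = build_document(
    "Ana", "Lopez", date(2021, 3, 5), "permit",
    "request for a building permit",
    "s3://example-bucket/forms/0001.png",  # hypothetical bucket and key
)
query = build_query(name="Lopez", text="building permit", date_from=date(2021, 1, 1))
print(json.dumps(query, indent=2))
```

Each search hit would carry the `s3_location` field, which is how the original image stays reachable from the search results.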

upvoted 3 times


rags1482

2 years, 5 months ago

D is the right answer. Option B provides no direct method to download the image file(s) associated with the search results.

upvoted 3 times


cloudlearnerhere

2 years, 9 months ago

Selected Answer: B

The correct answer is B: the metadata can be indexed together with the S3 file location in OpenSearch (Elasticsearch) to provide fast search, and the stored location lets users download the original file as well. https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/ Option A is wrong because S3 Select would hurt query performance. Option C is wrong because it would add significant cost without improving query performance much. Option D is wrong because Athena would hurt query performance.
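On the download point, a small sketch of how a stored location could be turned back into something downloadable: an `s3://` location splits into the bucket and key that an S3 download call takes as separate arguments (for example boto3's `get_object(Bucket=..., Key=...)`). The helper name `split_s3_location` and the example bucket are hypothetical, and the sketch assumes locations are stored in `s3://bucket/key` form.

```python
def split_s3_location(s3_location):
    """Split 's3://bucket/key' into (bucket, key); hypothetical helper."""
    prefix = "s3://"
    if not s3_location.startswith(prefix):
        raise ValueError(f"not an s3:// location: {s3_location!r}")
    # Everything up to the first '/' is the bucket; the rest is the object key.
    bucket, _, key = s3_location[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {s3_location!r}")
    return bucket, key


# Hypothetical location as it might be returned in a search hit.
bucket, key = split_s3_location("s3://example-bucket/forms/2021/0001.png")
```

Plain string splitting is used instead of a URL parser so that keys containing characters such as `?` or `#` are kept intact.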

upvoted 4 times
