Exam AWS Certified Data Analytics - Specialty topic 1 question 52 discussion

A company has 1 million scanned documents stored as image files in Amazon S3. The documents contain typewritten application forms with information including the applicant first name, applicant last name, application date, application type, and application text. The company has developed a machine learning algorithm to extract the metadata values from the scanned documents. The company wants to allow internal data analysts to analyze and find applications using the applicant name, application date, or application text. The original images should also be downloadable. Cost control is secondary to query performance.
Which solution organizes the images and metadata to drive insights while meeting the requirements?

  • A. For each image, use object tags to add the metadata. Use Amazon S3 Select to retrieve the files based on the applicant name and application date.
  • B. Index the metadata and the Amazon S3 location of the image file in Amazon OpenSearch Service (Amazon Elasticsearch Service). Allow the data analysts to use OpenSearch Dashboards (Kibana) to submit queries to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
  • C. Store the metadata and the Amazon S3 location of the image file in an Amazon Redshift table. Allow the data analysts to run ad-hoc queries on the table.
  • D. Store the metadata and the Amazon S3 location of the image files in an Apache Parquet file in Amazon S3, and define a table in the AWS Glue Data Catalog. Allow data analysts to use Amazon Athena to submit custom queries.
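The sketch below shows one way option B could organize the data. It is a create-index body that maps each extracted metadata value to an OpenSearch field and keeps the image's S3 location next to it. The index name `applications` and every field name are hypothetical. The question does not define a schema, and a real mapping would depend on how the company's extraction algorithm emits its output.

```python
import json

# Hypothetical index name; the question does not specify one.
INDEX_NAME = "applications"


def build_index_body() -> dict:
    """Return a create-index body for option B (all field names hypothetical)."""
    return {
        "mappings": {
            "properties": {
                # Names: analyzed "text" for word matching, plus a "keyword"
                # sub-field for exact matches and sorting.
                "applicant_first_name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
                "applicant_last_name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
                "application_date": {"type": "date", "format": "yyyy-MM-dd"},
                "application_type": {"type": "keyword"},
                # Full application text, analyzed for full-text search.
                "application_text": {"type": "text"},
                # Pointer back to the original image: returned with hits, not searched.
                "s3_uri": {"type": "keyword", "index": False},
            }
        }
    }


if __name__ == "__main__":
    # Body for a request such as: PUT /applications
    print(json.dumps(build_index_body(), indent=2))
```

The analyzed `text` type on `application_text` is what makes full-text matching possible. The `s3_uri` field is only carried along so that each hit points back to its original, downloadable image.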
Suggested Answer: B

Comments

rb39
Highly Voted 3 years, 1 month ago
Selected Answer: B
OpenSearch to scan all text
upvoted 12 times
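rb39's point is that OpenSearch can search across all of the extracted text. The minimal sketch below shows the query DSL body that an analyst's search might translate to. It combines a full-text match on the application text with an optional name match and date range. The field names `application_text`, `applicant_last_name`, `application_date`, and `s3_uri` are hypothetical and do not come from the question.

```python
import json
from typing import Optional


def build_search_query(text: Optional[str] = None,
                       last_name: Optional[str] = None,
                       date_from: Optional[str] = None,
                       date_to: Optional[str] = None,
                       size: int = 20) -> dict:
    """Build an OpenSearch query DSL body; all field names are hypothetical."""
    must = []     # scored clauses, where relevance ranking matters
    filters = []  # unscored clauses for exact constraints such as date ranges
    if text:
        # "match" runs the query through the field's analyzer, so with the
        # standard analyzer, case and word order need not match the stored text.
        must.append({"match": {"application_text": text}})
    if last_name:
        must.append({"match": {"applicant_last_name": last_name}})
    if date_from or date_to:
        date_range = {}
        if date_from:
            date_range["gte"] = date_from
        if date_to:
            date_range["lte"] = date_to
        filters.append({"range": {"application_date": date_range}})
    return {
        "size": size,
        # A bool query with no clauses at all matches every document.
        "query": {"bool": {"must": must, "filter": filters}},
        # Return the metadata plus the image location for each hit.
        "_source": ["applicant_first_name", "applicant_last_name",
                    "application_date", "s3_uri"],
    }


if __name__ == "__main__":
    query = build_search_query(text="small business loan",
                               date_from="2021-01-01", date_to="2021-12-31")
    print(json.dumps(query, indent=2))
```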
skb0071
Most Recent 1 year, 6 months ago
Answer is C: store the metadata in Redshift. The metadata is extracted using the company-provided ML program.
upvoted 1 times
Debi_mishra
2 years ago
B is the correct answer. The keywords to look for in the question are "performance" and "search using text". D can be correct only if there is no text-based search requirement.
upvoted 1 times
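Debi_mishra's distinction can be made concrete. Under option D, a free-text search would typically become a `LIKE` predicate in Athena SQL, evaluated against the text of every row, because Athena has no full-text index to consult. The sketch below only builds the query string. The table name `applications` and its column names are hypothetical. Doubling single quotes is a minimal illustration and does not replace Athena's parameterized queries.

```python
def build_athena_text_search(term: str, table: str = "applications") -> str:
    """Return an Athena SQL string that substring-searches the application text.

    Table and column names are hypothetical. Only single quotes are escaped;
    LIKE wildcards (% and _) inside `term` pass through unchanged.
    """
    escaped = term.lower().replace("'", "''")  # SQL string-literal escaping
    return (
        "SELECT applicant_first_name, applicant_last_name, application_date, s3_uri "
        f"FROM {table} "
        # A leading wildcard means Parquet min/max statistics cannot narrow the scan.
        f"WHERE lower(application_text) LIKE '%{escaped}%'"
    )


if __name__ == "__main__":
    print(build_athena_text_search("O'Brien"))
```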
pk349
2 years ago
B: I passed the test
upvoted 1 times
akashm99101001com
2 years, 2 months ago
Selected Answer: B
Option A is incorrect because object tags are not indexed for search. S3 Select queries the contents of a single object and cannot filter objects by their tags, so it cannot be used to find applications by applicant name or application date.

Option B is correct because Amazon OpenSearch Service (Amazon Elasticsearch Service) can index the metadata together with the Amazon S3 location of the image file, and data analysts can use OpenSearch Dashboards (Kibana) to submit queries to the cluster.

Option C is incorrect because Amazon Redshift is not designed for storing large binary objects such as images. It is a data warehousing solution that is optimized for querying structured data.

Option D is incorrect because Apache Parquet files are not optimized for querying unstructured data such as images. Amazon Athena can be used to submit custom queries, but it is not optimized for querying large binary objects.
upvoted 3 times
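Option A also runs into S3's object-tag limits. Each object can carry at most 10 tags, with keys of up to 128 Unicode characters and values of up to 256. The sketch below checks a metadata dictionary against those size limits. It does not check the restricted character set that S3 also enforces for tags. The field names and the sample values are hypothetical.

```python
# S3 object-tagging limits (per object).
MAX_TAGS_PER_OBJECT = 10
MAX_KEY_CHARS = 128
MAX_VALUE_CHARS = 256


def tag_limit_problems(metadata: dict) -> list:
    """List the ways `metadata` would violate S3 object-tag size limits."""
    problems = []
    if len(metadata) > MAX_TAGS_PER_OBJECT:
        problems.append(f"{len(metadata)} tags exceeds {MAX_TAGS_PER_OBJECT}")
    for key, value in metadata.items():
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"key {key!r} is longer than {MAX_KEY_CHARS} characters")
        if len(str(value)) > MAX_VALUE_CHARS:
            problems.append(f"value of {key!r} is longer than {MAX_VALUE_CHARS} characters")
    return problems


if __name__ == "__main__":
    sample = {  # hypothetical extracted metadata for one scanned form
        "applicant_first_name": "Jane",
        "applicant_last_name": "Doe",
        "application_date": "2021-03-14",
        "application_type": "permit",
        # 420 characters of hypothetical application text
        "application_text": "I am applying for a renewal of my permit. " * 10,
    }
    for problem in tag_limit_problems(sample):
        print(problem)
```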
rags1482
2 years, 2 months ago
D is the right answer. Option B provides no direct method to download the image file(s) associated with the search results.
upvoted 3 times
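This bears on rags1482's objection. Under option B, each search hit carries the image's S3 location, so the download becomes a step for whatever client presents the results. The minimal sketch below covers the first part of that step. It splits a stored `s3://bucket/key` string into the bucket and key that an S3 `GetObject` request expects, which is the same pair that boto3's `generate_presigned_url("get_object", Params={"Bucket": bucket, "Key": key})` takes. The `s3_uri` field name and the sample hit are hypothetical.

```python
def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key' into (bucket, key).

    Uses plain string handling instead of urllib.parse so that object keys
    containing '?' or '#' are kept intact.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {uri!r}")
    return bucket, key


if __name__ == "__main__":
    # Hypothetical hit from an OpenSearch search response.
    hit = {"_source": {"applicant_last_name": "Doe",
                       "s3_uri": "s3://example-scans/forms/app-0001.png"}}
    bucket, key = split_s3_uri(hit["_source"]["s3_uri"])
    print(bucket, key)
```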
cloudlearnerhere
2 years, 6 months ago
Selected Answer: B
The correct answer is B, as the metadata can be indexed with the S3 file location in Elasticsearch to provide quick search and to let users download the file as well. See https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/ Option A is wrong because using S3 Select would hurt query performance. Option C is wrong because it would have a huge cost impact without improving query performance much. Option D is wrong because using Athena would hurt query performance.
upvoted 4 times
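cloudlearnerhere describes the metadata being indexed together with the S3 file location. One way to load it is OpenSearch's `_bulk` API. That API takes a newline-delimited JSON body: an action line followed by a document line for each record, ending with a newline. The minimal sketch below builds such a body from extracted records. The `document_id` key, the index name, and the field names are hypothetical.

```python
import json


def build_bulk_body(records: list, index: str = "applications") -> str:
    """Build an NDJSON body for the OpenSearch _bulk API.

    Each record is a dict of extracted metadata plus the image's S3 location;
    the hypothetical 'document_id' key becomes the OpenSearch document _id.
    """
    lines = []
    for record in records:
        action = {"index": {"_index": index, "_id": record["document_id"]}}
        document = {k: v for k, v in record.items() if k != "document_id"}
        # json.dumps escapes embedded newlines, so each object stays on one line.
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [{"document_id": "app-0001",
                "applicant_last_name": "Doe",
                "application_date": "2021-03-14",
                "s3_uri": "s3://example-scans/forms/app-0001.png"}]
    print(build_bulk_body(records), end="")
```

The extracted document ID is used as `_id`, so re-running the load overwrites existing documents instead of duplicating them.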
JHJHJHJHJ
2 years, 8 months ago
Answer A: Validated using Jon bosco paid dumps
upvoted 2 times
JoellaLi
2 years, 7 months ago
But why A?
upvoted 1 times
Arka_01
2 years, 8 months ago
Selected Answer: B
"Cost control is secondary to query performance" is the key here. Though D can also do the work, it will be slower than option B.
upvoted 2 times
rrshah83
2 years, 9 months ago
Selected Answer: D
Parquet format improves performance. None of the other options talk about performance improvement.
upvoted 3 times
Gavin_Y
2 years, 9 months ago
and 'Cost control is secondary to query performance.'
upvoted 2 times
Community vote distribution
A (35%), C (25%), B (20%), Other