Amazon AWS Certified Big Data - Specialty Exam Practice Questions

The questions for AWS Certified Big Data - Specialty were last updated on March 18, 2024.
  • Viewing page 1 out of 22 pages.
  • Viewing questions 1-4 out of 89 questions
Disclaimers:
  • The ExamTopics website is not related to, affiliated with, endorsed by, or authorized by Amazon.
  • Trademarks, certification names, and product names are used for reference only and belong to Amazon.

Topic 1 - Single Topic

Question #1 Topic 1

A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift.
What is the most efficient architecture strategy for this purpose?

  • A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.
  • B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.
  • C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.
  • D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.

Correct Answer: A
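
Option A describes a two-stage pattern: transform the raw data into CSV outside Redshift, then bulk-load the result with COPY. The sketch below illustrates both stages in Python under stated assumptions. The raw line format, the table name, the S3 prefix, and the IAM role ARN are all hypothetical, and on EMR the transform would normally run as a distributed job rather than as a single local function.

```python
import csv
import io
import re

# Hypothetical raw format: "<timestamp> machine=<id> temp=<reading>"
LINE_PATTERN = re.compile(r"^(\S+) machine=(\w+) temp=([\d.]+)$")


def to_csv(raw_lines):
    """Extract structured fields from raw lines and return them as CSV text.

    Lines that do not match the assumed pattern are skipped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in raw_lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            writer.writerow(match.groups())
    return buffer.getvalue()


def build_copy_statement(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY statement that bulk-loads CSV files from S3.

    All three arguments are placeholders supplied by the caller.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"
    )
```

COPY loads files from S3 in parallel across the cluster, which is why it is generally preferred over row-by-row INSERT statements for bulk loads.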

Question #2 Topic 1

A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage.
Which AWS service strategy is best for this use case?

  • A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.
  • B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.
  • C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.
  • D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Correct Answer: C
Reference: https://aws.amazon.com/blogs/database/indexing-metadata-in-amazon-elasticsearch-service-using-aws-lambda-and-python/
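
For readers unfamiliar with the "streaming program step" mentioned in option B: Hadoop Streaming, which Amazon EMR supports, runs any executable that reads records from standard input and writes tab-separated key/value lines to standard output. The sketch below shows the general shape of such a mapper in Python. The `is_spam` function is a hypothetical placeholder, not the algorithm from the question, and the one-e-mail-per-line input layout is also an assumption.

```python
import sys


def is_spam(text):
    """Hypothetical stand-in for the real classification algorithm."""
    return "free money" in text.lower()


def map_lines(lines):
    """Yield one 'label<TAB>1' record per non-empty input line."""
    for line in lines:
        text = line.rstrip("\n")
        if not text:
            continue
        label = "spam" if is_spam(text) else "ham"
        yield f"{label}\t1"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point a streaming mapper script would call at the end."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

Because each mapper only sees its own share of the input, the same script can run unchanged on a small sample or across many nodes; a reducer would then sum the counts per label.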

Question #3 Topic 1

A data engineer chooses Amazon DynamoDB as a data store for a regulated application. This application must be submitted to regulators for review. The data engineer needs to provide a control framework that lists the security controls, from the process for adding new users down to the physical controls of the data center, such as security guards and cameras.
How should this control mapping be achieved using AWS?

  • A. Request AWS third-party audit reports and/or the AWS quality addendum and map the AWS responsibilities to the controls that must be provided.
  • B. Request data center Temporary Auditor access to an AWS data center to verify the control mapping.
  • C. Request relevant SLAs and security guidelines for Amazon DynamoDB and define these guidelines within the application's architecture to map to the control framework.
  • D. Request Amazon DynamoDB system architecture designs to determine how to map the AWS responsibilities to the controls that must be provided.

Correct Answer: A

Question #4 Topic 1

An administrator needs to design a distribution strategy for a star schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema.
In which three circumstances would choosing Key-based distribution be most appropriate? (Select three.)

  • A. When the administrator needs to optimize a large, slowly changing dimension table.
  • B. When the administrator needs to reduce cross-node traffic.
  • C. When the administrator needs to optimize the fact table for parity with the number of slices.
  • D. When the administrator needs to balance data distribution and data collocation.
  • E. When the administrator needs to take advantage of data locality on a local node for joins and aggregates.

Correct Answer: ACD
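
To make the idea behind key-based distribution concrete, the sketch below hashes each row's distribution key to pick a slice, so that rows from two tables sharing the same key value land on the same slice. This is a simplified illustration only: the CRC32 hash is a stand-in chosen for this example (Redshift's internal hash function is not specified here), and the table and column names are hypothetical.

```python
import zlib


def slice_for(key, num_slices):
    """Map a distribution key to a slice index using a stand-in hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_column, num_slices):
    """Assign each row to a slice based on the value in key_column."""
    slices = {index: [] for index in range(num_slices)}
    for row in rows:
        slices[slice_for(row[key_column], num_slices)].append(row)
    return slices


# Hypothetical dimension and fact tables distributed on the same key.
dimension_rows = [{"customer_id": i} for i in range(10)]
fact_rows = [{"customer_id": i % 10, "amount": i} for i in range(50)]

dimension_slices = distribute(dimension_rows, "customer_id", 4)
fact_slices = distribute(fact_rows, "customer_id", 4)
```

Every fact row ends up on the slice that also holds its matching dimension row, so a join on `customer_id` can be evaluated locally without moving rows between nodes. The trade-off is that a skewed key can leave some slices with far more rows than others.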
