Welcome to ExamTopics

Amazon AWS Certified Data Analytics - Specialty Exam Practice Questions

The questions for AWS Certified Data Analytics - Specialty were last updated at June 13, 2022.
  • Viewing questions 1-4 out of 172 questions (page 1 of 41)
Disclaimers:
  • ExamTopics website is not related to, affiliated with, endorsed, or authorized by Amazon.
  • Trademarks, certification, and product names are used for reference only and belong to Amazon.
Question #1 Topic 1

A business uses Amazon Athena to run ad hoc queries on data stored in Amazon S3. To comply with internal security regulations, the organization wants to add controls that isolate query execution and query history among individuals, teams, and applications operating in the same AWS account.

Which solution satisfies these criteria?

  • A. Create an S3 bucket for each given use case, create an S3 bucket policy that grants permissions to the appropriate individual IAM users, and apply the S3 bucket policy to the S3 bucket.
  • B. Create an Athena workgroup for each given use case, apply tags to the workgroup, and create an IAM policy using the tags to apply appropriate permissions to the workgroup.
  • C. Create an IAM role for each given use case, assign the appropriate permissions to the role for the given use case, and associate the role with Athena.
  • D. Create an AWS Glue Data Catalog resource policy for each given use case that grants permissions to appropriate individual IAM users, and apply the resource policy to the specific tables used by Athena.

Correct Answer: B
Reference:
https://aws.amazon.com/athena/faqs/
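For context on answer B, here is a minimal sketch of the workgroup approach. The workgroup name, S3 result path, and tag value are hypothetical; the dicts mirror the request shape of the Athena CreateWorkGroup API and a tag-based IAM policy, without making any AWS calls:

```python
import json

# Parameters one would pass to boto3's athena.create_work_group():
# each team/app gets its own workgroup, its own query-result location,
# and a Team tag. Names and paths below are hypothetical.
workgroup_params = {
    "Name": "team-a",
    "Configuration": {
        "ResultConfiguration": {
            # separate result prefix per workgroup isolates query output
            "OutputLocation": "s3://example-athena-results/team-a/"
        },
        # prevent individual users from overriding workgroup settings
        "EnforceWorkGroupConfiguration": True,
    },
    "Tags": [{"Key": "Team", "Value": "team-a"}],
}

# IAM policy allowing query actions only on workgroups tagged Team=team-a,
# so each principal sees only its own workgroup's execution and history.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "athena:StartQueryExecution",
            "athena:GetQueryExecution",
            "athena:GetQueryResults",
            "athena:ListQueryExecutions",
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {"aws:ResourceTag/Team": "team-a"}
        },
    }],
}

print(json.dumps(workgroup_params["Tags"]))
```

Because query history is scoped to the workgroup, the tag condition is what keeps each team's history invisible to the others.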

Question #2 Topic 1

A real estate business runs a mission-critical application on Apache HBase on Amazon EMR. The Amazon EMR cluster is set up with a single master node. The company stores more than 5 TB of data in the Hadoop Distributed File System (HDFS). The organization is looking for a cost-effective way to increase the availability of its HBase data.

Which architectural design best fulfills the needs of the business?

  • A. Use Spot Instances for core and task nodes and a Reserved Instance for the EMR master node. Configure the EMR cluster with multiple master nodes. Schedule automated snapshots using Amazon EventBridge.
  • B. Store the data on an EMR File System (EMRFS) instead of HDFS. Enable EMRFS consistent view. Create an EMR HBase cluster with multiple master nodes. Point the HBase root directory to an Amazon S3 bucket.
  • C. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Run two separate EMR clusters in two different Availability Zones. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.
  • D. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Create a primary EMR HBase cluster with multiple master nodes. Create a secondary EMR HBase read-replica cluster in a separate Availability Zone. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.

Correct Answer: D
Reference:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html
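The referenced EMR documentation covers running HBase with its root directory on Amazon S3. As a rough sketch of option D (the bucket name is hypothetical; the classification and property names follow the EMR HBase-on-S3 docs), the configuration for a primary cluster and a read-replica cluster might look like:

```python
# Primary EMR HBase cluster: store the HBase root directory on S3
# via EMRFS instead of HDFS. Bucket name is a placeholder.
primary_config = [
    {"Classification": "hbase",
     "Properties": {"hbase.emr.storageMode": "s3"}},
    {"Classification": "hbase-site",
     "Properties": {"hbase.rootdir": "s3://example-hbase-bucket/hbase"}},
]

# Secondary read-replica cluster in another Availability Zone:
# same root directory, but started in read-replica mode so it serves
# reads without writing to the shared S3 data.
replica_config = [
    {"Classification": "hbase",
     "Properties": {"hbase.emr.storageMode": "s3",
                    "hbase.emr.readreplica.enabled": "true"}},
    {"Classification": "hbase-site",
     "Properties": {"hbase.rootdir": "s3://example-hbase-bucket/hbase"}},
]

print(replica_config[0]["Properties"])
```

The read-replica flag is what distinguishes option D from option C: two independent read-write clusters sharing one HBase root directory is not a supported configuration, while a read-only replica of the same S3 root is.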

Question #3 Topic 1

An Internet of Things business is developing a new device that gathers data on sleep patterns from users sleeping on a smart mattress. Sensors transmit the data to an Amazon S3 bucket. Each night, around 2 MB of data is created for each bed. Each user's data must be analyzed and summarized, and the results must be made available as quickly as possible. The process includes time windowing and other operations. Based on testing with a Python script, each run needs around 1 GB of memory and takes a few minutes to finish.

Which option is the MOST cost-effective approach to execute the script?

  • A. AWS Lambda with a Python script
  • B. AWS Glue with a Scala job
  • C. Amazon EMR with an Apache Spark script
  • D. AWS Glue with a PySpark job

Correct Answer: A
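A quick back-of-envelope check supports option A: a run of a few minutes on ~1 GB of memory fits well within Lambda's 15-minute limit, and the per-run cost is tiny. A sketch using example Lambda pricing figures (rates vary by region and change over time, so treat these numbers as illustrative):

```python
# Example us-east-1 Lambda pricing assumptions (subject to change):
# ~$0.0000166667 per GB-second of compute, ~$0.0000002 per request.
gb_seconds_rate = 0.0000166667
request_rate = 0.0000002

memory_gb = 1.0     # ~1 GB of RAM per run, from the question
duration_s = 180.0  # "a few minutes"; assume 3 minutes

cost_per_run = memory_gb * duration_s * gb_seconds_rate + request_rate
print(f"~${cost_per_run:.4f} per bed per night")  # ~$0.0030
```

At fractions of a cent per bed per night, there is no idle cluster to pay for, which is why Lambda undercuts Glue or EMR for a workload this small.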

Question #4 Topic 1

A human resources organization runs analytics queries on company data using a 10-node Amazon Redshift cluster. The Amazon Redshift cluster comprises two tables: one for products and one for transactions, both of which have a product_sku column. Together, the tables span more than 100 GB. The majority of queries use both tables.

Which distribution pattern should the organization adopt to optimize query speed for the two tables?

  • A. An EVEN distribution style for both tables
  • B. A KEY distribution style for both tables
  • C. An ALL distribution style for the product table and an EVEN distribution style for the transactions table
  • D. An EVEN distribution style for the product table and a KEY distribution style for the transactions table

Correct Answer: B
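Answer B's KEY distribution can be sketched in DDL. The product_sku column comes from the question; the remaining columns and types are hypothetical placeholders:

```python
# DDL sketch (as strings) for KEY distribution on the shared join
# column, so matching rows of both tables are stored on the same
# compute slice and joins avoid network redistribution.
products_ddl = """
CREATE TABLE products (
    product_sku  VARCHAR(32),
    product_name VARCHAR(256)
)
DISTSTYLE KEY
DISTKEY (product_sku);
"""

transactions_ddl = """
CREATE TABLE transactions (
    transaction_id BIGINT,
    product_sku    VARCHAR(32),
    amount         DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (product_sku);
"""

print(products_ddl.strip().splitlines()[-1])
```

Distributing both tables on product_sku collocates joining rows; at over 100 GB, ALL distribution (option C) would replicate a large table to every node, and EVEN distribution would force data movement on every join.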

Community vote distribution: A (35%), C (25%), B (20%), Other