Exam AWS Certified Big Data - Specialty topic 2 question 12 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty

Question #: 12
Topic #: 2

An advertising organization uses an application to process a stream of events that are received from clients in multiple unstructured formats.
The application does the following:
✑ Transforms the events into a single structured format and streams them to Amazon Kinesis for real-time analysis.
✑ Stores the unstructured raw events from the log files on local hard drives that are rotated and uploaded to Amazon S3.
The organization wants to extract campaign performance reporting using an existing Amazon Redshift cluster.
Which solution will provide the performance data with the LEAST number of operations?

A. Install the Amazon Kinesis Data Firehose agent on the application servers and use it to stream the log files directly to Amazon Redshift.
B. Create an external table in Amazon Redshift and point it to the S3 bucket where the unstructured raw events are stored.
C. Write an AWS Lambda function that triggers every hour to load the new log files already in S3 to Amazon Redshift.
D. Connect Amazon Kinesis Data Firehose to the existing Amazon Kinesis stream and use it to stream the events directly to Amazon Redshift.

Suggested Answer: B

by mattyb123 at Aug. 26, 2019, 6:05 a.m.

Comments

Bulti

Highly Voted 3 years, 9 months ago

Not A: there is no use loading unstructured data in multiple formats into Redshift via the Kinesis Firehose agent. Not B: creating an external table with Redshift Spectrum will be a problem against unstructured data in multiple formats. Not C: not a good choice. I have never seen Lambda talking to Redshift, and why would you use it when KFH connects directly to Redshift? The correct option is D, because it loads structured data in a single format into Redshift.

upvoted 6 times

...

DerekKey

Most Recent 3 years, 8 months ago

Correct D: least number of operations

upvoted 1 times

...

jove

3 years, 8 months ago

D seems more reasonable. I go with D

upvoted 2 times

...

Bulti

3 years, 9 months ago

Option A also talks about shipping structured data from the application to Redshift using the Kinesis Firehose agent. So with the same agent it's possible to stream the data both to Kinesis Data Streams for analysis and to KFH for delivery to Redshift. It seems like the most direct option.

upvoted 1 times

jove

3 years, 8 months ago

A is about log files which are unstructured. Not a good idea to move the log files to Redshift.

upvoted 1 times

...

susan8840

3 years, 9 months ago

B. The question is asking how to consume the data in Redshift, not how to ingest the data, which is already in place.

upvoted 1 times

...

Zinty

3 years, 9 months ago

The unstructured data is already transformed to a single structured format before it is put into Kinesis. So I will go with D for the LEAST number of operations. B: Spectrum is not needed.

upvoted 1 times

...

zhengtoronto

3 years, 9 months ago

To handle the unstructured data, Kinesis Data Firehose can invoke a Lambda function to do data transformation and format conversion, so it's D.
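
As a rough illustration of the transformation hook mentioned here, the sketch below follows the record contract Firehose uses for Lambda transforms (base64 `data` in, `recordId` / `result` / `data` out). The target fields `campaign_id`, `event_type`, and `ts` are assumptions made up for the example; the question does not describe the actual event schema.

```python
import base64
import json


def lambda_handler(event, context):
    """Normalise each Firehose record to one JSON line (hypothetical schema)."""
    output = []
    for record in event["records"]:
        try:
            raw = base64.b64decode(record["data"]).decode("utf-8")
            payload = json.loads(raw)
            # Assumed target columns; not taken from the exam question.
            row = {
                "campaign_id": payload["campaign_id"],
                "event_type": payload.get("event_type", "unknown"),
                "ts": payload["ts"],
            }
            line = json.dumps(row) + "\n"
            data = base64.b64encode(line.encode("utf-8")).decode("ascii")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data}
            )
        except (ValueError, KeyError, TypeError):
            # Return the record unchanged and mark it as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

In the scenario from the question the application already emits a single structured format, so a hook like this would probably only be needed if the stream still carried mixed payloads.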

upvoted 2 times

...

san2020

3 years, 9 months ago

my selection D

upvoted 2 times

...

mars2

3 years, 9 months ago

answer is D. The key here is multiple unstructured formats. You can't define an external table with multiple source formats.
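
To make the point about external tables concrete, here is a small sketch that builds a Redshift Spectrum `CREATE EXTERNAL TABLE` statement. The helper name, schema, table, columns, and bucket path are all hypothetical; what matters is that the statement carries exactly one row format and one storage format for everything under the S3 location.

```python
def external_table_ddl(schema, table, columns, s3_location, delimiter="|"):
    """Build a Spectrum external-table DDL string (illustrative helper)."""
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    # Hypothetical names; the question gives no schema or bucket.
    print(
        external_table_ddl(
            "spectrum",
            "raw_events",
            [("campaign_id", "varchar(64)"), ("ts", "timestamp")],
            "s3://example-bucket/raw-events/",
        )
    )
```

Raw logs in several different layouts under the same prefix would not all parse against a single definition like this, which is the objection to option B.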

upvoted 3 times

...

Kuntazulu

3 years, 9 months ago

A. FH to Redshift is direct...

upvoted 1 times

...

sriansri

3 years, 9 months ago

For unstructured data, combining Redshift with S3 is the basic approach, because Redshift is not meant for unstructured data.

upvoted 4 times

DerekKey

3 years, 8 months ago

Transforms the events into a single structured format and streams them to Amazon Kinesis for real-time analysis.

upvoted 1 times

...

cybe001

3 years, 9 months ago

I go with D. Firehose can read the structured data from the Kinesis stream and store it in Redshift.
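
A minimal sketch of what option D could look like as a Firehose delivery stream definition. The dictionary keys follow the boto3 `create_delivery_stream` request shape; the helper name and every ARN, URL, table name, and credential passed to it are placeholders, and the `json 'auto'` copy option assumes the structured events are JSON, which the question does not state. Firehose stages the data in S3 and then issues a Redshift `COPY`, which is why an S3 configuration appears inside the Redshift destination.

```python
def firehose_to_redshift_params(
    name, stream_arn, role_arn, jdbc_url, table, bucket_arn, username, password
):
    """Assemble request parameters for a Kinesis-sourced Redshift delivery stream."""
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
        "RedshiftDestinationConfiguration": {
            "RoleARN": role_arn,
            "ClusterJDBCURL": jdbc_url,
            "CopyCommand": {
                "DataTableName": table,
                # Assumes JSON events; adjust for the real format.
                "CopyOptions": "json 'auto'",
            },
            "Username": username,
            "Password": password,
            # Intermediate bucket that Firehose copies from.
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": bucket_arn},
        },
    }


# With boto3 installed and credentials configured, the call would be roughly:
#   boto3.client("firehose").create_delivery_stream(**params)
```

Nothing in the application has to change under this setup: the existing stream stays as it is and Firehose reads from it.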

upvoted 2 times

...

Zire

3 years, 9 months ago

B would be fine if the data were structured, since we could use Redshift Spectrum to create external tables pointing to S3. For this question I'd go with D; as a solution, at least, it is correct.

upvoted 1 times

...

bigdatalearner

3 years, 9 months ago

B is the right answer

upvoted 2 times

d00ku

3 years, 9 months ago

How can B be the answer when it says 'point the table to the unstructured data'? The answer is D.

upvoted 3 times

shwang

3 years, 9 months ago

I referred to the FAQ: unstructured data in S3 can be exposed as an external table in Redshift, so it is B.

upvoted 1 times

DerekKey

3 years, 8 months ago

Transforms the events into a single structured format and streams them to Amazon Kinesis for real-time analysis.

upvoted 1 times

...

mattyb123

3 years, 9 months ago

Thoughts on D?

upvoted 4 times

mattyb123

3 years, 9 months ago

Amazon Redshift Spectrum uses external tables to query data that is stored in Amazon S3. You can query an external table using the same SELECT syntax you use with other Amazon Redshift tables. External tables are read-only. You can't write to an external table.
1. https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html
2. https://blog.openbridge.com/10-simple-tips-that-help-you-quickly-find-success-adopting-amazon-redshift-spectrum-810db089abbe
3. https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html

upvoted 2 times

jove

3 years, 9 months ago

Amazon Redshift now supports writing to external tables in Amazon S3 : https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-now-supports-writing-to-external-tables-in-amazon-s3/

upvoted 1 times

...

mattyb123

3 years, 9 months ago

I think it's D because FH can automatically copy/write the data to Redshift, whereas with Redshift Spectrum you can only create read-only external tables and you would need to write the SQL to create the external table.

upvoted 4 times

...