
Exam AWS Certified Data Analytics - Specialty topic 1 question 15 discussion

An airline has .csv-formatted data stored in Amazon S3 with an AWS Glue Data Catalog. Data analysts want to join this data with call center data stored in Amazon Redshift as part of a daily batch process. The Amazon Redshift cluster is already under a heavy load. The solution must be managed, serverless, well-functioning, and minimize the load on the existing Amazon Redshift cluster. The solution should also require minimal effort and development activity.
Which solution meets these requirements?

  • A. Unload the call center data from Amazon Redshift to Amazon S3 using an AWS Lambda function. Perform the join with AWS Glue ETL scripts.
  • B. Export the call center data from Amazon Redshift using a Python shell in AWS Glue. Perform the join with AWS Glue ETL scripts.
  • C. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift.
  • D. Export the call center data from Amazon Redshift to Amazon EMR using Apache Sqoop. Perform the join with Apache Hive.
Suggested Answer: C
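For context, the Spectrum approach in answer C amounts to registering the Glue Data Catalog as an external schema and joining across it. The sketch below shows the general shape; the schema name, Glue database, IAM role ARN, table names, and column names are all illustrative assumptions, not details from the question.

```sql
-- Register an external schema backed by the existing AWS Glue Data Catalog
-- (database name and IAM role ARN are placeholders).
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'airline_glue_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';

-- The .csv tables cataloged by Glue are now queryable in place, so the join
-- reads the S3 data via Spectrum without loading it into the cluster:
SELECT f.flight_id, c.agent_id
FROM spectrum_schema.flights AS f   -- external table backed by S3
JOIN call_center AS c               -- local Redshift table
  ON f.booking_ref = c.booking_ref;
```

Because the Glue Data Catalog already exists, no separate CREATE EXTERNAL TABLE statements are needed for the S3 data, which is why this option involves minimal development effort.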

Comments

abhineet
Highly Voted 3 years, 9 months ago
I would go for C; Spectrum is serverless as well. The question also asks for minimal development effort. For option A, you would need to develop Lambda and Glue code.
upvoted 39 times
Jh2501
3 years, 9 months ago
Agree, C. However, one thing I am still confused about: how can Spectrum create an external table for the call centre data when it isn't stored on S3?
upvoted 7 times
Phoenyx89
3 years, 9 months ago
with a Create External Table as Select... I suppose
upvoted 1 times
DerekKey
3 years, 8 months ago
Wrong. "Create External Table as Select" is a Redshift command, not Spectrum. Spectrum is used to query external data as read-only.
upvoted 1 times
GeeBeeEl
3 years, 9 months ago
If you are confused, check https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html
upvoted 3 times
Manue
3 years, 8 months ago
I think Jh2501 is not challenging Spectrum's capability to create external tables, but asking how/why to create an external table on data that is stored in Redshift and not in S3. So, if there is not a typo in the question, C is a doubtful answer, and A could be the right one.
upvoted 5 times
JBAWA
2 years, 8 months ago
To define an external table in Amazon Redshift, use the CREATE EXTERNAL TABLE command
upvoted 1 times
Merrick
2 years, 6 months ago
https://aws.amazon.com/ko/premiumsupport/knowledge-center/redshift-spectrum-external-table/
upvoted 1 times
jove
Highly Voted 3 years, 8 months ago
C doesn't make sense. The call center data is already stored in Redshift. What would be the purpose of creating an external table for the call center data? Also, C suggests performing the join with Redshift, which is already under a heavy load.
upvoted 19 times
mendelthegreat
3 years, 8 months ago
1. Redshift Spectrum is a compute layer that sits between S3 and Redshift, so it will not add more load to Redshift.
2. What the case is saying is that because Redshift is already under heavy load, we shouldn't load the .csv data from S3 into Redshift, so an external table in Redshift Spectrum would be better.
3. The best use case for Redshift Spectrum, as described in the question, is to JOIN data in Redshift with another external data source, in this case S3, without needing to bring everything into Redshift.
C is the undeniable correct answer here.
upvoted 28 times
Kam006
Most Recent 1 year, 4 months ago
C is the correct answer. I do agree that table creation is part of Redshift and that external tables are created to access non-Redshift data sources (e.g. S3). External tables allow you to query data in S3 using the same SELECT syntax as with other Amazon Redshift tables. Here, the question says Redshift is already overloaded, hence we should not load the data into Redshift. Redshift Spectrum will join the S3 data with the Redshift data, is serverless, and requires minimal development effort.
upvoted 1 times
jerkane
1 year, 7 months ago
Selected Answer: C
C is correct as it is the one with minimal effort as no data is moved.
upvoted 2 times
markstudy
1 year, 8 months ago
Selected Answer: B
I would pick B. A: Lambda is limited to 15 minutes of execution time, which might not be enough to unload. C: The call center data is already in Redshift; the missing data is the airline data. The best possible option seems to be B: unload the Redshift data and develop something to merge/join the data, so Redshift doesn't have to run the queries and merge.
upvoted 1 times
nroopa
1 year, 10 months ago
Selected Answer: A
The reason C is incorrect is that it mentions creating an external table using Amazon Redshift Spectrum for the call center data (which is already in Redshift) and performing the join with Amazon Redshift (not sure how to join on Redshift as the data is already in Redshift), so I guess this is incorrect unless there is a typo regarding the call center data. So my option will be A.
upvoted 1 times
NikkyDicky
1 year, 11 months ago
Selected Answer: C
going with C
upvoted 1 times
Hyperdanny
2 years, 1 month ago
I would pick B. A: Lambda is limited to 15 minutes of execution time, which might not be enough to unload. C: The call center data is already in Redshift; the missing data is the airline data. The best possible option seems to be B: unload the Redshift data and develop something to merge/join the data, so Redshift doesn't have to run the queries and merge.
upvoted 2 times
Cloudbert
2 years, 1 month ago
Selected Answer: A
C is wrong. To use the CREATE EXTERNAL TABLE command, the data has to be in S3: "To define an external table in Amazon Redshift, use the CREATE EXTERNAL TABLE command. The external table statement defines the table columns, the format of your data files, and the location of your data in Amazon S3." The solution also requires putting as little burden on Redshift as possible. Lambda and Glue are serverless, and by choosing option A we offload the burden from Redshift completely. Option A must be correct.
upvoted 1 times
Debi_mishra
2 years, 2 months ago
I will say none of the answers is exactly correct without additional information such as the call centre data structure and volume. D is not correct as it is not serverless. C is not correct: you can't create an external table when the data is in Redshift, and it will also put load on Redshift. A is OK only if the data is small, else Lambda will time out. Likewise, B can be a problem if the data is large, as it will put load on Redshift.
upvoted 2 times
pk349
2 years, 2 months ago
C: I passed the test
upvoted 1 times
itsme1
2 years, 4 months ago
Selected Answer: C
"Amazon Redshift Spectrum resides on dedicated Amazon Redshift servers that are independent of your cluster." https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html C: The only caveat is that the external table is not created in Redshift Spectrum itself. A: UNLOAD is faster as opposed to Lambda; however, it also burdens Redshift.
upvoted 1 times
[Removed]
2 years, 4 months ago
Selected Answer: C
Use Redshift Spectrum for it!
upvoted 1 times
silvaa360
2 years, 6 months ago
Selected Answer: C
I think there is a typo here: instead of call center data it should be airline data. I had the same question in another paid question dump and the answer was also C, but I will try to see if there is the same typo there.
upvoted 2 times
nadavw
2 years, 7 months ago
Selected Answer: B
B seems to be a valid approach, taking into consideration that the load shouldn't be on Redshift. Glue can export the data from Redshift and run on serverless Spark. This blog explains it (ignore DataBrew, which is just a UI on top of the architecture): https://aws.amazon.com/blogs/big-data/data-preparation-using-amazon-redshift-with-aws-glue-databrew/
upvoted 1 times
cloudlearnerhere
2 years, 8 months ago
Selected Answer: C
C is the correct answer. A is wrong: although it is a possible solution, it requires a lot of development overhead to build Glue ETL scripts for joining the Redshift and S3 data. A better solution here is to use Amazon Redshift Spectrum.
upvoted 2 times
rocky48
2 years, 8 months ago
Selected Answer: C
C looks simple enough.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other