
Exam AWS Certified Data Analytics - Specialty topic 1 question 15 discussion

An airline has .csv-formatted data stored in Amazon S3 with an AWS Glue Data Catalog. Data analysts want to join this data with call center data stored in Amazon Redshift as part of a daily batch process. The Amazon Redshift cluster is already under a heavy load. The solution must be managed, serverless, well-functioning, and minimize the load on the existing Amazon Redshift cluster. The solution should also require minimal effort and development activity.
Which solution meets these requirements?

  • A. Unload the call center data from Amazon Redshift to Amazon S3 using an AWS Lambda function. Perform the join with AWS Glue ETL scripts.
  • B. Export the call center data from Amazon Redshift using a Python shell in AWS Glue. Perform the join with AWS Glue ETL scripts.
  • C. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift.
  • D. Export the call center data from Amazon Redshift to Amazon EMR using Apache Sqoop. Perform the join with Apache Hive.
Suggested Answer: C
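For context, the Spectrum approach in answer C amounts to registering the Glue Data Catalog as an external schema and joining across it. The sketch below shows the general shape; the schema name, Glue database, IAM role ARN, table names, and column names are all illustrative assumptions, not details from the question.

```sql
-- Register an external schema backed by the existing AWS Glue Data Catalog
-- (database name and IAM role ARN are placeholders).
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'airline_glue_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';

-- The .csv tables cataloged by Glue are now queryable in place, so the join
-- reads the S3 data via Spectrum without loading it into the cluster:
SELECT f.flight_id, c.agent_id
FROM spectrum_schema.flights AS f   -- external table backed by S3
JOIN call_center AS c               -- local Redshift table
  ON f.booking_ref = c.booking_ref;
```

Because the Glue Data Catalog already exists, no separate CREATE EXTERNAL TABLE statements are needed for the S3 data, which is why this option involves minimal development effort.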

Comments

abhineet
Highly Voted 3 years, 9 months ago
I would go for C; Spectrum is serverless as well. The question also asks for minimal development effort. For option A, you would need to develop Lambda and Glue code.
upvoted 39 times
Jh2501
3 years, 9 months ago
Agree, C. However, one thing I am still confused about: how can Spectrum create an external table for the call centre data when it isn't stored on S3?
upvoted 7 times
Phoenyx89
3 years, 9 months ago
with a Create External Table as Select... I suppose
upvoted 1 times
DerekKey
3 years, 8 months ago
Wrong. "Create External Table as Select" is a Redshift command, not Spectrum. Spectrum is used to query external data as read-only.
upvoted 1 times
GeeBeeEl
3 years, 9 months ago
If you are confused, check https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html
upvoted 3 times
Manue
3 years, 8 months ago
I think Jh2501 is not challenging Spectrum's capability to create external tables, but asking how/why to create an external table on data that is stored in Redshift and not in S3. So, if there is not a typo in the question, C is a doubtful answer, and A could be the right one.
upvoted 5 times
JBAWA
2 years, 8 months ago
To define an external table in Amazon Redshift, use the CREATE EXTERNAL TABLE command
upvoted 1 times
Merrick
2 years, 6 months ago
https://aws.amazon.com/ko/premiumsupport/knowledge-center/redshift-spectrum-external-table/
upvoted 1 times
jove
Highly Voted 3 years, 8 months ago
C doesn't make sense. The call center data is already stored in Redshift. What would be the purpose of creating an external table for the call center data? Also, C suggests performing the join with Redshift, which is already under a heavy load.
upvoted 19 times
mendelthegreat
3 years, 8 months ago
1. Redshift Spectrum is a compute layer that sits between S3 and Redshift, so it will not add more load to Redshift.
2. What the case is saying is that because Redshift is already under heavy load, we shouldn't load the .csv data from S3 into Redshift, so an external table in Redshift Spectrum would be better.
3. The best use case for Redshift Spectrum, as described in the question, is to JOIN data in Redshift with another external data source, in this case S3, without needing to bring everything into Redshift.
C is the undeniable correct answer here.
upvoted 28 times
Kam006
Most Recent 1 year, 4 months ago
C is the correct answer. I do agree that table creation is part of Redshift and that external tables are created to access non-Redshift data sources (e.g. S3). External tables allow you to query data in S3 using the same SELECT syntax as with other Amazon Redshift tables. Here, the question says Redshift is already overloaded, hence we should not load the data into Redshift. Redshift Spectrum will join the S3 data with the Redshift data, is serverless, and requires minimal development effort.
upvoted 1 times
jerkane
1 year, 7 months ago
Selected Answer: C
C is correct as it is the one with minimal effort as no data is moved.
upvoted 2 times
markstudy
1 year, 8 months ago
Selected Answer: B
I would pick B. A: Lambda is limited to 15 minutes of execution time, which might not be enough to unload. C: The call center data is already in Redshift; the missing data is the airline data. The best possible option seems to be B: unload the Redshift data and develop something to merge/join the data, so Redshift doesn't have to run the queries and merge.
upvoted 1 times
nroopa
1 year, 10 months ago
Selected Answer: A
The reason C is incorrect is that it mentions creating an external table using Amazon Redshift Spectrum for the call center data (which is already in Redshift) and performing the join with Amazon Redshift (not sure how to join on Redshift as the data is already in Redshift), so I guess this is incorrect unless there is a typo regarding the call center data. So my option will be A.
upvoted 1 times
NikkyDicky
1 year, 11 months ago
Selected Answer: C
going with C
upvoted 1 times
Hyperdanny
2 years, 1 month ago
I would pick B. A: Lambda is limited to 15 minutes of execution time, which might not be enough to unload. C: The call center data is already in Redshift; the missing data is the airline data. The best possible option seems to be B: unload the Redshift data and develop something to merge/join the data, so Redshift doesn't have to run the queries and merge.
upvoted 2 times
Cloudbert
2 years, 1 month ago
Selected Answer: A
C is wrong. To use the CREATE EXTERNAL TABLE command, the data has to be in S3: "To define an external table in Amazon Redshift, use the CREATE EXTERNAL TABLE command. The external table statement defines the table columns, the format of your data files, and the location of your data in Amazon S3." The solution also requires putting as little burden on Redshift as possible. Lambda and Glue are serverless, and by choosing option A we offload the burden from Redshift completely. Option A must be correct.
upvoted 1 times
Debi_mishra
2 years, 2 months ago
I will say none of the answers is exactly correct without additional information such as the call centre data structure and volume. D is not correct as it is not serverless. C is not correct: you can't create an external table when the data is in Redshift, and it will also put load on Redshift. A is OK only if the data is small, else Lambda will time out. Likewise, B can be a problem if the data is large, as it will put load on Redshift.
upvoted 2 times
pk349
2 years, 2 months ago
C: I passed the test
upvoted 1 times
itsme1
2 years, 4 months ago
Selected Answer: C
"Amazon Redshift Spectrum resides on dedicated Amazon Redshift servers that are independent of your cluster." https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html C: The only caveat is that the external table is not created in Redshift Spectrum itself. A: UNLOAD is faster as opposed to Lambda; however, it also burdens Redshift.
upvoted 1 times
[Removed]
2 years, 4 months ago
Selected Answer: C
Use Redshift Spectrum for it!
upvoted 1 times
silvaa360
2 years, 6 months ago
Selected Answer: C
I think there is a typo here: instead of call center data it should be airline data. I had the same question in another paid question dump and the answer was also C, but I will try to see if there is the same typo there.
upvoted 2 times
nadavw
2 years, 7 months ago
Selected Answer: B
B seems to be a valid approach, taking into consideration that the load shouldn't be on Redshift. Glue can export the data from Redshift and run on serverless Spark. This blog explains it (ignore DataBrew, which is just a UI on top of the architecture): https://aws.amazon.com/blogs/big-data/data-preparation-using-amazon-redshift-with-aws-glue-databrew/
upvoted 1 times
cloudlearnerhere
2 years, 8 months ago
Selected Answer: C
C is the correct answer. A is wrong: although it is a possible solution, it requires a lot of development overhead to build Glue ETL scripts for joining the Redshift and S3 data. A better solution here is to use Amazon Redshift Spectrum.
upvoted 2 times
rocky48
2 years, 8 months ago
Selected Answer: C
C looks simple enough.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other