Exam AWS Certified Data Analytics - Specialty topic 1 question 35 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 35
Topic #: 1

A financial company uses Amazon S3 as its data lake and has set up a data warehouse using a multi-node Amazon Redshift cluster. The data files in the data lake are organized in folders based on the data source of each data file. All the data files are loaded to one table in the Amazon Redshift cluster using a separate COPY command for each data file location. With this approach, loading all the data files into Amazon Redshift takes a long time to complete. Users want a faster solution with little or no increase in cost while maintaining the segregation of the data files in the S3 data lake.
Which solution meets these requirements?

A. Use Amazon EMR to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
B. Load all the data files in parallel to Amazon Aurora, and run an AWS Glue job to load the data into Amazon Redshift.
C. Use an AWS Glue job to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
D. Create a manifest file that contains the data file locations and issue a COPY command to load the data into Amazon Redshift.

Suggested Answer: D

by carol1522 at Aug. 19, 2020, 8:24 p.m.

Comments

carol1522

Highly Voted 3 years, 9 months ago

D? https://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html

upvoted 25 times

cloudlearnerhere

Highly Voted 2 years, 8 months ago

Selected Answer: D

The correct answer is D, as a manifest file can be used to load the data. Also, it's recommended to use a single COPY command instead of multiple concurrent COPY commands for performance. Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file. Amazon Redshift can automatically load in parallel from multiple compressed data files. However, if you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load. This type of load is much slower and requires a VACUUM process at the end if the table has a sort column defined. Options A, B and C are wrong because they add unnecessary work and cost.

upvoted 9 times
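
The manifest described in the comment above is a small JSON document with an `entries` list, where each entry has a `url` and an optional `mandatory` flag. A minimal Python sketch of building one is shown here; the bucket name, folder names, and file names are hypothetical placeholders, and uploading the result to S3 is left out.

```python
import json


def build_manifest(urls, mandatory=True):
    """Return a COPY manifest (JSON string) listing the given S3 object URLs."""
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


# Hypothetical file locations: one folder per data source, left where they are.
data_files = [
    "s3://example-data-lake/source_a/trades_001.csv.gz",
    "s3://example-data-lake/source_b/quotes_001.csv.gz",
    "s3://example-data-lake/source_c/positions_001.csv.gz",
]

manifest_json = build_manifest(data_files)
print(manifest_json)
```

Because the manifest only references the files, the per-source folder layout in the data lake stays unchanged.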

crs1234

2 years, 2 months ago

Can you share a link that gives more insight into "However, if you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load. This type of load is much slower and requires a VACUUM process at the end if the table has a sort column defined"?

upvoted 1 times

kondi2309

Most Recent 1 year, 5 months ago

Selected Answer: D

Ans D, single COPY command for performance and manifest file for loading data

upvoted 1 times

GCPereira

1 year, 6 months ago

Data lake (S3) -> data warehouse (Redshift). The problem is copying files with one COPY per prefix; a solution is needed that does not increase costs.
A) EMR is expensive.
B) Aurora needs configuration effort, and Glue needs development effort.
C) A Glue job needs development effort, and copying all files to the same prefix will create a problem: which file goes to which table?
D) A manifest file is the best option because you can specify exactly the prefix/key for your COPY command.

upvoted 1 times

tsk9921

2 years ago

D: a manifest file is a valid option

upvoted 1 times

pk349

2 years, 2 months ago

D: I passed the test

upvoted 2 times

Arka_01

2 years, 9 months ago

Selected Answer: D

You can use a single COPY command with a manifest file that lists the different S3 locations. This will speed up the COPY process.

upvoted 1 times

rocky48

2 years, 11 months ago

Selected Answer: D

upvoted 1 times

Bik000

3 years, 1 month ago

Selected Answer: D

My Answer is D

upvoted 1 times

AWSRanger

3 years, 2 months ago

Selected Answer: D

D is correct

upvoted 2 times

Shraddha

3 years, 8 months ago

Ans D.
A = wrong: no segregation, increased cost.
B = wrong: no segregation, unnecessary work, increased cost.
C = wrong: no segregation, increased cost.
This is a question on how the COPY command works. In general, you should use only one COPY command because Redshift will load the data in parallel; if you use many COPY commands, Redshift has to load the data sequentially.
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#copy-command-examples-manifest

upvoted 6 times
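
To make the contrast in the comment above concrete, the following Python sketch composes the SQL text for both approaches: one COPY per folder versus a single COPY that reads a manifest. The table name, S3 paths, and IAM role ARN are hypothetical, file-format options are omitted, and the statements are only built as strings here, not executed against a cluster.

```python
def copy_per_folder(table, prefixes, iam_role):
    """One COPY statement per S3 folder (the slow approach from the question)."""
    return [
        f"COPY {table}\nFROM '{prefix}'\nIAM_ROLE '{iam_role}';"
        for prefix in prefixes
    ]


def copy_from_manifest(table, manifest_url, iam_role):
    """A single COPY statement that loads every file listed in a manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "MANIFEST;"
    )


# Hypothetical names used only for illustration.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"
FOLDERS = [
    "s3://example-data-lake/source_a/",
    "s3://example-data-lake/source_b/",
    "s3://example-data-lake/source_c/",
]

per_folder = copy_per_folder("transactions", FOLDERS, ROLE)
single = copy_from_manifest(
    "transactions", "s3://example-data-lake/manifests/load.manifest", ROLE
)

print(f"{len(per_folder)} statements without a manifest, 1 with:")
print(single)
```

String formatting is used only to keep the sketch short; it is not a safe way to build SQL from untrusted input.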

lostsoul07

3 years, 8 months ago

D is the right answer

upvoted 2 times

BillyC

3 years, 8 months ago

My answer is D

upvoted 2 times

sanjaym

3 years, 8 months ago

D is the right answer.

upvoted 2 times

syu31svc

3 years, 9 months ago

From the link: https://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html "You can use a manifest to ensure that the COPY command loads all of the required files, and only the required files, for a data load". So the answer is D.

upvoted 7 times
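
The "all of the required files" part of that quote relates to the `mandatory` flag on each manifest entry: when it is true and the file is not found, COPY returns an error. The Python sketch below is only a local illustration of that idea, not how Redshift implements it. The helper `missing_mandatory`, the bucket, and the file names are hypothetical, and the list of available objects is hard-coded instead of being read from S3.

```python
def missing_mandatory(manifest, available_urls):
    """Return the URLs of mandatory manifest entries that are not available."""
    available = set(available_urls)
    return [
        entry["url"]
        for entry in manifest["entries"]
        # An entry without the flag is treated as optional in this sketch.
        if entry.get("mandatory", False) and entry["url"] not in available
    ]


# Hypothetical manifest: two required files and one optional file.
manifest = {
    "entries": [
        {"url": "s3://example-data-lake/source_a/trades_001.csv.gz", "mandatory": True},
        {"url": "s3://example-data-lake/source_b/quotes_001.csv.gz", "mandatory": True},
        {"url": "s3://example-data-lake/source_c/notes_001.csv.gz", "mandatory": False},
    ]
}

# Hypothetical listing of objects that currently exist in the data lake.
available = ["s3://example-data-lake/source_a/trades_001.csv.gz"]

missing = missing_mandatory(manifest, available)
print(missing)
```

A pre-check like this could be run before issuing the COPY, so that a missing required file is noticed before the load starts.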

Paitan

3 years, 9 months ago

Using manifest file is the right choice. So option D.

upvoted 3 times

Saaho

3 years, 9 months ago

Yes D is the right answer

upvoted 4 times
