Exam AWS Certified Data Analytics - Specialty topic 1 question 46 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 46
Topic #: 1

A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company's fact table.
How should the company meet these requirements?

A. Use multiple COPY commands to load the data into the Amazon Redshift cluster.
B. Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster.
C. Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node.
D. Use a single COPY command to load the data into the Amazon Redshift cluster.

Suggested Answer: D

by Priyanka_01 at Aug. 11, 2020, 12:16 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Priyanka_01

Highly Voted 3 years, 11 months ago

D. https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html

upvoted 35 times

cloudlearnerhere

Highly Voted 2 years, 9 months ago

Selected Answer: D

The correct answer is D, because a single COPY command loads the data in parallel. Amazon Redshift can automatically load in parallel from multiple compressed data files. However, if you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load. This type of load is much slower and requires a VACUUM process at the end if the table has a sort column defined. Option A is wrong because multiple COPY commands force Redshift to perform a serialized load. Option B is wrong because using EMR (S3DistCp and HDFS) only makes the solution more complicated. Option C is wrong because Redshift has no LOAD command.

upvoted 9 times

gofavad926

Most Recent 1 year, 10 months ago

Selected Answer: D

D. https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html

upvoted 1 time

pk349

2 years, 3 months ago

D: I passed the test

upvoted 2 times

anjuvinayan

2 years, 4 months ago

Answer is D. The COPY command is used to load data into Redshift, and a single COPY command already loads the data in parallel.

upvoted 2 times

Arka_01

2 years, 10 months ago

Selected Answer: D

Single copy command is the correct answer.

upvoted 1 time

fqc

3 years ago

Selected Answer: D

The COPY command is parallelized and efficient by default. It is used to load data from sources outside Redshift; if the data is already in Redshift, use INSERT INTO or CREATE TABLE AS, as in standard SQL.

upvoted 2 times

rocky48

3 years, 1 month ago

Selected Answer: D

The COPY command is parallelized and efficient by default. It is used to load data from sources outside Redshift; if the data is already in Redshift, use INSERT INTO or CREATE TABLE AS, as in standard SQL. There is no point in building the more expensive solution in option B. The answer should be D.

upvoted 1 time

dushmantha

3 years, 2 months ago

The COPY command is parallelized and efficient by default. It is used to load data from sources outside Redshift; if the data is already in Redshift, use INSERT INTO or CREATE TABLE AS, as in standard SQL. There is no point in building the more expensive solution in option B. The answer should be D.

upvoted 1 time

Bik000

3 years, 3 months ago

Selected Answer: D

Answer is D

upvoted 1 time

certificationJunkie

3 years, 3 months ago

It's D. All the files should sit under a common S3 prefix (directory), and in the Redshift COPY command you pass the path up to that prefix so that COPY picks up every file under it.

upvoted 1 time
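A minimal sketch of the prefix behavior this comment relies on, plus the manifest alternative for when the files do not share a clean prefix. The bucket, object keys and helper names are hypothetical, and the prefix check only simulates Redshift's documented rule that COPY loads every object whose key begins with the given prefix; nothing here calls S3.

```python
"""Sketch: how a COPY prefix selects files, and a manifest as the explicit alternative.

Bucket name, object keys and helper names are hypothetical. The prefix check
simulates the documented matching rule; it does not call S3.
"""

import json
from typing import Iterable, List


def keys_loaded_by_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    # COPY FROM 's3://bucket/<prefix>' loads every object whose key starts with <prefix>.
    return [k for k in keys if k.startswith(prefix)]


def build_manifest(bucket: str, keys: Iterable[str]) -> str:
    # A COPY manifest lists each file explicitly; "mandatory": true makes COPY
    # fail if that file is missing instead of continuing without it.
    entries = [{"url": f"s3://{bucket}/{k}", "mandatory": True} for k in keys]
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    keys = [
        "sales/2020/part-0000.csv.gz",
        "sales/2020/part-0001.csv.gz",
        "sales/2020_backup/part-0000.csv.gz",
    ]
    # Without a trailing slash the prefix also matches the backup "folder".
    print(keys_loaded_by_prefix(keys, "sales/2020"))
    # With the trailing slash only the intended directory is matched.
    print(keys_loaded_by_prefix(keys, "sales/2020/"))
    # Explicit list of the two intended files.
    print(build_manifest("example-bucket", keys[:2]))
```

To use the manifest, you would upload it to S3 and point one COPY at it with the MANIFEST option, for example `COPY sales_fact FROM 's3://example-bucket/manifests/sales.manifest' IAM_ROLE '<role-arn>' MANIFEST;` (again with hypothetical names). Either way it remains a single COPY command, so the parallel-load argument for answer D still applies.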
