Exam AWS Certified Data Analytics - Specialty topic 1 question 64 discussion

Question #: 64
Topic #: 1

A large financial company is running its ETL process. Part of this process is to move data from Amazon S3 into an Amazon Redshift cluster. The company wants to use the most cost-efficient method to load the dataset into Amazon Redshift.
Which combination of steps would meet these requirements? (Choose two.)

A. Use the COPY command with the manifest file to load data into Amazon Redshift.
B. Use S3DistCp to load files into Amazon Redshift.
C. Use temporary staging tables during the loading process.
D. Use the UNLOAD command to upload data into Amazon Redshift.
E. Use Amazon Redshift Spectrum to query files from Amazon S3.

Suggested Answer: AC

by Priyanka_01 at Aug. 14, 2020, 10:19 a.m.

Comments

Priyanka_01

Highly Voted 3 years, 9 months ago

A & C: use the COPY command and load into temporary staging tables.

upvoted 31 times

carol1522

Highly Voted 3 years, 9 months ago

A and C, because the goal is to move data from S3 to Redshift, and with E we are not moving any data.

upvoted 14 times

Debi_mishra

Most Recent 2 years, 1 month ago

A & C. But if you are going to sit the exam in the near future, note that Redshift auto-copy is now a new no-ETL feature and may replace these options.

upvoted 2 times

pk349

2 years, 1 month ago

AC: I passed the test

upvoted 1 times

cloudlearnerhere

2 years, 7 months ago

Selected Answer: AC

The correct answers are A & C. Option B is wrong because S3DistCp copies data between S3 and HDFS. Option D is wrong because UNLOAD unloads data from Redshift to S3. Option E is wrong because Redshift Spectrum does not load data into Redshift, and the requirement is to load it.

upvoted 8 times

cloudlearnerhere

2 years, 7 months ago

Option A is correct because the COPY command loads data in parallel from Amazon S3, Amazon EMR, Amazon DynamoDB, or multiple data sources on remote hosts. COPY loads large amounts of data much more efficiently than INSERT statements, and stores the data more effectively as well. Amazon S3 historically provided eventual consistency for some operations, so new data might not be available immediately after upload, which could result in an incomplete load or in loading stale data. (S3 has offered strong read-after-write consistency since December 2020.) A manifest file manages this by listing exactly which files to load.

Option C is correct because you can efficiently update and insert new data by loading it into a staging table first. At the time of writing, Amazon Redshift didn't support a single merge statement (update or insert, also known as an upsert) to insert and update data from a single data source. However, you can effectively perform a merge operation: load your data into a staging table, then join the staging table with your target table for an UPDATE statement and an INSERT statement.
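To make the manifest approach concrete, here is a minimal Python sketch that builds a manifest document and the matching COPY statement. The bucket, table, and IAM role names are hypothetical placeholders, and data format options (such as CSV or GZIP) are left out for brevity. It only produces the JSON and SQL text; uploading the manifest to S3 and running the statement against a cluster are not shown.

```python
import json


def build_manifest(s3_urls, mandatory=True):
    """Return a COPY manifest as a dict listing the exact files to load.

    With "mandatory" set to true, COPY fails if a listed file is missing
    instead of silently loading a partial dataset.
    """
    return {"entries": [{"url": url, "mandatory": mandatory} for url in s3_urls]}


def copy_from_manifest_sql(table, manifest_url, iam_role):
    """Return a COPY statement that loads the files named in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"MANIFEST;"
    )


# Hypothetical bucket, table, and role names, for illustration only.
files = [
    "s3://example-etl-bucket/trades/part-0000",
    "s3://example-etl-bucket/trades/part-0001",
]
manifest = build_manifest(files)
manifest_json = json.dumps(manifest, indent=2)

copy_sql = copy_from_manifest_sql(
    "trades_staging",
    "s3://example-etl-bucket/manifests/trades.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)

print(manifest_json)
print(copy_sql)
```

Splitting the input into several files and listing them in one manifest lets a single COPY load them in parallel, which is where the efficiency comes from.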

upvoted 4 times

dushmantha

2 years, 10 months ago

Selected Answer: AC

B is not correct because it's used with EMR. D is not correct because UNLOAD is used to move data from Redshift to S3. C seems to involve a lot of work, but E does not move data into Redshift, which the organization requires, and A is correct anyway. So I would go with A and C.

upvoted 1 times

rocky48

2 years, 11 months ago

Selected Answer: AC

A, C are correct

upvoted 1 times

Bik000

3 years, 1 month ago

Selected Answer: AC

Answer is A & C

upvoted 1 times

jrheen

3 years, 2 months ago

Answer - A,C

upvoted 1 times

aws2019

3 years, 7 months ago

A and C

upvoted 1 times

gunjan4392

3 years, 7 months ago

A, C are correct

upvoted 1 times

lostsoul07

3 years, 8 months ago

A,C is the right answer

upvoted 2 times

Subho_in

3 years, 8 months ago

See points 1 and 2 in https://aws.amazon.com/blogs/big-data/top-8-best-practices-for-high-performance-etl-processing-using-amazon-redshift/ Options A and C must be the answer.

upvoted 10 times

Ramshizzle

3 years ago

Point 5 in the article mentioned by Subho_in is also important to note. Also see this page on why to use staging tables: https://docs.aws.amazon.com/redshift/latest/dg/merge-create-staging-table.html
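As a rough illustration of the staging-table pattern, the Python sketch below generates the SQL statements for one common variant: load into a temporary staging table, delete the matching rows from the target, then insert everything from staging. This is the delete-and-insert form of the merge rather than the UPDATE plus INSERT form, and the table, key, bucket, and role names are hypothetical. It only builds the statement text and does not connect to a cluster.

```python
def staging_merge_sql(target, staging, key, manifest_url, iam_role):
    """Return the ordered SQL statements for a staging-table merge.

    Rows in the target whose key also appears in the staging table are
    replaced; rows that appear only in staging are added.
    """
    return [
        # Temporary table with the same column layout as the target.
        f"CREATE TEMP TABLE {staging} (LIKE {target});",
        # Bulk-load the new batch into staging with a single COPY.
        f"COPY {staging} FROM '{manifest_url}' IAM_ROLE '{iam_role}' MANIFEST;",
        # Swap the rows inside one transaction so readers never see a gap.
        "BEGIN;",
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END;",
        f"DROP TABLE {staging};",
    ]


# Hypothetical names, for illustration only.
statements = staging_merge_sql(
    target="trades",
    staging="trades_stage",
    key="trade_id",
    manifest_url="s3://example-etl-bucket/manifests/trades.manifest",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)

for statement in statements:
    print(statement)
```

The delete and insert run in one transaction so the target is never left half-updated, and the temporary table is dropped once the batch has been merged.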

upvoted 1 times
