Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 55 discussion

Exam question from Amazon's AWS Certified Data Engineer - Associate DEA-C01

Question #: 55
Topic #: 1

A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution.
A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations.
The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes.
Which solution will meet these requirements?

A. Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
B. Change the distribution key to the table column that has the largest dimension.
C. Upgrade the reserved node from ra3.4xlarge to ra3.16xlarge.
D. Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.

Suggested Answer: B

by rralucard_ at Feb. 2, 2024, 11:07 a.m.

Comments

rralucard_

Highly Voted 11 months ago

Selected Answer: B

Option B, changing the distribution key, is the most effective way to balance the load across all five compute nodes. A distribution key that matches the query patterns and the characteristics of the data spreads both data and work more uniformly, which makes it less likely that one node is overutilized while the others are underutilized. Reference: https://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html

upvoted 7 times

pypelyncar

Most Recent 6 months, 3 weeks ago

Selected Answer: B

In a Redshift cluster that uses key distribution, rows are placed on compute nodes according to the value of the distribution key, so a key whose values are unevenly spread leaves some nodes with far more data and work than others. Choosing the table column with the largest dimension (read here as the most distinct values) as the distribution key spreads the rows more evenly across all nodes, provided that no single value dominates the column. This balances the processing load on each node when queries access that table.
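The effect described above can be illustrated with a small simulation. This is only a sketch: the MD5-based `node_for` function, the table contents, and the `skew_ratio` helper are hypothetical stand-ins, not Redshift's actual hashing or any AWS API. It compares a distribution key where one value dominates with a key where every value is distinct.

```python
import hashlib
from collections import Counter

NUM_NODES = 5  # matches the five-node cluster in the question


def node_for(key_value, num_nodes=NUM_NODES):
    """Map a distribution key value to a node with a stable hash.

    Illustrative only: Redshift uses its own internal hash function.
    """
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def rows_per_node(key_values, num_nodes=NUM_NODES):
    """Count how many rows land on each node for a given key column."""
    counts = Counter(node_for(value, num_nodes) for value in key_values)
    return [counts.get(node, 0) for node in range(num_nodes)]


def skew_ratio(per_node):
    """Largest node divided by the average node; 1.0 means perfectly even."""
    return max(per_node) / (sum(per_node) / len(per_node))


# Hypothetical table of 100,000 rows.
# Skewed key: 90% of the rows share a single value.
skewed_key = ["US"] * 90_000 + [f"country_{i % 20}" for i in range(10_000)]
# High-cardinality key: every row has its own value.
unique_key = list(range(100_000))

skewed = rows_per_node(skewed_key)
even = rows_per_node(unique_key)

print("skewed key rows per node:", skewed, "ratio:", round(skew_ratio(skewed), 2))
print("unique key rows per node:", even, "ratio:", round(skew_ratio(even), 2))
```

With the skewed key, every row that shares the dominant value hashes to the same node, which mirrors the single hot node in the question. With the unique key, the rows spread almost evenly.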

upvoted 2 times

khchan123

8 months ago

Selected Answer: B

The correct solution is B. Change the distribution key to the table column that has the largest dimension. This will help to distribute the data more evenly across the nodes, reducing the load on the heavily utilized node.

upvoted 2 times

Christina666

8 months, 2 weeks ago

Selected Answer: A

Gemini result:

Understanding the problem: The scenario describes a Redshift cluster with uneven load distribution. This indicates potential issues with either the distribution style or the sort key.

Key distribution: The problem states that the cluster uses key distribution, meaning a specific column is designated as the distribution key. Data rows with matching distribution key values are placed on the same node.

Sort key: A sort key determines the order in which data is physically stored within a table's blocks on a node. A well-chosen sort key can significantly optimize query performance, especially when queries often filter by that column.

upvoted 1 times

tgv

7 months ago

The sort key determines the order of data storage and can improve query performance for specific queries, but it does not directly affect the distribution of data across nodes. Therefore, this will not address the uneven CPU load issue.
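A toy model makes this point concrete. It is a sketch under stated assumptions: the SHA-256-based `node_for` function, the `place` helper, and the sample rows are all hypothetical and do not reflect Redshift internals. Rows are assigned to nodes by the distribution column only, and the sort column just reorders rows inside each node.

```python
import hashlib
from collections import defaultdict


def node_for(value, num_nodes=5):
    """Stable hash of the distribution key value (illustrative only)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def place(rows, dist_col, sort_col=None):
    """Assign rows to nodes by dist_col, then optionally sort within each node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[dist_col])].append(row)
    if sort_col is not None:
        for node_rows in nodes.values():
            node_rows.sort(key=lambda r: r[sort_col])
    return nodes


# Hypothetical table: only three distinct distribution key values.
rows = [{"customer_id": i % 3, "order_date": 1000 - i} for i in range(300)]

without_sort_key = place(rows, "customer_id")
with_sort_key = place(rows, "customer_id", sort_col="order_date")

counts_without = {node: len(r) for node, r in without_sort_key.items()}
counts_with = {node: len(r) for node, r in with_sort_key.items()}

print("rows per node without sort key:", counts_without)
print("rows per node with sort key:   ", counts_with)
```

The per-node row counts are identical in both cases, so in this model a new sort key leaves a hot node exactly as hot as before.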

upvoted 1 times

damaldon

9 months, 3 weeks ago

B. With key distribution, the rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node collocates the rows on the slices according to the values in the joining columns, so matching values from the common columns are physically stored together. Reference: https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
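The collocation behavior quoted above can be sketched in a few lines. Everything here is an assumption made for illustration: the slice count, the SHA-256-based `slice_for` function, and the `customers` and `orders` tables are hypothetical, and the code is not how Redshift is implemented. Both tables are distributed on the join column, so each slice can join its own rows without looking at any other slice.

```python
import hashlib
from collections import defaultdict

NUM_SLICES = 10  # hypothetical slice count, chosen for illustration


def slice_for(value, num_slices=NUM_SLICES):
    """Stable hash of a distribution key value to a slice (illustrative only)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_slices


def distribute(rows, dist_col):
    """Group rows by the slice that their distribution key value hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[dist_col])].append(row)
    return slices


def local_join(order_slices, customer_slices):
    """Join orders to customers using only rows stored on the same slice."""
    joined = []
    for slice_id, slice_orders in order_slices.items():
        local = {c["customer_id"]: c for c in customer_slices.get(slice_id, [])}
        for order in slice_orders:
            match = local.get(order["customer_id"])
            if match is not None:
                joined.append((order["order_id"], match["name"]))
    return joined


# Hypothetical tables that share the join column customer_id.
customers = [{"customer_id": c, "name": f"customer_{c}"} for c in range(50)]
orders = [{"order_id": o, "customer_id": o % 50} for o in range(500)]

customer_slices = distribute(customers, "customer_id")
order_slices = distribute(orders, "customer_id")
joined = local_join(order_slices, customer_slices)

print("orders joined without leaving their slice:", len(joined), "of", len(orders))
```

Because equal key values always hash to the same slice, every order finds its customer locally in this model. The same property is what concentrates work on one node when a single key value dominates.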

upvoted 2 times
