Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 55 discussion

A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution.
A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations.
The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes.
Which solution will meet these requirements?

  • A. Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
  • B. Change the distribution key to the table column that has the largest dimension.
  • C. Upgrade the reserved node from ra3.4xlarge to ra3.16xlarge.
  • D. Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
Suggested Answer: B

Comments

rralucard_
Highly Voted 9 months, 2 weeks ago
Selected Answer: B
https://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html Option B, changing the distribution key, is the most effective solution to balance the load more evenly across all five compute nodes. Selecting an appropriate distribution key that aligns with the query patterns and data characteristics can result in a more uniform distribution of data and workloads, thus reducing the likelihood of one node being overutilized while others are underutilized.
upvoted 7 times
...
pypelyncar
Most Recent 5 months ago
Selected Answer: B
In a Redshift cluster with key distribution, data is distributed across compute nodes based on the values of the distribution key. An uneven distribution can lead to skewed workloads on specific nodes. By choosing the table column with the largest dimension (most distinct values) as the distribution key, you ensure a more even spread of data across all nodes. This balances the processing load on each node when queries access that column.
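A minimal sketch of the skew effect described above, written as a plain Python simulation rather than anything run against Redshift. The hash function, the `node_for` and `skew_ratio` helpers, and the sample column values are all assumptions made for illustration; Redshift's internal hashing is different, but the principle (rows with the same key value always land on the same node) is the same.

```python
import hashlib
from collections import Counter

NUM_NODES = 5  # matches the five-node cluster in the question


def node_for(key_value, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node with a stable hash.

    Stand-in for Redshift's internal hash; identical values always
    map to the same node.
    """
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_per_node(key_values, num_nodes=NUM_NODES):
    """Count how many rows each node receives for a given key column."""
    counts = Counter(node_for(v, num_nodes) for v in key_values)
    return [counts.get(n, 0) for n in range(num_nodes)]


def skew_ratio(counts):
    """Busiest node divided by the per-node average (1.0 is perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical 100,000-row table.
# Low-cardinality key: 90% of rows share a single value.
low_cardinality_key = ["US"] * 90_000 + ["DE"] * 5_000 + ["FR"] * 5_000
# High-cardinality key: a unique id per row.
high_cardinality_key = range(100_000)

low_counts = rows_per_node(low_cardinality_key)
high_counts = rows_per_node(high_cardinality_key)

print("low-cardinality key :", low_counts, "skew", round(skew_ratio(low_counts), 2))
print("high-cardinality key:", high_counts, "skew", round(skew_ratio(high_counts), 2))
```

With the low-cardinality key, at least 90,000 of the 100,000 rows end up on one node, which mirrors the one hot node in the question. The high-cardinality key spreads rows close to evenly across all five.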
upvoted 2 times
...
khchan123
6 months, 2 weeks ago
Selected Answer: B
The correct solution is B. Change the distribution key to the table column that has the largest dimension. This will help to distribute the data more evenly across the nodes, reducing the load on the heavily utilized node.
upvoted 2 times
...
Christina666
7 months ago
Selected Answer: A
Gemini result:
Understanding the Problem: The scenario describes a Redshift cluster with uneven load distribution. This indicates potential issues with either the distribution style or the sort key.
Key Distribution: The problem states that the cluster uses key distribution, meaning a specific column is designated as the distribution key. Data rows with matching distribution key values are placed on the same node.
Sort Key: A sort key determines the order in which data is physically stored within a table's blocks on a node. A well-chosen sort key can significantly optimize query performance, especially when queries often filter by that column.
upvoted 1 times
tgv
5 months, 2 weeks ago
The sort key determines the order of data storage and can improve query performance for specific queries, but it does not directly affect the distribution of data across nodes. Therefore, this will not address the uneven CPU load issue.
upvoted 1 times
...
...
damaldon
8 months, 1 week ago
B. With key distribution, the rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node collocates the rows on the slices according to the values in the joining columns. This way, matching values from the common columns are physically stored together. https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
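The collocation behavior described above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the slice count, the `slice_for`, `distribute`, and `local_join` helpers, and the `customers` / `orders` tables are hypothetical, and the hash is a stand-in for whatever Redshift uses internally.

```python
import hashlib

NUM_SLICES = 10  # hypothetical slice count, chosen only for the example


def slice_for(value, num_slices=NUM_SLICES):
    """Stable hash of a distribution-key value to a slice index."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def distribute(rows, key, num_slices=NUM_SLICES):
    """Place each row on the slice chosen by hashing its key column."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[key], num_slices)].append(row)
    return slices


def local_join(customer_slices, order_slices):
    """Join orders to customers slice by slice, never looking across slices."""
    joined = []
    for cust_rows, order_rows in zip(customer_slices, order_slices):
        by_id = {c["customer_id"]: c for c in cust_rows}
        for order in order_rows:
            customer = by_id.get(order["customer_id"])
            if customer is not None:
                joined.append((order["order_id"], customer["name"]))
    return joined


# Two hypothetical tables that share the join column customer_id.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1, 51)]
orders = [{"order_id": 1000 + i, "customer_id": (i % 50) + 1} for i in range(200)]

# Both tables are distributed on the joining column.
customer_slices = distribute(customers, "customer_id")
order_slices = distribute(orders, "customer_id")

joined = local_join(customer_slices, order_slices)
print(len(joined), "of", len(orders), "orders joined without crossing slices")
```

Because both tables hash the same column, every order finds its customer on its own slice, so the join needs no data movement between slices. The trade-off is that a join-friendly key can still be a skewed key, which is the situation in the question.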
upvoted 2 times
...