Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 205 discussion

A data engineer uses Amazon Kinesis Data Streams to ingest and process records that contain user behavior data from an application every day.

The data engineer notices that the data stream is experiencing throttling because hot shards receive much more data than other shards in the data stream.

How should the data engineer resolve the throttling issue?

  • A. Use a random partition key to distribute the ingested records.
  • B. Increase the number of shards in the data stream. Distribute the records across the shards.
  • C. Limit the number of records that are sent each second by the producer to match the capacity of the stream.
  • D. Decrease the size of the records that the producer sends to match the capacity of the stream.
Suggested Answer: A

Comments

Tani0908
1 week, 4 days ago
Selected Answer: B
For user behavior data, the order of events needs to be maintained, so we cannot use a random partition key, as it would scatter related records across shards. So B is correct for this scenario.
upvoted 1 times
...
Mitchdu
3 weeks, 1 day ago
Selected Answer: A
Option A: Use random partition key to distribute records
  • ✅ Addresses root cause: random keys ensure even distribution across all shards
  • ✅ Immediate solution: no infrastructure changes needed
  • ✅ Cost-effective: uses existing shard capacity efficiently
  • ✅ Eliminates hot shards: random distribution prevents any single shard from being overloaded

Option B: Increase number of shards + distribute records
  • ⚠️ Partial solution: more shards provide more capacity
  • ❌ Doesn't fix root cause: poor partition key selection will still create hot shards
  • ❌ Higher cost: more shards means higher operational costs
  • ❌ Temporary fix: the hot shard problem will likely persist
upvoted 3 times
...
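The distribution argument above can be sketched with a short simulation. Kinesis really does route each record by the MD5 hash of its partition key, with each shard owning an equal slice of the 128-bit hash key space; the user IDs and shard count below are illustrative, not from the question:

```python
import hashlib
import uuid
from collections import Counter

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard the way Kinesis does: MD5-hash
    the key onto the 128-bit hash key space, which the shards split
    into equal contiguous ranges."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * num_shards // 2**128

NUM_SHARDS = 4

# Skewed keys: most records share one user ID, so one shard runs hot.
skewed = Counter(shard_for(k, NUM_SHARDS)
                 for k in ["user-42"] * 900 + ["user-7"] * 100)

# Random keys (e.g. a fresh uuid4 per record) spread records evenly.
random_keys = Counter(shard_for(str(uuid.uuid4()), NUM_SHARDS)
                      for _ in range(1000))

print("skewed:", dict(skewed))       # one or two shards take all the traffic
print("random:", dict(random_keys))  # roughly 250 records per shard
```

In practice the random-key approach corresponds to passing something like `PartitionKey=str(uuid.uuid4())` to `put_record`; the trade-off raised elsewhere in this thread is that records for the same user then no longer land on the same shard, so per-user ordering is lost.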
zits88
3 weeks, 6 days ago
Selected Answer: B
After reading about this EXTENSIVELY (and not just asking ChatGPT like some of these folks), the actual correct answer here is B. Option A would indeed resolve the hot shards, but all ordering and logic about which data goes to which shard would be lost, which is not good in this scenario with user behavior data from an application. As someone below said, merely "increasing the number of shards" would NOT rectify the problem. However, the key second sentence about "distributing" the data across the shards completes the task the question asks of us. It doesn't specify how we'd do that, but that doesn't matter. Now, none of this guarantees which answer AWS itself considers correct: in the real-life exams I have taken, I have seen at least one or two questions with NO correct answer, and many more that seem almost AI-generated. But in real-life data engineering, you don't want to randomize the partition key here.
upvoted 1 times
...
siheom
1 month ago
Selected Answer: B
VOTE B
upvoted 2 times
...
Faye15599
3 months, 3 weeks ago
Selected Answer: A
A is the best solution because the issue of hot shards is typically caused by an uneven distribution of records across shards due to poorly chosen partition keys. Using a random partition key ensures that records are distributed more evenly across all shards, reducing the likelihood of any single shard becoming "hot" and experiencing throttling. B is incorrect because while increasing the number of shards can help handle more data, it does not resolve the root cause of hot shards, which is uneven distribution due to poor partition key selection. Without addressing the partition key issue, adding shards may still result in some shards being overloaded.
upvoted 4 times
zits88
3 weeks, 6 days ago
This is incorrect. You do not want to randomize the partition key here, as it would lose all ordering logic in the shards. And "your" (ChatGPT's) explanation of why B is wrong is, well, wrong. The second sentence of the answer option, "Distribute the records across the shards," takes care of all the issues.
upvoted 1 times
...
...
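Whether adding shards alone helps depends on the keys: extending the same MD5-routing idea (the key names and counts below are illustrative), a single dominant partition key stays on a single shard no matter how many shards the stream has, which is the "doesn't fix the root cause" point made above:

```python
import hashlib
from collections import Counter

def shard_for(partition_key: str, num_shards: int) -> int:
    # Kinesis routes by the MD5 hash of the partition key; each shard
    # owns an equal slice of the 128-bit hash key space.
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * num_shards // 2**128

# 900 of 1000 records carry the same partition key ("user-42" is made up).
records = ["user-42"] * 900 + [f"user-{i}" for i in range(100)]

for num_shards in (4, 8, 16):
    counts = Counter(shard_for(k, num_shards) for k in records)
    print(f"{num_shards} shards -> hottest shard holds "
          f"{max(counts.values())} of 1000 records")
```

If resharding is still wanted alongside a better key strategy, Kinesis exposes this as the UpdateShardCount API (in boto3, `update_shard_count` with `ScalingType='UNIFORM_SCALING'`), which splits the hash key space evenly across the new target shard count.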
JekChong
4 months, 2 weeks ago
Selected Answer: B
Amazon Kinesis Data Streams uses shards to distribute data, and each shard has a fixed throughput limit. If certain shards receive significantly more data than others (hot shards), they will experience throttling. To resolve this issue:
  • Increase the number of shards: this increases the overall capacity of the stream.
  • Distribute records more evenly across shards: this can be done by modifying the partition key strategy so that data is spread more evenly.
upvoted 4 times
...
italiancloud2025
4 months, 2 weeks ago
Selected Answer: A
A: Yes, using a random partition key will distribute records evenly across the shards, reducing bottlenecks in "hot" shards. B: No, increasing the number of shards does not fix the imbalance if the key keeps concentrating the data.
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other