Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 205 discussion

A data engineer uses Amazon Kinesis Data Streams to ingest and process records that contain user behavior data from an application every day.

The data engineer notices that the data stream is experiencing throttling because hot shards receive much more data than other shards in the data stream.

How should the data engineer resolve the throttling issue?

  • A. Use a random partition key to distribute the ingested records.
  • B. Increase the number of shards in the data stream. Distribute the records across the shards.
  • C. Limit the number of records that are sent each second by the producer to match the capacity of the stream.
  • D. Decrease the size of the records that the producer sends to match the capacity of the stream.
Suggested Answer: A

Comments

Tani0908
1 week, 4 days ago
Selected Answer: B
For user behavior data, the order of events needs to be maintained, so we cannot use a random partition key, as it would scatter related records across shards. So B is correct for this scenario.
upvoted 1 times
...
Mitchdu
3 weeks, 1 day ago
Selected Answer: A
Option A: Use random partition key to distribute records
  • ✅ Addresses root cause: random keys ensure even distribution across all shards
  • ✅ Immediate solution: no infrastructure changes needed
  • ✅ Cost-effective: uses existing shard capacity efficiently
  • ✅ Eliminates hot shards: random distribution prevents any single shard from being overloaded

Option B: Increase number of shards + distribute records
  • ⚠️ Partial solution: more shards provide more capacity
  • ❌ Doesn't fix root cause: poor partition key selection will still create hot shards
  • ❌ Higher cost: more shards means higher operational costs
  • ❌ Temporary fix: the hot shard problem will likely persist
upvoted 3 times
...
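The distribution argument above can be sketched with a short simulation. Kinesis really does route each record by the MD5 hash of its partition key, with each shard owning an equal slice of the 128-bit hash key space; the user IDs and shard count below are illustrative, not from the question:

```python
import hashlib
import uuid
from collections import Counter

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard the way Kinesis does: MD5-hash
    the key onto the 128-bit hash key space, which the shards split
    into equal contiguous ranges."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * num_shards // 2**128

NUM_SHARDS = 4

# Skewed keys: most records share one user ID, so one shard runs hot.
skewed = Counter(shard_for(k, NUM_SHARDS)
                 for k in ["user-42"] * 900 + ["user-7"] * 100)

# Random keys (e.g. a fresh uuid4 per record) spread records evenly.
random_keys = Counter(shard_for(str(uuid.uuid4()), NUM_SHARDS)
                      for _ in range(1000))

print("skewed:", dict(skewed))       # one or two shards take all the traffic
print("random:", dict(random_keys))  # roughly 250 records per shard
```

In practice the random-key approach corresponds to passing something like `PartitionKey=str(uuid.uuid4())` to `put_record`; the trade-off raised elsewhere in this thread is that records for the same user then no longer land on the same shard, so per-user ordering is lost.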
zits88
3 weeks, 6 days ago
Selected Answer: B
After reading about this EXTENSIVELY (and not just asking ChatGPT like some of these folks), the actual correct answer here is B. Option A would indeed resolve the hot shards, but all ordering and logic about which data goes to which shard would be lost, which is not good in this scenario with user behavior data from an application. As someone below said, merely "increasing the number of shards" would NOT rectify the problem. However, the key second sentence about "distributing" the data across the shards completes the task the question asks of us. It doesn't specify how we'd do that, but that doesn't matter. Now, none of this guarantees which answer AWS itself considers correct: in the real-life exams I have taken, I have seen at least one or two questions with NO correct answer, and many more that seem almost AI-generated. But in real-life data engineering, you don't want to randomize the partition key here.
upvoted 1 times
...
siheom
1 month ago
Selected Answer: B
VOTE B
upvoted 2 times
...
Faye15599
3 months, 3 weeks ago
Selected Answer: A
A is the best solution because the issue of hot shards is typically caused by an uneven distribution of records across shards due to poorly chosen partition keys. Using a random partition key ensures that records are distributed more evenly across all shards, reducing the likelihood of any single shard becoming "hot" and experiencing throttling. B is incorrect because while increasing the number of shards can help handle more data, it does not resolve the root cause of hot shards, which is uneven distribution due to poor partition key selection. Without addressing the partition key issue, adding shards may still result in some shards being overloaded.
upvoted 4 times
zits88
3 weeks, 6 days ago
This is incorrect. You do not want to randomize the partition key here, as it would lose all ordering logic in the shards. And "your" (ChatGPT's) explanation of why B is wrong is, well, wrong. The second sentence of the answer option, "Distribute the records across the shards," takes care of all the issues.
upvoted 1 times
...
...
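Whether adding shards alone helps depends on the keys: extending the same MD5-routing idea (the key names and counts below are illustrative), a single dominant partition key stays on a single shard no matter how many shards the stream has, which is the "doesn't fix the root cause" point made above:

```python
import hashlib
from collections import Counter

def shard_for(partition_key: str, num_shards: int) -> int:
    # Kinesis routes by the MD5 hash of the partition key; each shard
    # owns an equal slice of the 128-bit hash key space.
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * num_shards // 2**128

# 900 of 1000 records carry the same partition key ("user-42" is made up).
records = ["user-42"] * 900 + [f"user-{i}" for i in range(100)]

for num_shards in (4, 8, 16):
    counts = Counter(shard_for(k, num_shards) for k in records)
    print(f"{num_shards} shards -> hottest shard holds "
          f"{max(counts.values())} of 1000 records")
```

If resharding is still wanted alongside a better key strategy, Kinesis exposes this as the UpdateShardCount API (in boto3, `update_shard_count` with `ScalingType='UNIFORM_SCALING'`), which splits the hash key space evenly across the new target shard count.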
JekChong
4 months, 2 weeks ago
Selected Answer: B
Amazon Kinesis Data Streams uses shards to distribute data, and each shard has a fixed throughput limit. If certain shards receive significantly more data than others (hot shards), they will experience throttling. To resolve this issue:
  • Increase the number of shards: this increases the overall capacity of the stream.
  • Distribute records more evenly across shards: this can be done by modifying the partition key strategy so that data is spread more evenly.
upvoted 4 times
...
italiancloud2025
4 months, 2 weeks ago
Selected Answer: A
A: Yes, using a random partition key will distribute records evenly across the shards, reducing bottlenecks in "hot" shards. B: No, increasing the number of shards does not fix the imbalance if the key keeps concentrating the data.
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other