Exam AWS Certified Data Analytics - Specialty topic 1 question 85 discussion

An online retailer is rebuilding its inventory management system and inventory reordering system to automatically reorder products by using Amazon Kinesis Data
Streams. The inventory management system uses the Kinesis Producer Library (KPL) to publish data to a stream. The inventory reordering system uses the
Kinesis Client Library (KCL) to consume data from the stream. The stream has been configured to scale as needed. Just before production deployment, the retailer discovers that the inventory reordering system is receiving duplicated data.
Which factors could be causing the duplicated data? (Choose two.)

  • A. The producer has a network-related timeout.
  • B. The stream's value for the IteratorAgeMilliseconds metric is too high.
  • C. There was a change in the number of shards, record processors, or both.
  • D. The AggregationEnabled configuration property was set to true.
  • E. The max_records configuration property was set to a number that is too high.
Suggested Answer: AC

Comments

VikG12
Highly Voted 3 years, 7 months ago
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html A,C
upvoted 32 times
...
dushmantha
Highly Voted 2 years, 11 months ago
Duplication can happen in two ways: on the producer side or on the consumer side. On the producer side it happens because of network delays/timeouts; specifically, the producer waits for a successful acknowledgement that is lost due to a network failure, so it resends the data until an acknowledgement arrives. On the consumer side it happens because of record processor restarts, which can occur for four reasons: a worker terminates unexpectedly, worker instances are added or removed, shards are merged or split, or the application is deployed. In either situation, the best way to prevent duplicates is to include a unique identifier in the data (see the sketch below). Answer should be A, C
upvoted 15 times
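To illustrate the unique-identifier approach dushmantha describes, here is a minimal consumer-side deduplication sketch in Python. The record_id field, the reorder_inventory helper, and the in-memory set are illustrative assumptions, not part of the KCL API; a production consumer would keep seen identifiers in a durable store (for example DynamoDB) that survives record processor restarts.

```python
import json

seen_ids = set()  # illustrative; use a durable store in production

def process_record(raw_bytes: bytes) -> None:
    record = json.loads(raw_bytes)
    record_id = record["record_id"]  # unique key embedded by the producer
    if record_id in seen_ids:
        return                       # duplicate delivery: skip it
    seen_ids.add(record_id)
    reorder_inventory(record)        # hypothetical business logic

def reorder_inventory(record: dict) -> None:
    print(f"Reordering product {record.get('sku')}")
```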
...
Saintu
Most Recent 1 year, 9 months ago
Correct answer is BC. A network-related timeout for the producer could potentially lead to failed record ingestion, but it is less likely to cause duplicated data.
upvoted 1 times
...
pk349
2 years, 1 month ago
AC: I passed the test
upvoted 1 times
...
Ody__
2 years, 5 months ago
Selected Answer: AC
A&C are the correct answers. https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html
upvoted 2 times
...
cloudlearnerhere
2 years, 7 months ago
Correct answers are A & C, as there is a chance of duplicates when a producer retries because of network connectivity issues, or when a consumer retries after a shard is merged or split. https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html There are two primary reasons why records may be delivered more than one time to your Amazon Kinesis Data Streams application: producer retries and consumer retries. Your application must anticipate and appropriately handle processing individual records multiple times.
upvoted 2 times
cloudlearnerhere
2 years, 7 months ago
Consider a producer that experiences a network-related timeout after it makes a call to PutRecord, but before it can receive an acknowledgement from Amazon Kinesis Data Streams. The producer cannot be sure if the record was delivered to Kinesis Data Streams. Assuming that every record is important to the application, the producer would have been written to retry the call with the same data. If both PutRecord calls on that same data were successfully committed to Kinesis Data Streams, then there will be two Kinesis Data Streams records. Although the two records have identical data, they also have unique sequence numbers. Applications that need strict guarantees should embed a primary key within the record to remove duplicates later when processing. Note that the number of duplicates due to producer retries is usually low compared to the number of duplicates due to consumer retries. Consumer (data processing application) retries happen when record processors restart.
upvoted 2 times
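The PutRecord scenario above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical stream named inventory-events; the retry reuses the same client-generated record_id, so duplicates created by a timeout-then-retry (answer A) can be removed downstream.

```python
import json
import uuid

import boto3
from botocore.exceptions import ConnectionError as BotoConnectionError

kinesis = boto3.client("kinesis")

def put_with_retry(item: dict, max_attempts: int = 3):
    # One record_id per logical record, reused across every retry attempt.
    record = {"record_id": str(uuid.uuid4()), **item}
    data = json.dumps(record).encode("utf-8")
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName="inventory-events",   # hypothetical stream name
                Data=data,
                PartitionKey=record["record_id"],
            )
        except BotoConnectionError:
            # The call may have succeeded server-side before the timeout,
            # so this retry can commit a second, identical record.
            if attempt == max_attempts - 1:
                raise
```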
cloudlearnerhere
2 years, 7 months ago
Record processors for the same shard restart in the following cases: a worker terminates unexpectedly; worker instances are added or removed; shards are merged or split; the application is deployed.
upvoted 2 times
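As a sketch of why these restarts re-deliver data: a consumer resumes from its last checkpoint, so everything processed after that checkpoint is handled a second time. The names below are illustrative stand-ins, not the real KCL API (the official Python library is amazon-kclpy).

```python
class Checkpointer:
    """Illustrative stand-in for the KCL checkpointer."""
    def __init__(self):
        self.position = -1

    def checkpoint(self, position: int) -> None:
        self.position = position  # durable in the real KCL (DynamoDB table)

def process_batch(records, checkpointer, handle):
    for i, record in enumerate(records):
        handle(record)                  # should be idempotent (see above)
        if i % 100 == 99:
            checkpointer.checkpoint(i)  # progress saved through record i
    # If the worker dies before the next checkpoint, every record handled
    # since the last one is delivered again once the shard is reassigned.
```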
...
...
...
Raje14k
2 years, 10 months ago
Answer is A, as archive objects that are queried by S3 Glacier Select must be in uncompressed comma-separated values (CSV).
upvoted 1 times
...
tweeeeeeety
2 years, 11 months ago
Selected Answer: AC
A & C are the correct answers
upvoted 1 times
...
zinic
2 years, 11 months ago
Selected Answer: AC
AC should be the correct one
upvoted 1 times
...
YahiaAglan74
3 years ago
Selected Answer: AC
AC is the correct answer
upvoted 1 times
...
simo40010
3 years, 3 months ago
Selected Answer: AC
It's made clear in the statement that the problem showed up when the solution was deployed to production (deploying the application restarts record processors), and connection timeouts are another option that can cause that issue.
upvoted 2 times
...
t_singh
3 years, 3 months ago
Selected Answer: AC
A and C are correct.
upvoted 1 times
...
RSSRAO
3 years, 3 months ago
A and C are correct
upvoted 1 times
...
aws2019
3 years, 6 months ago
A and C
upvoted 1 times
...
Monika14Sharma
3 years, 7 months ago
For sure A and C
upvoted 3 times
...
bermo
3 years, 7 months ago
I am OK with A and C
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other