Exam AWS Certified Data Analytics - Specialty topic 1 question 85 discussion

An online retailer is rebuilding its inventory management system and inventory reordering system to automatically reorder products by using Amazon Kinesis Data
Streams. The inventory management system uses the Kinesis Producer Library (KPL) to publish data to a stream. The inventory reordering system uses the
Kinesis Client Library (KCL) to consume data from the stream. The stream has been configured to scale as needed. Just before production deployment, the retailer discovers that the inventory reordering system is receiving duplicated data.
Which factors could be causing the duplicated data? (Choose two.)

  • A. The producer has a network-related timeout.
  • B. The stream's value for the IteratorAgeMilliseconds metric is too high.
  • C. There was a change in the number of shards, record processors, or both.
  • D. The AggregationEnabled configuration property was set to true.
  • E. The max_records configuration property was set to a number that is too high.
Suggested Answer: AC

Comments

VikG12
Highly Voted 3 years, 7 months ago
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html A,C
upvoted 32 times
...
dushmantha
Highly Voted 2 years, 11 months ago
Duplication can happen in two ways: on the producer side or on the consumer side. On the producer side it happens because of network delays/timeouts; specifically, the producer waits for a successful acknowledgement that is lost due to a network failure, so it resends the data until an acknowledgement arrives. On the consumer side it happens because of record processor restarts, which can occur for four reasons: a worker terminates unexpectedly, worker instances are added or removed, shards are merged or split, or the application is deployed. In either situation, the best way to prevent duplicates is to include a unique identifier in the data (see the sketch below). Answer should be A, C
upvoted 15 times
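To illustrate the unique-identifier approach dushmantha describes, here is a minimal consumer-side deduplication sketch in Python. The record_id field, the reorder_inventory helper, and the in-memory set are illustrative assumptions, not part of the KCL API; a production consumer would keep seen identifiers in a durable store (for example DynamoDB) that survives record processor restarts.

```python
import json

seen_ids = set()  # illustrative; use a durable store in production

def process_record(raw_bytes: bytes) -> None:
    record = json.loads(raw_bytes)
    record_id = record["record_id"]  # unique key embedded by the producer
    if record_id in seen_ids:
        return                       # duplicate delivery: skip it
    seen_ids.add(record_id)
    reorder_inventory(record)        # hypothetical business logic

def reorder_inventory(record: dict) -> None:
    print(f"Reordering product {record.get('sku')}")
```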
...
Saintu
Most Recent 1 year, 9 months ago
Correct answer is BC. A network-related timeout for the producer could potentially lead to failed record ingestion, but it is less likely to cause duplicated data.
upvoted 1 times
...
pk349
2 years, 1 month ago
AC: I passed the test
upvoted 1 times
...
Ody__
2 years, 5 months ago
Selected Answer: AC
A&C are the correct answers. https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html
upvoted 2 times
...
cloudlearnerhere
2 years, 7 months ago
Correct answers are A & C, as there is a chance of duplicates when a producer retries because of network connectivity issues, or when a consumer retries after a shard is merged or split. https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html There are two primary reasons why records may be delivered more than one time to your Amazon Kinesis Data Streams application: producer retries and consumer retries. Your application must anticipate and appropriately handle processing individual records multiple times.
upvoted 2 times
cloudlearnerhere
2 years, 7 months ago
Consider a producer that experiences a network-related timeout after it makes a call to PutRecord, but before it can receive an acknowledgement from Amazon Kinesis Data Streams. The producer cannot be sure if the record was delivered to Kinesis Data Streams. Assuming that every record is important to the application, the producer would have been written to retry the call with the same data. If both PutRecord calls on that same data were successfully committed to Kinesis Data Streams, then there will be two Kinesis Data Streams records. Although the two records have identical data, they also have unique sequence numbers. Applications that need strict guarantees should embed a primary key within the record to remove duplicates later when processing. Note that the number of duplicates due to producer retries is usually low compared to the number of duplicates due to consumer retries. Consumer (data processing application) retries happen when record processors restart.
upvoted 2 times
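The PutRecord scenario above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical stream named inventory-events; the retry reuses the same client-generated record_id, so duplicates created by a timeout-then-retry (answer A) can be removed downstream.

```python
import json
import uuid

import boto3
from botocore.exceptions import ConnectionError as BotoConnectionError

kinesis = boto3.client("kinesis")

def put_with_retry(item: dict, max_attempts: int = 3):
    # One record_id per logical record, reused across every retry attempt.
    record = {"record_id": str(uuid.uuid4()), **item}
    data = json.dumps(record).encode("utf-8")
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName="inventory-events",   # hypothetical stream name
                Data=data,
                PartitionKey=record["record_id"],
            )
        except BotoConnectionError:
            # The call may have succeeded server-side before the timeout,
            # so this retry can commit a second, identical record.
            if attempt == max_attempts - 1:
                raise
```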
cloudlearnerhere
2 years, 7 months ago
Record processors for the same shard restart in the following cases: a worker terminates unexpectedly; worker instances are added or removed; shards are merged or split; the application is deployed.
upvoted 2 times
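As a sketch of why these restarts re-deliver data: a consumer resumes from its last checkpoint, so everything processed after that checkpoint is handled a second time. The names below are illustrative stand-ins, not the real KCL API (the official Python library is amazon-kclpy).

```python
class Checkpointer:
    """Illustrative stand-in for the KCL checkpointer."""
    def __init__(self):
        self.position = -1

    def checkpoint(self, position: int) -> None:
        self.position = position  # durable in the real KCL (DynamoDB table)

def process_batch(records, checkpointer, handle):
    for i, record in enumerate(records):
        handle(record)                  # should be idempotent (see above)
        if i % 100 == 99:
            checkpointer.checkpoint(i)  # progress saved through record i
    # If the worker dies before the next checkpoint, every record handled
    # since the last one is delivered again once the shard is reassigned.
```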
...
...
...
Raje14k
2 years, 10 months ago
Answer is A, as archive objects that are queried by S3 Glacier Select must be in uncompressed comma-separated values (CSV).
upvoted 1 times
...
tweeeeeeety
2 years, 11 months ago
Selected Answer: AC
A & C are the correct answers
upvoted 1 times
...
zinic
2 years, 11 months ago
Selected Answer: AC
AC should be the correct one
upvoted 1 times
...
YahiaAglan74
3 years ago
Selected Answer: AC
AC is the correct answer
upvoted 1 times
...
simo40010
3 years, 3 months ago
Selected Answer: AC
It's made clear in the statement that the problem showed up when the solution was deployed to production (deploying the application restarts record processors), and connection timeouts are another option that can cause that issue.
upvoted 2 times
...
t_singh
3 years, 3 months ago
Selected Answer: AC
A and C are correct.
upvoted 1 times
...
RSSRAO
3 years, 3 months ago
A and C are correct
upvoted 1 times
...
aws2019
3 years, 6 months ago
A and C
upvoted 1 times
...
Monika14Sharma
3 years, 7 months ago
For sure A and C
upvoted 3 times
...
bermo
3 years, 7 months ago
I am OK with A and C
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other