Exam AWS Certified Big Data - Specialty topic 2 question 9 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty
Question #: 9
Topic #: 2

An organization has added a clickstream to their website to analyze traffic. The website sends each page request to an Amazon Kinesis stream with the PutRecord API call, using the page name as the partition key. During peak spikes in website traffic, a support engineer notices many ProvisionedThroughputExceededException events in the application logs.
What should be done to resolve the issue in the MOST cost-effective way?

  • A. Create multiple Amazon Kinesis streams for page requests to increase the concurrency of the clickstream.
  • B. Increase the number of shards on the Kinesis stream to allow for more throughput to meet the peak spikes in traffic.
  • C. Modify the application to use the Kinesis Producer Library to aggregate requests before sending them to the Kinesis stream.
  • D. Attach more consumers to the Kinesis stream to process records in parallel, improving the performance on the stream.
Suggested Answer: B
Reference:
https://aws.amazon.com/kinesis/data-streams/faqs/
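
For illustration, here is a minimal boto3 sketch of the two producer patterns under discussion: the per-event PutRecord call described in the question versus the batched PutRecords call that the KPL's "collection" feature issues automatically. The stream name, event shape, and batch size are assumptions, not taken from the question.

```python
# Hedged sketch (not from the question): per-event PutRecord versus batched
# PutRecords. Stream name, event shape, and batch size are assumptions.
import json

import boto3

kinesis = boto3.client("kinesis")


def send_one_per_event(events):
    """Current pattern: one PutRecord API call per page view."""
    for event in events:
        kinesis.put_record(
            StreamName="clickstream",           # hypothetical stream name
            Data=json.dumps(event).encode(),
            PartitionKey=event["page"],         # page name as partition key
        )


def send_batched(events, batch_size=500):
    """Batched pattern: up to 500 records per PutRecords call (the API limit),
    which sharply reduces the number of requests hitting the stream."""
    for i in range(0, len(events), batch_size):
        batch = events[i:i + batch_size]
        kinesis.put_records(
            StreamName="clickstream",
            Records=[
                {"Data": json.dumps(e).encode(), "PartitionKey": e["page"]}
                for e in batch
            ],
        )
```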

Comments

yuriy_ber
Highly Voted 3 years, 7 months ago
I also stumbled upon this question - it looks very obvious that they are not aggregating, because they use the PutRecord API. Furthermore, they only have peak spikes, so if they increased the number of shards they would pay constantly higher costs for only occasional spikes. It's definitely C; additionally, it would also be possible to implement compression using the KPL.
upvoted 8 times
VB
Highly Voted 3 years, 7 months ago
But is B the most cost-effective way? The price we pay depends on the number of shards. Could it be C?
upvoted 6 times
DerekKey
Most Recent 3 years, 6 months ago
C - this is how you avoid ProvisionedThroughputExceededException.
upvoted 1 times
matthew95
3 years, 6 months ago
It should be C, because batching increases throughput and decreases cost.
upvoted 2 times
k115
3 years, 6 months ago
C is the right answer
upvoted 2 times
winset
3 years, 6 months ago
B or C?
upvoted 1 times
emailtorajivk
3 years, 6 months ago
"The request rate for the stream is too high, or the requested data is too large for the available throughput. Reduce the frequency or size of your requests." For more information, see https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecords.html - so using the producer library will decrease the request frequency.
upvoted 1 times
Bulti
3 years, 6 months ago
Answer is C. The KPL's batching features (both turned on by default) increase throughput and decrease cost (see the sketch after this comment):
  • Collection: writes records to multiple shards in the same PutRecords API call.
  • Aggregation: adds some latency, but stores multiple user records in one Kinesis record (to go over the 1,000 records per second per shard limit) and increases the payload size to improve throughput (maximizing the 1 MB/s per shard limit).
upvoted 4 times
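
A rough sketch of the aggregation idea described in the comment above, assuming a hypothetical clickstream of small JSON events. The real KPL packs user records into a protobuf-based aggregation format that the KCL de-aggregates on the consumer side; newline-delimited JSON is used here only to keep the illustration self-contained.

```python
# Rough illustration of "aggregation": pack many small click events into a
# single Kinesis record. The real KPL uses a protobuf-based format that the
# KCL de-aggregates; newline-delimited JSON is a simplified stand-in, and
# all names are hypothetical.
import json

import boto3

kinesis = boto3.client("kinesis")
MAX_RECORD_BYTES = 1_000_000  # stay under the 1 MB per-record limit


def put_aggregated(events, partition_key):
    """Buffer small events and emit them as one Kinesis record per ~1 MB."""
    buffer, size = [], 0
    for event in events:
        line = json.dumps(event)
        if buffer and size + len(line) + 1 > MAX_RECORD_BYTES:
            _flush(buffer, partition_key)
            buffer, size = [], 0
        buffer.append(line)
        size += len(line) + 1
    if buffer:
        _flush(buffer, partition_key)


def _flush(lines, partition_key):
    # Many user records count as a single record against the
    # 1,000 records/second per-shard limit.
    kinesis.put_record(
        StreamName="clickstream",
        Data="\n".join(lines).encode(),
        PartitionKey=partition_key,
    )
```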
susan8840
3 years, 6 months ago
Agreed, B. Input data increased, so increase the shards. Your data blob, partition key, and data stream name are required parameters of a PutRecord or PutRecords call. The size of your data blob (before Base64 encoding) and partition key will be counted against the data throughput of your Amazon Kinesis data stream, which is determined by the number of shards within the data stream.
upvoted 1 times
san2020
3 years, 6 months ago
my selection C
upvoted 6 times
aewis
3 years, 6 months ago
C ! https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html
upvoted 1 times
richardxyz
3 years, 6 months ago
C is correct; the KPL supports aggregation, storing multiple records within a single Kinesis Data Streams record, and with this feature we can go beyond 1,000 records per second per shard.
upvoted 3 times
kalpanareddy
3 years, 6 months ago
I will go with B https://aws.amazon.com/kinesis/data-streams/faqs/
upvoted 1 times
RamNelluru
3 years, 6 months ago
B may not work because the partition key is the page name. Even if you increase the number of shards, there may still be a hot partition because a single page sends more puts. Since this information is not provided, C may be the right answer.
upvoted 3 times
am7
3 years, 7 months ago
Due to frequent checkpointing it's giving the errors. When the data is aggregated, the checkpointing will be reduced, which in turn solves the problem in the most effective way.
upvoted 1 times
s3an
3 years, 7 months ago
C, I think. https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html "The KPL is an easy-to-use, highly configurable library that helps you write to a Kinesis data stream. It acts as an intermediary between your producer application code and the Kinesis Data Streams API actions. The KPL performs the following primary tasks: writes to one or more Kinesis data streams with an automatic and configurable retry mechanism; collects records and uses PutRecords to write multiple records to multiple shards per request; aggregates user records to increase payload size and improve throughput." Has anyone in this thread already passed this exam? That way we can all be on the same page about which answers are correct.
upvoted 2 times
Zire
3 years, 7 months ago
My choice is C. While B is a way to resolve the throughput issue, the aggregation proposed by C would be more cost-effective.
upvoted 3 times
cybe001
3 years, 7 months ago
You also get ProvisionedThroughputExceededException for the volume of data. So aggregation won't solve the issue. Answer is B.
upvoted 4 times
d00ku
3 years, 7 months ago
Aggregation solves volume issues, batching solves throughput issues. Seems like C.
upvoted 3 times
jlpl
3 years, 7 months ago
B? Thoughts?
upvoted 2 times
mattyb123
3 years, 7 months ago
Correct. https://aws.amazon.com/kinesis/data-streams/faqs/ Q: What happens if the capacity limits of an Amazon Kinesis data stream are exceeded while the data producer adds data to the data stream? The capacity limits of an Amazon Kinesis data stream are defined by the number of shards within the data stream. The limits can be exceeded by either data throughput or the number of PUT records. While the capacity limits are exceeded, the put data call will be rejected with a ProvisionedThroughputExceeded exception. If this is due to a temporary rise of the data stream’s input data rate, retry by the data producer will eventually lead to completion of the requests. If this is due to a sustained rise of the data stream’s input data rate, you should increase the number of shards within your data stream to provide enough capacity for the put data calls to consistently succeed. In both cases, Amazon CloudWatch metrics allow you to learn about the change of the data stream’s input data rate and the occurrence of ProvisionedThroughputExceeded exceptions.
upvoted 5 times
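
For comparison, a minimal boto3 sketch of what option B amounts to: a single UpdateShardCount call. The stream name and target shard count are hypothetical; note that added shards are billed per shard-hour even off-peak, which is why the thread favors C on cost for occasional spikes.

```python
# Hedged sketch of option B: reshard the stream. Stream name and target
# shard count are hypothetical. Added shards keep accruing per-shard-hour
# charges even when traffic is off-peak.
import boto3

kinesis = boto3.client("kinesis")

kinesis.update_shard_count(
    StreamName="clickstream",
    TargetShardCount=8,
    ScalingType="UNIFORM_SCALING",
)
```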
ME2000
3 years, 7 months ago
B is the answer. More on "ProvisionedThroughputExceededException": https://docs.aws.amazon.com/streams/latest/dev/troubleshooting-consumers.html https://docs.aws.amazon.com/streams/latest/dev/kinesis-low-latency.html https://any-api.com/amazonaws_com/kinesis/docs/Definitions/ProvisionedThroughputExceededException
upvoted 1 times
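
For the "temporary rise" case that the FAQ quote above mentions, the producer can simply back off and retry the put. A minimal sketch, assuming a boto3 producer and hypothetical names:

```python
# Hedged sketch of the retry behavior the FAQ describes for a temporary rise
# in input rate: back off and retry when the stream throttles.
# Stream name and event shape are hypothetical.
import json
import time

import boto3

kinesis = boto3.client("kinesis")


def put_with_backoff(event, retries=5):
    for attempt in range(retries):
        try:
            return kinesis.put_record(
                StreamName="clickstream",
                Data=json.dumps(event).encode(),
                PartitionKey=event["page"],
            )
        except kinesis.exceptions.ProvisionedThroughputExceededException:
            time.sleep(0.1 * (2 ** attempt))  # exponential backoff
    raise RuntimeError("record not accepted after retries")
```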
Corram
3 years, 6 months ago
C is correct. ProvisionedThroughputExceededException can be caused either by too large a data volume or by too many requests. Since the PutRecord API is used, each record gets sent on its own, making too many requests highly probable. Thus, C should help, and it is obviously more cost-effective than B.
upvoted 5 times
Community vote distribution: A (35%), C (25%), B (20%), Other