Exam AWS Certified Machine Learning - Specialty topic 1 question 129 discussion

A company ingests machine learning (ML) data from web advertising clicks into an Amazon S3 data lake. Click data is added to an Amazon Kinesis data stream by using the Kinesis Producer Library (KPL). The data is loaded into the S3 data lake from the data stream by using an Amazon Kinesis Data Firehose delivery stream. As the data volume increases, an ML specialist notices that the rate of data ingested into Amazon S3 is relatively constant. There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest.
Which next step is MOST likely to improve the data ingestion rate into Amazon S3?

  • A. Increase the number of S3 prefixes for the delivery stream to write to.
  • B. Decrease the retention period for the data stream.
  • C. Increase the number of shards for the data stream.
  • D. Add more consumers using the Kinesis Client Library (KCL).
Suggested Answer: C

Comments

SophieSu
Highly Voted 3 years, 1 month ago
C is the correct answer. The number of shards is determined by the number of transactions per second times the data blob size (e.g., 100 KB); one shard can ingest 1 MB/second.
upvoted 38 times
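A minimal sketch of the shard-sizing arithmetic SophieSu describes, assuming the published per-shard ingest limits of 1 MB/s and 1,000 records/s; the workload figures below are hypothetical:

```python
import math

# Hypothetical workload figures; substitute real measurements.
records_per_second = 50_000        # click events per second
avg_record_size_kb = 2             # average click record size

# Published per-shard ingest limits for Kinesis Data Streams.
SHARD_MB_PER_SEC = 1.0             # each shard ingests up to 1 MB/s
SHARD_RECORDS_PER_SEC = 1_000      # and up to 1,000 records/s

ingest_mb_per_sec = records_per_second * avg_record_size_kb / 1024
shards_for_bytes = math.ceil(ingest_mb_per_sec / SHARD_MB_PER_SEC)
shards_for_records = math.ceil(records_per_second / SHARD_RECORDS_PER_SEC)

# The stream needs enough shards to satisfy both limits at once.
required_shards = max(shards_for_bytes, shards_for_records)
print(f"{ingest_mb_per_sec:.1f} MB/s -> {required_shards} shards")
```

Note that the scenario uses the KPL, whose record aggregation packs many user records into one Kinesis record, so the byte limit is usually the binding one.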
dolorez
Highly Voted 2 years, 5 months ago
The answer should be A. The reason shards are not the right answer is the absence of ProvisionedThroughputExceeded exceptions, which occur when a KDS stream has too few shards. The scenario describes a consistent pace of delivery into S3 and a rising backlog of data in the stream (which indicates the KDS stream is still able to ingest data), hence the S3 write limit per prefix is at fault: https://www.amazonaws.cn/en/kinesis/data-streams/faqs/#:~:text=Q%3A%20What%20happens%20if%20the%20capacity%20limits%20of%20a%20Kinesis%20stream%20are%20exceeded%20while%20the%20data%20producer%20adds%20data%20to%20the%20stream%3F https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
upvoted 13 times
u_b
11 months, 3 weeks ago
From https://aws.amazon.com/kinesis/data-firehose/faqs/?nc1=h_ls : "Q: How often does Kinesis Data Firehose read data from my Kinesis stream? A: Kinesis Data Firehose calls Kinesis Data Streams GetRecords() once every second for each Kinesis shard." The number of records per GetRecords() call is at most 10,000, so with n shards Firehose receives at most 10,000n records per second; hence Firehose, rather than S3, could be the limiting factor. I'd also go with increasing shards as the first choice (to avoid having to change the S3 consumers).
upvoted 1 times
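To put rough numbers on the two candidate bottlenecks discussed in this thread (a sketch; the shard count is hypothetical, the limits are the published Firehose and S3 figures):

```python
# Back-of-the-envelope comparison of the two candidate bottlenecks.
shards = 10                              # hypothetical stream size

# Per the Firehose FAQ quoted above: one GetRecords() call per shard
# per second, at most 10,000 records per call.
firehose_read_ceiling = shards * 10_000  # records/s readable by Firehose

# S3 supports at least 3,500 PUT requests per second per prefix, and
# Firehose buffers many records into each object it PUTs.
s3_put_limit_per_prefix = 3_500

print(f"Firehose read ceiling: {firehose_read_ceiling:,} records/s")
print(f"S3 PUT limit per prefix: {s3_put_limit_per_prefix:,} requests/s")
```

Because Firehose buffers records (for example 5 MB or 60 s) before each PUT, whether the per-prefix limit actually binds depends on the buffer settings and prefix layout, which is why the thread is split between A and C.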
rav009
Most Recent 5 months, 2 weeks ago
Selected Answer: A
Shards are a concept in Kinesis Data Streams. But the question says "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest," so even Firehose has a large backlog, which means the limit comes from S3. So A.
upvoted 1 times
pupsik
8 months, 2 weeks ago
Selected Answer: A
The bottleneck is not at data ingestion (i.e., the Kinesis shards) but in the write to S3, whose throughput is bounded by the prefixes used.
upvoted 2 times
wendaz
1 year ago
A does not solve the issue; the bottleneck is not in S3 but in the KDS, so we should solve the problem at the KDS, i.e., the shards.
upvoted 1 times
loict
1 year, 1 month ago
I think the question is very ambiguous. "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest" suggests the backlog is on the client side (even before reaching KDS). Any component down the chain can be a bottleneck (KDS shards, Firehose, S3). There is just no way to know, in my opinion, but increasing shards is certainly the easiest thing to try without impacting the storage structure in S3 and possibly breaking the app.
upvoted 4 times
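As loict notes, the stem never pinpoints the bottleneck. One way to find it empirically is to compare the relevant CloudWatch metrics; a minimal sketch, assuming hypothetical stream and delivery-stream names:

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

def averages(namespace, metric, dimensions):
    """Return the last hour of 5-minute averages for one metric."""
    resp = cw.get_metric_statistics(
        Namespace=namespace, MetricName=metric, Dimensions=dimensions,
        StartTime=start, EndTime=end, Period=300, Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Average"] for p in points]

stream = [{"Name": "StreamName", "Value": "click-stream"}]               # hypothetical
delivery = [{"Name": "DeliveryStreamName", "Value": "click-firehose"}]   # hypothetical

# Rising iterator age: the consumer (Firehose) is falling behind the stream.
print(averages("AWS/Kinesis", "GetRecords.IteratorAgeMilliseconds", stream))
# Nonzero write throttles: producers are hitting the shard ingest limit (C).
print(averages("AWS/Kinesis", "WriteProvisionedThroughputExceeded", stream))
# Growing data freshness: Firehose is delivering to S3 too slowly (A).
print(averages("AWS/Firehose", "DeliveryToS3.DataFreshness", delivery))
```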
teka112233
1 year, 2 months ago
Selected Answer: C
This is my keyword for solving this problem: "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest." So increasing the shards to ingest is the solution.
upvoted 1 times
Mickey321
1 year, 2 months ago
Selected Answer: C
Number of shards.
upvoted 1 times
daidaidai
1 year, 5 months ago
Selected Answer: C
A is not correct because "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest": the backlog is not caused by S3 performance but by the shard count.
upvoted 2 times
Mllb
1 year, 7 months ago
Selected Answer: C
To increase ingestion throughput.
upvoted 3 times
AjoseO
1 year, 7 months ago
Selected Answer: C
The increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose indicates that the ingestion rate is slower than the data production rate. Therefore, the next step to improve the data ingestion rate into Amazon S3 is to increase the capacity of Kinesis Data Streams by increasing the number of shards. This will increase the parallelism of data processing, allowing for a higher throughput rate. Option C is the correct answer. Option A is incorrect because increasing the number of S3 prefixes for the delivery stream will not directly affect the ingestion rate into S3.
upvoted 4 times
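If the shard count is the limit, as this comment argues, resharding is a single API call. A minimal boto3 sketch, assuming a hypothetical stream name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Current capacity of the stream (stream name is hypothetical).
summary = kinesis.describe_stream_summary(StreamName="click-stream")
current = summary["StreamDescriptionSummary"]["OpenShardCount"]

# Double the shard count; UNIFORM_SCALING keeps the key space evenly split.
kinesis.update_shard_count(
    StreamName="click-stream",
    TargetShardCount=current * 2,
    ScalingType="UNIFORM_SCALING",
)
```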
Aninina
1 year, 10 months ago
Selected Answer: C
To improve the data ingestion rate into Amazon S3, the ML specialist should consider increasing the number of shards for the Kinesis data stream. A Kinesis data stream is made up of one or more shards, and each shard provides a fixed amount of capacity for ingesting and storing data. By increasing the number of shards, the specialist can increase the overall capacity of the data stream and improve the rate at which data is ingested.
upvoted 3 times
GauravLahotiML
1 year, 11 months ago
Selected Answer: C
C is the correct answer
upvoted 3 times
aScientist
1 year, 12 months ago
Selected Answer: C
Clearly S3 is a bottleneck. S3 has parallel performance across prefixes, thus increasing throughput.
upvoted 3 times
niopio
2 years, 1 month ago
Selected Answer: A
It seems S3 is the bottleneck. Adding more prefixes will help: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
upvoted 6 times
Shailendraa
2 years, 1 month ago
12-sep exam
upvoted 1 times
V_B_
2 years, 2 months ago
The question seems to indicate the problem is in the ability of S3 to load the data. Therefore, I think the answer is A. https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html
upvoted 2 times
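For completeness, if the per-prefix S3 write limit really were the constraint, the delivery stream's prefix can be spread across time-based keys with UpdateDestination. A minimal sketch, assuming hypothetical names and an illustrative prefix pattern:

```python
import boto3

firehose = boto3.client("firehose")

# UpdateDestination needs the current version and destination IDs.
desc = firehose.describe_delivery_stream(DeliveryStreamName="click-firehose")
stream_desc = desc["DeliveryStreamDescription"]

firehose.update_destination(
    DeliveryStreamName="click-firehose",          # hypothetical name
    CurrentDeliveryStreamVersionId=stream_desc["VersionId"],
    DestinationId=stream_desc["Destinations"][0]["DestinationId"],
    ExtendedS3DestinationUpdate={
        # Hour-granular keys spread writes across more S3 prefixes.
        "Prefix": "clicks/!{timestamp:yyyy/MM/dd/HH}/",
        # An error prefix is expected alongside expression-based prefixes.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
    },
)
```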
Community vote distribution: A (35%), C (25%), B (20%), Other