Exam AWS Certified Machine Learning - Specialty topic 1 question 129 discussion

A company ingests machine learning (ML) data from web advertising clicks into an Amazon S3 data lake. Click data is added to an Amazon Kinesis data stream by using the Kinesis Producer Library (KPL). The data is loaded into the S3 data lake from the data stream by using an Amazon Kinesis Data Firehose delivery stream. As the data volume increases, an ML specialist notices that the rate of data ingested into Amazon S3 is relatively constant. There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest.
Which next step is MOST likely to improve the data ingestion rate into Amazon S3?

  • A. Increase the number of S3 prefixes for the delivery stream to write to.
  • B. Decrease the retention period for the data stream.
  • C. Increase the number of shards for the data stream.
  • D. Add more consumers using the Kinesis Client Library (KCL).
Suggested Answer: C

Comments

SophieSu
Highly Voted 3 years, 1 month ago
C is the correct answer. The number of shards is determined by the number of transactions per second times the data blob size (e.g., 100 KB); one shard can ingest 1 MB/second.
upvoted 38 times
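A minimal sketch of the shard-sizing arithmetic SophieSu describes, assuming the published per-shard ingest limits of 1 MB/s and 1,000 records/s; the workload figures below are hypothetical:

```python
import math

# Hypothetical workload figures; substitute real measurements.
records_per_second = 50_000        # click events per second
avg_record_size_kb = 2             # average click record size

# Published per-shard ingest limits for Kinesis Data Streams.
SHARD_MB_PER_SEC = 1.0             # each shard ingests up to 1 MB/s
SHARD_RECORDS_PER_SEC = 1_000      # and up to 1,000 records/s

ingest_mb_per_sec = records_per_second * avg_record_size_kb / 1024
shards_for_bytes = math.ceil(ingest_mb_per_sec / SHARD_MB_PER_SEC)
shards_for_records = math.ceil(records_per_second / SHARD_RECORDS_PER_SEC)

# The stream needs enough shards to satisfy both limits at once.
required_shards = max(shards_for_bytes, shards_for_records)
print(f"{ingest_mb_per_sec:.1f} MB/s -> {required_shards} shards")
```

Note that the scenario uses the KPL, whose record aggregation packs many user records into one Kinesis record, so the byte limit is usually the binding one.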
dolorez
Highly Voted 2 years, 5 months ago
The answer should be A. The reason shards are not the right answer is the absence of ProvisionedThroughputExceeded exceptions, which occur when a KDS stream has too few shards. The scenario describes a consistent pace of delivery into S3 and a rising backlog of data in the stream (which indicates the KDS stream is still able to ingest data), hence the S3 write limit per prefix is at fault: https://www.amazonaws.cn/en/kinesis/data-streams/faqs/#:~:text=Q%3A%20What%20happens%20if%20the%20capacity%20limits%20of%20a%20Kinesis%20stream%20are%20exceeded%20while%20the%20data%20producer%20adds%20data%20to%20the%20stream%3F https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
upvoted 13 times
u_b
11 months, 3 weeks ago
From https://aws.amazon.com/kinesis/data-firehose/faqs/?nc1=h_ls : "Q: How often does Kinesis Data Firehose read data from my Kinesis stream? A: Kinesis Data Firehose calls Kinesis Data Streams GetRecords() once every second for each Kinesis shard." The number of records per GetRecords() call is at most 10,000, so with n shards Firehose receives at most 10,000n records per second; hence Firehose, rather than S3, could be the limiting factor. I'd also go with increasing shards as the first choice (to avoid having to change the S3 consumers).
upvoted 1 times
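To put rough numbers on the two candidate bottlenecks discussed in this thread (a sketch; the shard count is hypothetical, the limits are the published Firehose and S3 figures):

```python
# Back-of-the-envelope comparison of the two candidate bottlenecks.
shards = 10                              # hypothetical stream size

# Per the Firehose FAQ quoted above: one GetRecords() call per shard
# per second, at most 10,000 records per call.
firehose_read_ceiling = shards * 10_000  # records/s readable by Firehose

# S3 supports at least 3,500 PUT requests per second per prefix, and
# Firehose buffers many records into each object it PUTs.
s3_put_limit_per_prefix = 3_500

print(f"Firehose read ceiling: {firehose_read_ceiling:,} records/s")
print(f"S3 PUT limit per prefix: {s3_put_limit_per_prefix:,} requests/s")
```

Because Firehose buffers records (for example 5 MB or 60 s) before each PUT, whether the per-prefix limit actually binds depends on the buffer settings and prefix layout, which is why the thread is split between A and C.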
rav009
Most Recent 5 months, 2 weeks ago
Selected Answer: A
Shards are a concept in Kinesis Data Streams. But the question says "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest," so even Firehose has a large backlog, which means the limit comes from S3. So A.
upvoted 1 times
pupsik
8 months, 2 weeks ago
Selected Answer: A
The bottleneck is not at data ingestion (i.e., the Kinesis shards) but in the write to S3, whose throughput is bounded by the prefixes used.
upvoted 2 times
wendaz
1 year ago
A does not solve the issue; the bottleneck is not in S3 but in the KDS, so we should solve the problem at the KDS, i.e., the shards.
upvoted 1 times
loict
1 year, 1 month ago
I think the question is very ambiguous. "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest" suggests the backlog is on the client side (even before reaching KDS). Any component down the chain can be a bottleneck (KDS shards, Firehose, S3). There is just no way to know, in my opinion, but increasing shards is certainly the easiest thing to try without impacting the storage structure in S3 and possibly breaking the app.
upvoted 4 times
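As loict notes, the stem never pinpoints the bottleneck. One way to find it empirically is to compare the relevant CloudWatch metrics; a minimal sketch, assuming hypothetical stream and delivery-stream names:

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

def averages(namespace, metric, dimensions):
    """Return the last hour of 5-minute averages for one metric."""
    resp = cw.get_metric_statistics(
        Namespace=namespace, MetricName=metric, Dimensions=dimensions,
        StartTime=start, EndTime=end, Period=300, Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Average"] for p in points]

stream = [{"Name": "StreamName", "Value": "click-stream"}]               # hypothetical
delivery = [{"Name": "DeliveryStreamName", "Value": "click-firehose"}]   # hypothetical

# Rising iterator age: the consumer (Firehose) is falling behind the stream.
print(averages("AWS/Kinesis", "GetRecords.IteratorAgeMilliseconds", stream))
# Nonzero write throttles: producers are hitting the shard ingest limit (C).
print(averages("AWS/Kinesis", "WriteProvisionedThroughputExceeded", stream))
# Growing data freshness: Firehose is delivering to S3 too slowly (A).
print(averages("AWS/Firehose", "DeliveryToS3.DataFreshness", delivery))
```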
teka112233
1 year, 2 months ago
Selected Answer: C
This is my keyword for solving this problem: "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest." So increasing the shards to ingest is the solution.
upvoted 1 times
Mickey321
1 year, 2 months ago
Selected Answer: C
Number of shards.
upvoted 1 times
daidaidai
1 year, 5 months ago
Selected Answer: C
A is not correct because "There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest": the backlog is not caused by S3 performance but by the shard count.
upvoted 2 times
Mllb
1 year, 7 months ago
Selected Answer: C
To increase ingestion throughput.
upvoted 3 times
AjoseO
1 year, 7 months ago
Selected Answer: C
The increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose indicates that the ingestion rate is slower than the data production rate. Therefore, the next step to improve the data ingestion rate into Amazon S3 is to increase the capacity of Kinesis Data Streams by increasing the number of shards. This will increase the parallelism of data processing, allowing for a higher throughput rate. Option C is the correct answer. Option A is incorrect because increasing the number of S3 prefixes for the delivery stream will not directly affect the ingestion rate into S3.
upvoted 4 times
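If the shard count is the limit, as this comment argues, resharding is a single API call. A minimal boto3 sketch, assuming a hypothetical stream name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Current capacity of the stream (stream name is hypothetical).
summary = kinesis.describe_stream_summary(StreamName="click-stream")
current = summary["StreamDescriptionSummary"]["OpenShardCount"]

# Double the shard count; UNIFORM_SCALING keeps the key space evenly split.
kinesis.update_shard_count(
    StreamName="click-stream",
    TargetShardCount=current * 2,
    ScalingType="UNIFORM_SCALING",
)
```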
Aninina
1 year, 10 months ago
Selected Answer: C
To improve the data ingestion rate into Amazon S3, the ML specialist should consider increasing the number of shards for the Kinesis data stream. A Kinesis data stream is made up of one or more shards, and each shard provides a fixed amount of capacity for ingesting and storing data. By increasing the number of shards, the specialist can increase the overall capacity of the data stream and improve the rate at which data is ingested.
upvoted 3 times
GauravLahotiML
1 year, 11 months ago
Selected Answer: C
C is the correct answer
upvoted 3 times
aScientist
1 year, 12 months ago
Selected Answer: C
Clearly S3 is a bottleneck. S3 has parallel performance across prefixes, thus increasing throughput.
upvoted 3 times
niopio
2 years, 1 month ago
Selected Answer: A
It seems S3 is the bottleneck. Adding more prefixes will help: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
upvoted 6 times
Shailendraa
2 years, 1 month ago
12-sep exam
upvoted 1 times
V_B_
2 years, 2 months ago
The question seems to indicate the problem is in the ability of S3 to load the data. Therefore, I think the answer is A. https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html
upvoted 2 times
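For completeness, if the per-prefix S3 write limit really were the constraint, the delivery stream's prefix can be spread across time-based keys with UpdateDestination. A minimal sketch, assuming hypothetical names and an illustrative prefix pattern:

```python
import boto3

firehose = boto3.client("firehose")

# UpdateDestination needs the current version and destination IDs.
desc = firehose.describe_delivery_stream(DeliveryStreamName="click-firehose")
stream_desc = desc["DeliveryStreamDescription"]

firehose.update_destination(
    DeliveryStreamName="click-firehose",          # hypothetical name
    CurrentDeliveryStreamVersionId=stream_desc["VersionId"],
    DestinationId=stream_desc["Destinations"][0]["DestinationId"],
    ExtendedS3DestinationUpdate={
        # Hour-granular keys spread writes across more S3 prefixes.
        "Prefix": "clicks/!{timestamp:yyyy/MM/dd/HH}/",
        # An error prefix is expected alongside expression-based prefixes.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
    },
)
```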
Community vote distribution: A (35%), C (25%), B (20%), Other