Exam AWS Certified Data Analytics - Specialty topic 1 question 61 discussion

A media analytics company consumes a stream of social media posts. The posts are sent to an Amazon Kinesis data stream partitioned on user_id. An AWS
Lambda function retrieves the records and validates the content before loading the posts into an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. The validation process needs to receive the posts for a given user in the order they were received by the Kinesis data stream.
During peak hours, the social media posts take more than an hour to appear in the Amazon OpenSearch Service (Amazon ES) cluster. A data analytics specialist must implement a solution that reduces this latency with the least possible operational overhead.
Which solution meets these requirements?

  • A. Migrate the validation process from Lambda to AWS Glue.
  • B. Migrate the Lambda consumers from standard data stream iterators to an HTTP/2 stream consumer.
  • C. Increase the number of shards in the Kinesis data stream.
  • D. Send the posts stream to Amazon Managed Streaming for Apache Kafka instead of the Kinesis data stream.
Suggested Answer: C

Comments

Alekx42
Highly Voted 2 years, 11 months ago
Selected Answer: C
Increasing the number of shards seems like a good idea, since Lambda processes one batch of data from each Kinesis shard per invocation. This means that if you have 100 shards you can have 100 concurrent Lambda invocations. If you increase the number of shards you increase the parallelism and can process the data more quickly. This assumes the Lambda ParallelizationFactor is set to 1. Switching to AWS Glue could speed up the data processing (since Glue can use Spark, which can be far faster than a Lambda function when processing a lot of data), but it would increase the operational overhead.
upvoted 12 times
...
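[Editor's note] As an illustration of the shard increase described in the comment above, here is a minimal boto3 sketch. The stream name and target shard count are assumptions, not values from the question. With the default ParallelizationFactor of 1, one concurrent Lambda invocation runs per shard, so doubling the shards roughly doubles the processing parallelism.

    import boto3

    kinesis = boto3.client("kinesis")

    # Hypothetical stream name and target count; UNIFORM_SCALING splits the
    # existing shards evenly. More shards means more concurrent Lambda
    # invocations (one batch per shard at the default ParallelizationFactor).
    kinesis.update_shard_count(
        StreamName="social-media-posts",   # assumed stream name
        TargetShardCount=8,                # e.g. scale from 4 to 8 shards
        ScalingType="UNIFORM_SCALING",
    )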
god_father
Most Recent 1 year, 4 months ago
For those wondering why not 'A' and why 'C' instead: with Glue, worker types such as G.1X, G.2X, etc. must be selected, which adds overhead. Hence the option with the least overhead is 'C', which relies on parallelism.
upvoted 1 times
...
GCPereira
1 year, 5 months ago
A is wrong because Glue has a concurrency limit and Spark is a poor option for small records. B: if Lambda is the only consumer, there is no need to upgrade it to an enhanced fan-out consumer. C: this is a typical bottleneck problem, and adding more shards solves it. D is wrong because switching services involves a LOT of operational overhead.
upvoted 2 times
...
juanife
1 year, 10 months ago
I want to explain why B isn't correct and C is. Option B is unnecessary because the question never says there is another consumer, so the Lambda function already has the full shard throughput to itself (there is no need for an enhanced fan-out consumer). Increasing the shard count will work, because Lambda runs one concurrent invocation per Kinesis shard. On top of that, per shard you can increase Lambda concurrency with the ParallelizationFactor, set to a value between 1 (the default) and 10.
upvoted 2 times
...
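[Editor's note] To illustrate the ParallelizationFactor mentioned above, a sketch of raising it on an existing Kinesis event source mapping; the mapping UUID is a placeholder. Lambda still processes records that share a partition key (here, user_id) in order, so per-user ordering is preserved even with multiple concurrent batches per shard.

    import boto3

    lambda_client = boto3.client("lambda")

    # Placeholder mapping UUID; ParallelizationFactor ranges from 1 (default)
    # to 10 concurrent batches per shard. Records with the same partition key
    # are still processed in order.
    lambda_client.update_event_source_mapping(
        UUID="00000000-0000-0000-0000-000000000000",
        ParallelizationFactor=10,
    )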
MLCL
1 year, 10 months ago
Could be C or B depending on multiple factors. To increase the performance of KDS you have three options: more shards; a higher parallelization factor (specific to Lambda); or HTTP/2 enhanced fan-out.
upvoted 2 times
...
Debuggerrr
1 year, 11 months ago
B seems to be correct. As ordering has to be maintained, the Lambda function will only be handy if the consumer is KCL, because KCL has built-in ordering logic for parent and child shards.
upvoted 3 times
...
Debi_mishra
2 years ago
C can never be right - increasing shards cannot assure ordering, and that's the catch here. B seems close.
upvoted 1 times
MLCL
1 year, 10 months ago
If you partition by user_id, order is guaranteed, since all records with the same user_id go to the same shard.
upvoted 3 times
...
MLCL
1 year, 10 months ago
The stream is partitioned by user_id, so increasing the number of shards won't impact the ordering of records for a specific user, because all posts from a particular user go to the same shard.
upvoted 3 times
...
...
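[Editor's note] To make the partition-key point concrete, a minimal producer sketch; the stream name and payload are made up. Because the PartitionKey is the user_id, every post from a given user hashes to the same shard and keeps its arrival order there, regardless of how many shards the stream has.

    import boto3
    import json

    kinesis = boto3.client("kinesis")

    # Made-up stream name and payload. All records with the same PartitionKey
    # hash to the same shard, so per-user ordering is preserved.
    post = {"user_id": "user-42", "text": "example post"}
    kinesis.put_record(
        StreamName="social-media-posts",
        Data=json.dumps(post).encode("utf-8"),
        PartitionKey=post["user_id"],
    )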
pk349
2 years, 1 month ago
C: I passed the test
upvoted 2 times
...
rags1482
2 years, 2 months ago
Answer B, based on the link below: https://aws.amazon.com/about-aws/whats-new/2018/11/aws-lambda-supports-kinesis-data-streams-enhanced-fan-out-and-http2/
upvoted 2 times
...
Arjun777
2 years, 4 months ago
Option B: migrating the Lambda consumers to an HTTP/2 stream consumer can significantly reduce processing latency and improve the overall performance of the system, because HTTP/2 stream consumers allow Lambda to retrieve records from the stream more efficiently. Migrating to an HTTP/2 stream consumer requires minimal operational overhead, as it only involves updating the Lambda function's event source to use the new consumer type; this can be done easily with the AWS SDK and does not require any major changes to the existing architecture. Therefore, option B is the best solution for reducing the latency with the least possible operational overhead.
upvoted 3 times
aws_kid
2 years, 2 months ago
Increasing shards is easier than enhanced fan-out and cheaper too.
upvoted 2 times
...
...
nadavw
2 years, 5 months ago
Selected Answer: B
C is a temporary solution, as there is no way to know the shard count you would need to scale to. There is no simple auto-scaling in Kinesis, so there will be operational overhead to continuously monitor the system and increase the number of shards. In addition, the partitioning into shards is by user_id, so how would more shards solve this? B, the enhanced fan-out approach, is a good fit, as described here: "The enhanced capacity enables you to achieve higher outbound throughput without provisioning more streams or shards in the same stream." https://aws.amazon.com/blogs/compute/increasing-real-time-stream-processing-performance-with-amazon-kinesis-data-streams-enhanced-fan-out-and-aws-lambda/
upvoted 1 times
...
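[Editor's note] For reference on the enhanced fan-out route argued for above, a sketch of registering a dedicated stream consumer with boto3; the stream ARN and consumer name are placeholders. The returned consumer ARN would then be used as the Lambda event source instead of the stream ARN.

    import boto3

    kinesis = boto3.client("kinesis")

    # Placeholder ARN and name. An enhanced fan-out consumer gets its own
    # 2 MB/s per-shard throughput and lower propagation delay via HTTP/2 push.
    response = kinesis.register_stream_consumer(
        StreamARN="arn:aws:kinesis:us-east-1:111122223333:stream/social-media-posts",
        ConsumerName="posts-validator",
    )
    print(response["Consumer"]["ConsumerARN"])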
Arka_01
2 years, 8 months ago
Selected Answer: C
"least possible operational overhead" - This is the key here. As the solution demands to reduce latency, this will be the easiest way to do so. Notice, that cost factor is not mentioned in the question.
upvoted 1 times
...
rocky48
2 years, 11 months ago
Selected Answer: C
Increasing the number of shards looks ok.
upvoted 1 times
...
Sen5476
2 years, 11 months ago
I go with B for two reasons. 1. Messages must be received in the same order per user; scaling the shards out during peak hours and back in afterwards may change the message order, so C is not correct. 2. HTTP/2 is the enhanced fan-out consumer, which reduces latency from 200 ms to 70 ms, a 65% latency reduction.
upvoted 4 times
...
f4bi4n
3 years, 1 month ago
Selected Answer: C
C, but you must ensure you use partition keys (in this case the user_id) to preserve the requested per-user ordering. HTTP/2 would also decrease latency, but it needs more effort.
upvoted 2 times
...
jrheen
3 years, 1 month ago
C - Increase Shards
upvoted 2 times
...
CHRIS12722222
3 years, 1 month ago
I think B. Standard consumer latency = 200 ms; HTTP/2 latency = 70 ms.
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other