Exam AWS Certified Data Analytics - Specialty topic 1 question 44 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 44
Topic #: 1

A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations.
- Station A, which has 10 sensors
- Station B, which has five sensors
These weather stations were placed by onsite subject-matter experts.
Each sensor has a unique ID. The data collected from each sensor will be collected using Amazon Kinesis Data Streams.
Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created. Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B. Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput.
How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?

A. Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.
B. Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.
C. Modify the partition key to use the sensor ID instead of the station name.
D. Reduce the number of sensors in Station A from 10 to 5 sensors.

Suggested Answer: C

by Priyanka_01 at Aug. 15, 2020, 11:31 p.m.

Comments

Priyanka_01

Highly Voted 3 years, 8 months ago

C? A and B increase the cost

upvoted 34 times

awssp12345

3 years, 8 months ago

Agreed

upvoted 2 times

...

lakediver

3 years, 5 months ago

Agreed. For further reading, see https://aws.amazon.com/blogs/big-data/under-the-hood-scaling-your-kinesis-data-streams/

upvoted 2 times

...

sanjaym

Highly Voted 3 years, 7 months ago

C is 100% the correct answer.

upvoted 11 times

...

pk349

Most Recent 2 years, 1 month ago

C: I passed the test

upvoted 2 times

...

anjuvinayan

2 years, 1 month ago

The answer is C.
A. There is no need to increase the number of shards; the question states that total throughput is below the allocated capacity.
B. More cost.
C. Modifying the partition key to use the sensor ID instead of the station name is the correct answer. Right now all data from Station A, which has more sensors, goes to one shard, and all data from Station B, which has fewer sensors, goes to the other. Changing the partition key to the sensor ID divides the data across the shards per sensor.
D. A change to the infrastructure, which is not required.

upvoted 4 times

...

rags140882

2 years, 3 months ago

Option B does involve creating a separate Kinesis data stream for Station A, which could be seen as increasing the complexity of the solution compared to modifying the partition key. However, in this scenario, the bottleneck is on data coming from Station A, and creating a separate stream with dedicated shards for that station can help to increase parallelism and improve throughput without increasing the overall cost of the solution. On the other hand, modifying the partition key to use the sensor ID instead of the station name could result in uneven shard distribution and hot partitions if the distribution of sensors across stations is uneven. This could lead to degraded performance and require additional scaling in the future, which could increase complexity and cost over time. So, while both options have their pros and cons, creating a separate Kinesis data stream for Station A with dedicated shards can be a more effective and scalable solution for improving throughput in this scenario.

upvoted 1 times

...

cloudlearnerhere

2 years, 7 months ago

Selected Answer: C

The correct answer is C. The partition keys are currently based on station names, so with two shards the shard receiving Station A is loaded with 10 sensors while the shard receiving Station B gets only 5. Changing the partition key from station name to sensor ID spreads the data more evenly across the shards without increasing the overall cost or complexity of the solution. Option A is wrong because adding shards increases the cost. Option B is wrong because adding another Kinesis data stream increases the cost. Option D is wrong because reducing the number of sensors would reduce the data collection quality.
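To make the difference between the two keying strategies concrete, here is a minimal local simulation. It does not call Kinesis. It assumes the two shards split the 128-bit MD5 hash space evenly, and the station names and sensor IDs (`StationA`, `A-00`, and so on) are hypothetical.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 2          # the stream in the question has two shards
HASH_SPACE = 2 ** 128    # MD5 produces a 128-bit value


def shard_index(partition_key: str) -> int:
    """Map a partition key to a shard, assuming the hash space is split evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * SHARD_COUNT // HASH_SPACE


# Hypothetical sensor IDs: 10 for Station A, 5 for Station B.
sensors = [("StationA", f"A-{i:02d}") for i in range(10)]
sensors += [("StationB", f"B-{i:02d}") for i in range(5)]


def records_per_shard(use_sensor_id: bool) -> Counter:
    """Count one reading per sensor, keyed by station name or by sensor ID."""
    counts = Counter()
    for station, sensor_id in sensors:
        key = sensor_id if use_sensor_id else station
        counts[shard_index(key)] += 1
    return counts


by_station = records_per_shard(use_sensor_id=False)
by_sensor = records_per_shard(use_sensor_id=True)
print("keyed by station name:", dict(by_station))
print("keyed by sensor ID:   ", dict(by_sensor))
```

With station-name keys there are only two distinct hash values, so every Station A reading necessarily lands on the same shard. With sensor-ID keys there are 15 distinct hash values, which gives the hash function a chance to spread the load, although with so few keys the split is not guaranteed to be exactly even.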

upvoted 5 times

cloudlearnerhere

2 years, 7 months ago

The partition key determines to which shard the record is written. The partition key is a Unicode string with a maximum length of 256 characters. Kinesis runs the partition key value that you provide in the request through an MD5 hash function. The resulting value maps your record to a specific shard within the stream, and Kinesis writes the record to that shard. Partition keys dictate how to distribute data across the stream and use shards. Certain use cases require you to partition data based on specific criteria for efficient processing by the consuming applications. As an example, if you use player ID pk1234 as the hash key, all scores related to that player route to shard1. The consuming application can use the fact that data stored in shard1 has an affinity with the player ID and can efficiently calculate the leaderboard. An increase in traffic related to players mapped to shard1 can lead to a hot shard. Kinesis Data Streams allows you to handle such scenarios by splitting or merging shards without disrupting your streaming pipeline.
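The key-to-shard mapping described above can be sketched as a lookup against per-shard hash key ranges. This is only an illustration: the helper names are made up, and it assumes the shards cover the 128-bit hash space in equal, contiguous ranges, which is not guaranteed for a stream that has been split or merged.

```python
import hashlib


def even_hash_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the 128-bit hash key space into contiguous, equal-sized ranges."""
    space = 2 ** 128
    bounds = [space * i // shard_count for i in range(shard_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(shard_count)]


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as a 128-bit unsigned integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def find_shard(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the range that contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash key falls outside every shard range")


ranges = even_hash_ranges(2)
print(find_shard("pk1234", ranges))  # the same key always maps to the same shard
```

Because the mapping is deterministic, every record with the same partition key goes to the same shard, which is what creates both the affinity and the hot-shard risk mentioned above.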

upvoted 2 times

cloudlearnerhere

2 years, 7 months ago

If your use cases do not require data stored in a shard to have high affinity, you can achieve high overall throughput by using a random partition key to distribute data. Random partition keys help distribute the incoming data records evenly across all the shards in the stream and reduce the likelihood of one or more shards getting hit with a disproportionate number of records. You can use a universally unique identifier (UUID) as a partition key to achieve this uniform distribution of records across shards. This strategy can increase the latency of record processing if the consumer application has to aggregate data from multiple shards.
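As a rough illustration of the random-key strategy, the sketch below hashes freshly generated UUIDs and counts how many land on each of two shards. It runs locally, assumes an even split of the hash space, and the record count is arbitrary.

```python
import hashlib
import uuid
from collections import Counter


def shard_index(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * shard_count // 2 ** 128


def simulate_random_keys(record_count: int, shard_count: int) -> Counter:
    """Assign each simulated record a random UUID key and tally records per shard."""
    counts = Counter()
    for _ in range(record_count):
        counts[shard_index(str(uuid.uuid4()), shard_count)] += 1
    return counts


counts = simulate_random_keys(record_count=10_000, shard_count=2)
print(dict(counts))  # expect two counts of similar size; exact values vary per run
```

The trade-off is the one noted above: records from the same sensor no longer share a shard, so a consumer that needs per-sensor ordering or aggregation has to read across shards.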

upvoted 2 times

...

thirukudil

2 years, 7 months ago

Selected Answer: C

The answer is C. A and B would increase the overall cost. D: reducing the number of sensors is not a good option. C: by changing the partition key to the sensor ID, input data is distributed more evenly across both shards, which avoids the hot shard.
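On the producer side the change can be as small as which field is passed as the partition key. The sketch below only builds the request arguments; the function name, stream name, and payload fields are hypothetical, and nothing is sent to AWS.

```python
import json


def build_put_record_args(stream_name: str, station: str,
                          sensor_id: str, temperature_c: float) -> dict:
    """Build keyword arguments for a Kinesis PutRecord call (illustrative only)."""
    payload = {
        "station": station,          # kept in the payload so consumers can still group by station
        "sensor_id": sensor_id,
        "temperature_c": temperature_c,
    }
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": sensor_id,   # previously the station name
    }


args = build_put_record_args("weather-temps", "StationA", "A-07", 21.5)
print(args["PartitionKey"])
```

Assuming boto3 is the client library, these arguments could be passed as `boto3.client("kinesis").put_record(**args)`. The shard count and the number of streams stay the same, which is why this option adds no cost.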

upvoted 1 times

...

Arka_01

2 years, 8 months ago

Selected Answer: C

The question gives you the answer: 1. Station A is the one with the problem. It has 10 sensors and therefore more data. 2. Total stream throughput is still less than the allocated Kinesis Data Streams throughput, so the stream's full capacity is not being used. The workload is therefore not evenly distributed among the shards.
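A quick worked example shows how one shard can be throttled while the stream as a whole is under capacity. The per-sensor rate of 120 KB/s is a made-up figure chosen for illustration; the per-shard write limit of roughly 1 MB/s is the documented Kinesis limit, rounded here to 1000 KB/s. The example also assumes the two station names hash to different shards.

```python
SHARD_WRITE_LIMIT_KB_S = 1000   # approx. 1 MB/s ingest limit per shard
SHARD_COUNT = 2
RATE_PER_SENSOR_KB_S = 120      # hypothetical per-sensor data rate

# Keyed by station name: each station's traffic lands on a single shard.
station_a_load = 10 * RATE_PER_SENSOR_KB_S
station_b_load = 5 * RATE_PER_SENSOR_KB_S

total_load = station_a_load + station_b_load
stream_capacity = SHARD_COUNT * SHARD_WRITE_LIMIT_KB_S

# Keyed by sensor ID: best case, the 15 sensors split evenly across shards.
ideal_per_shard = total_load / SHARD_COUNT

print(f"Station A shard: {station_a_load} KB/s (limit {SHARD_WRITE_LIMIT_KB_S})")
print(f"Station B shard: {station_b_load} KB/s (limit {SHARD_WRITE_LIMIT_KB_S})")
print(f"Total: {total_load} KB/s of {stream_capacity} KB/s stream capacity")
print(f"Ideal per shard with sensor-ID keys: {ideal_per_shard} KB/s")
```

Under these assumed numbers, the Station A shard exceeds its limit even though the stream has spare capacity overall, and an even split would fit within both shards. With only 15 distinct keys the real split depends on the hash values and may be less balanced than the ideal case.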

upvoted 2 times

...

rocky48

2 years, 11 months ago

Selected Answer: C

Answer-C

upvoted 1 times

...

certificationJunkie

3 years ago

C is the correct answer. Increasing the number of shards won't help here, because partitioning is based on station name and there are only two stations.

upvoted 1 times

...

Bik000

3 years ago

Selected Answer: C

Answer is C

upvoted 2 times

...

jrheen

3 years, 1 month ago

Answer-C

upvoted 1 times

...

Teraxs

3 years, 1 month ago

Selected Answer: C

C: using the sensor ID as the partition key allows an even distribution of data between the two shards.

upvoted 1 times

...

aws2019

3 years, 6 months ago

C is the right answer

upvoted 1 times

...

lostsoul07

3 years, 7 months ago

C is the right answer

upvoted 4 times

...

BillyC

3 years, 7 months ago

C is correct!

upvoted 4 times

...

syu31svc

3 years, 7 months ago

D is obviously wrong. From https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html: "Splitting increases the number of shards in your stream and therefore increases the data capacity of the stream. Because you are charged on a per-shard basis, splitting increases the cost of your stream." So the answer is C.

upvoted 1 times

...
