Exam AWS Certified Data Analytics - Specialty topic 1 question 92 discussion

Question #: 92
Topic #: 1

An IoT company wants to release a new device that will collect data to track sleep overnight on an intelligent mattress. Sensors will send data that will be uploaded to an Amazon S3 bucket. About 2 MB of data is generated each night for each bed. Data must be processed and summarized for each user, and the results need to be available as soon as possible. Part of the process consists of time windowing and other functions. Based on tests with a Python script, every run will require about 1 GB of memory and will complete within a couple of minutes.
Which solution will run the script in the MOST cost-effective way?

A. AWS Lambda with a Python script
B. AWS Glue with a Scala job
C. Amazon EMR with an Apache Spark script
D. AWS Glue with a PySpark job

Suggested Answer: A

by VikG12 at May 3, 2021, 6:05 p.m.

Comments

adamstaros

Highly Voted 3 years, 6 months ago

I think the answer should be "A". Lambda is the most cost-effective solution and satisfies both the memory and time requirements. Additionally, Lambda supports "tumbling windows" (https://aws.amazon.com/blogs/compute/using-aws-lambda-for-streaming-analytics/), so in my opinion "A" is the best option for this question.

upvoted 23 times

Booqq

2 years, 11 months ago

D Because: Tumbling windows are distinct time windows that open and close at regular intervals. By default, Lambda invocations are stateless—you cannot use them for processing data across multiple continuous invocations without an external database. However, with tumbling windows, you can maintain your state across invocations. This state contains the aggregate result of the messages previously processed for the current window. Your state can be a maximum of 1 MB per shard. If it exceeds that size, Lambda terminates the window early. https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-windows

upvoted 3 times

nadavw

2 years, 4 months ago

Tumbling windows are for streaming analytics; here it is a batch job, since the source data is in S3.

upvoted 2 times

soni12390

Highly Voted 3 years, 7 months ago

Glue DataBrew supports window functions: https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.functions.window.html. I will go with D.

upvoted 22 times

MLCL

Most Recent 1 year, 10 months ago

Selected Answer: A

A Python script is already provided, and rewriting it in Scala or Spark would be unnecessary work. Time windowing here does not refer to streaming; the files are already in S3.

upvoted 2 times
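
To illustrate the batch reading of "time windowing" in the comment above, here is a minimal sketch. It assumes one night's file has already been read from S3 and parsed into (epoch_seconds, value) pairs. The function name, the 5-minute window, and the averaged metric are all hypothetical; the question does not say what the real script computes.

```python
from collections import defaultdict
from statistics import mean


def summarize_windows(readings, window_seconds=300):
    """Average readings in fixed, non-overlapping (tumbling) time windows.

    readings: iterable of (epoch_seconds, value) pairs (assumed input shape).
    Returns a dict mapping each window's start time to the mean value in it.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Round the timestamp down to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [(0, 60), (100, 62), (300, 70), (599, 74), (600, 80)]
    print(summarize_windows(sample))
```

Nothing here depends on a stream: the whole file is in memory, so the windowing is an ordinary group-by over timestamps.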

whenthan

1 year, 10 months ago

A - Lambda invocation with tumbling window functions

upvoted 1 times

MeshterZYX

2 years, 2 months ago

A, not D. Confucius says "Do not use a cannon to kill a mosquito."

upvoted 4 times

AwsNewPeople

2 years, 2 months ago

Selected Answer: A

Normally I would go for D, but the question asks for the most cost-effective option and the job only requires 1 GB of memory, so I will go for A. AWS Lambda is a serverless compute service that runs code in response to events and scales automatically with incoming traffic. With Lambda, you pay only for the compute time the function uses, which makes it a cost-effective option. Since the Python script has been tested to run within a couple of minutes with 1 GB of memory, it fits well within Lambda's limits (up to 10 GB of memory and a 15-minute timeout). In addition, the data generated each night is small (2 MB per bed), so the function can read each file from S3 and hold it in memory, or in the default 512 MB of /tmp storage, with room to spare. The results can also be written back to the Amazon S3 bucket. Therefore, AWS Lambda with a Python script is the most cost-effective solution for this project, as it provides a serverless and scalable environment for running the script with the required memory and processing capabilities.

upvoted 4 times
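
The cost argument can be made concrete with a rough per-run comparison. This is a sketch under stated assumptions, not a quote: the prices below are assumed on-demand list prices (x86 Lambda and Glue Spark jobs in a typical US region) and change over time and by region, the 2-DPU minimum is assumed for a Glue Spark job, and the 120-second run time stands in for "a couple of minutes". Free tiers, S3 requests, and Glue's minimum billing duration are ignored.

```python
# Assumed list prices; check current AWS pricing for your region.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667   # USD, x86 (assumption)
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000  # USD per invocation (assumption)
GLUE_PRICE_PER_DPU_HOUR = 0.44               # USD (assumption)
GLUE_MIN_DPUS = 2                            # assumed minimum for a Spark job


def lambda_cost(memory_gb, seconds):
    """Cost of one Lambda invocation: GB-seconds plus the request fee."""
    return memory_gb * seconds * LAMBDA_PRICE_PER_GB_SECOND + LAMBDA_PRICE_PER_REQUEST


def glue_cost(seconds, dpus=GLUE_MIN_DPUS):
    """Cost of one Glue job run billed per DPU-hour."""
    return dpus * (seconds / 3600) * GLUE_PRICE_PER_DPU_HOUR


if __name__ == "__main__":
    run_seconds = 120  # "a couple of minutes" (assumption)
    print(f"Lambda: ${lambda_cost(1.0, run_seconds):.4f} per run")
    print(f"Glue:   ${glue_cost(run_seconds):.4f} per run")
```

Under these assumptions a Lambda run costs about a fifth of a cent and a Glue run about three cents, so the gap is more than tenfold per bed per night.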

akashm99101001com

2 years, 2 months ago

Selected Answer: A

MOST cost-effective way is Lambda

upvoted 1 times

asyouwish

2 years, 4 months ago

Answer A was impossible at the time exam DAS-C01 came out (13 APR 2020.) The AWS blog post announcing the new windowing feature for Lambda is dated 15 DEC 2020.

upvoted 2 times

Erso

2 years, 4 months ago

Selected Answer: D

In my opinion it is D. Time windowing, 1 GB of memory, and a couple of minutes of execution... Glue is better.

upvoted 2 times

Arjun777

2 years, 4 months ago

For Lambda to handle a tumbling window, the state must stay within 1 MB. The event attributes include: window start and end (the beginning and ending timestamps for the current tumbling window); state (an object containing the state returned from the previous window, which is initially empty and can contain up to 1 MB of data); isFinalInvokeForWindow (indicates whether this is the last invocation for the tumbling window, which only occurs once per window period); and isWindowTerminatedEarly (a window ends early only if the state exceeds the maximum allowed size of 1 MB). Therefore it's D, as a Glue job with PySpark can handle this volume of data aggregation for each bed.

upvoted 1 times

Chelseajcole

2 years, 4 months ago

Maybe the question is testing whether you know that PySpark can do windowing functions. A PySpark job is also a Python script. In terms of cost-effectiveness, Glue is cheaper compared to EMR. So overall, I vote for D.

upvoted 1 times

silvaa360

2 years, 5 months ago

Selected Answer: A

The process has already been tested with Python, so there is no concern that we might need a PySpark job to work with DataFrames, etc. It also seems like overkill to set up a Spark job to process 2 MB files. Again, if the statement that the script was tested with Python were not present, I would choose PySpark or Scala. The Python mention might also be there to steer us toward PySpark over Scala; it is not an easy question, and I think both answers are defensible. I think it must be A.

upvoted 3 times

thuyeinaung

2 years, 6 months ago

Selected Answer: A

A is far more cost-effective.

upvoted 2 times

cloudlearnerhere

2 years, 7 months ago

Go for A for cost-effectiveness, as Glue is much more expensive compared to Lambda.

upvoted 3 times

Hussben

2 years, 7 months ago

Selected Answer: A

The data size for every bed is 2 MB. In this case, Lambda should be faster than a Glue job.

upvoted 1 times

Bansel

2 years, 7 months ago

A: This new feature introduces the concept of a tumbling window, which is a fixed-size, non-overlapping time interval of up to 15 minutes. To use this, you specify a tumbling window duration in the event-source mapping between the stream and the Lambda function. When you apply a tumbling window to a stream, items in the stream are grouped by window and sent to the processing Lambda function. The function returns a state value that is passed to the next invocation of the tumbling window. https://aws.amazon.com/blogs/compute/using-aws-lambda-for-streaming-analytics/

upvoted 1 times
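
For reference, the state-passing mechanism described in the comment above can be sketched as a handler for a Kinesis event source with a tumbling window configured. The `state`, `Records`, and `isFinalInvokeForWindow` fields and the returned `{"state": ...}` shape follow the Lambda documentation linked in this thread; the JSON payload with a numeric `value` field and the running-average logic are hypothetical.

```python
import base64
import json


def handler(event, context):
    """Accumulate a running total and count across invocations of one window."""
    state = event.get("state") or {}  # empty on the first invocation of a window
    total = state.get("total", 0.0)
    count = state.get("count", 0)

    for record in event.get("Records", []):
        # Kinesis record data arrives base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total += payload["value"]  # hypothetical payload field
        count += 1

    if event.get("isFinalInvokeForWindow") and count:
        # Last invocation for this window: emit the aggregate.
        print(f"window average: {total / count}")

    # The returned state is handed to the next invocation in the same window.
    return {"state": {"total": total, "count": count}}
```

This only applies when the function is fed by a stream; with files landing in S3, as in the question, the function would be invoked per object and no cross-invocation state is needed.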

rav009

2 years, 7 months ago

Selected Answer: D

D over A for tumbling window

upvoted 1 times
