Exam AWS Certified Data Analytics - Specialty topic 1 question 92 discussion

An IoT company wants to release a new device that will collect data to track sleep overnight on an intelligent mattress. Sensors will send data that will be uploaded to an Amazon S3 bucket. About 2 MB of data is generated each night for each bed. Data must be processed and summarized for each user, and the results need to be available as soon as possible. Part of the process consists of time windowing and other functions. Based on tests with a Python script, every run will require about 1 GB of memory and will complete within a couple of minutes.
Which solution will run the script in the MOST cost-effective way?

  • A. AWS Lambda with a Python script
  • B. AWS Glue with a Scala job
  • C. Amazon EMR with an Apache Spark script
  • D. AWS Glue with a PySpark job
Suggested Answer: A

Comments

adamstaros
Highly Voted 3 years, 6 months ago
I think the answer should be "A". Lambda is the most cost-effective solution and satisfies both the memory and time requirements. Additionally, Lambda supports "tumbling windows" (https://aws.amazon.com/blogs/compute/using-aws-lambda-for-streaming-analytics/), so in my opinion "A" is the best option for this question.
upvoted 23 times
Booqq
2 years, 11 months ago
D, because: Tumbling windows are distinct time windows that open and close at regular intervals. By default, Lambda invocations are stateless: you cannot use them for processing data across multiple continuous invocations without an external database. However, with tumbling windows, you can maintain your state across invocations. This state contains the aggregate result of the messages previously processed for the current window. Your state can be a maximum of 1 MB per shard. If it exceeds that size, Lambda terminates the window early. https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-windows
upvoted 3 times
...
nadavw
2 years, 4 months ago
Tumbling windows are for streaming analytics; here it's a batch job, since the source data is already in S3.
upvoted 2 times
...
...
soni12390
Highly Voted 3 years, 7 months ago
Glue DataBrew supports window functions: https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.functions.window.html. I will go with D.
upvoted 22 times
...
MLCL
Most Recent 1 year, 10 months ago
Selected Answer: A
A Python script is already provided, and rewriting it in Scala or Spark would be unnecessary work. Time windowing here does not refer to streaming; the files are already in S3.
upvoted 2 times
...
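To illustrate the point that windowing over files already in S3 is a batch problem, here is a minimal sketch in plain Python. It assumes the function is triggered by an S3 event notification and that each night's file can be parsed into `(timestamp, value)` samples; the function names, the 300-second window, and the sample format are hypothetical and not taken from the question.

```python
from collections import defaultdict
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def summarize_windows(samples, window_seconds=300):
    """Average (timestamp, value) samples over fixed, non-overlapping windows.

    The window length and the sample layout are assumptions for illustration.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```

Downloading the object (for example with boto3) and writing the summary back to S3 are left out of the sketch; the windowing itself needs no streaming service.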
whenthan
1 year, 10 months ago
A - lambda invocation tumbling window functions
upvoted 1 times
...
MeshterZYX
2 years, 2 months ago
A, not D. Confucius says "Do not use a cannon to kill a mosquito."
upvoted 4 times
...
AwsNewPeople
2 years, 2 months ago
Selected Answer: A
Normally I would go for D, but since the question asks for the most cost-effective option and the script only requires 1 GB of memory, I will go for A. AWS Lambda is a serverless compute service that runs code in response to events and automatically scales with incoming traffic. With Lambda, the user pays only for the compute time the function uses, making it a cost-effective option. Since the Python script has been tested to run within a couple of minutes with 1 GB of memory, Lambda can easily handle the processing requirements, which fit within its 15-minute timeout and 10 GB memory limit. In addition, the data generated each night is relatively small (2 MB per bed), so it fits comfortably in Lambda's default 512 MB of ephemeral /tmp storage once read from S3. The processed results can also be easily written back to Amazon S3. Therefore, AWS Lambda with a Python script would be the most cost-effective solution for this project, as it provides a serverless and scalable environment for running the script with the required memory and processing capabilities.
upvoted 4 times
...
akashm99101001com
2 years, 2 months ago
Selected Answer: A
MOST cost-effective way is Lambda
upvoted 1 times
...
asyouwish
2 years, 4 months ago
Answer A was impossible at the time exam DAS-C01 came out (13 APR 2020.) The AWS blog post announcing the new windowing feature for Lambda is dated 15 DEC 2020.
upvoted 2 times
...
Erso
2 years, 4 months ago
Selected Answer: D
In my opinion it is D. Time windowing, 1 GB of memory, and a couple of minutes of execution... Glue is better.
upvoted 2 times
...
Arjun777
2 years, 4 months ago
Lambda can handle tumbling windows, but the state is limited to 1 MB. The event passed to the function includes:
  • Window start and end: the beginning and ending timestamps for the current tumbling window.
  • State: an object containing the state returned from the previous window, which is initially empty. The state object can contain up to 1 MB of data.
  • isFinalInvokeForWindow: indicates if this is the last invocation for the tumbling window. This only occurs once per window period.
  • isWindowTerminatedEarly: a window ends early only if the state exceeds the maximum allowed size of 1 MB.
Therefore it's D, as a Glue job with PySpark can handle this volume of data aggregation for each bed.
upvoted 1 times
...
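The event fields described above can be sketched as a Lambda handler for a Kinesis event source with a tumbling window. This is a hedged illustration only: the payload fields `bed_id` and `heart_rate`, the flat `"<bed>:count"` / `"<bed>:total"` state keys, and the helper names are hypothetical, and the question itself describes batch files in S3 rather than a stream.

```python
import base64
import json


def fold_records(state, records):
    """Fold Kinesis records into a flat running state of per-bed counts and sums."""
    state = dict(state or {})
    for record in records:
        # Kinesis delivers each record payload base64-encoded.
        reading = json.loads(base64.b64decode(record["kinesis"]["data"]))
        bed = reading["bed_id"]  # hypothetical payload field
        state[f"{bed}:count"] = state.get(f"{bed}:count", 0) + 1
        # "heart_rate" is a hypothetical payload field.
        state[f"{bed}:total"] = state.get(f"{bed}:total", 0) + reading["heart_rate"]
    return state


def window_averages(state):
    """Turn the flat running state into one average per bed."""
    beds = {key.rsplit(":", 1)[0] for key in state}
    return {bed: state[f"{bed}:total"] / state[f"{bed}:count"] for bed in beds}


def handler(event, context):
    # event["state"] is empty on the first invocation of a window.
    state = fold_records(event.get("state"), event.get("Records", []))
    if event.get("isFinalInvokeForWindow") or event.get("isWindowTerminatedEarly"):
        # The window is closing: emit the summary (here it is only logged).
        print(json.dumps({"window": event.get("window"),
                          "averages": window_averages(state)}))
    # The returned state is passed to the next invocation in the same window.
    return {"state": state}
```

Keeping only counts and sums per bed keeps the carried state small, which matters given the 1 MB limit mentioned above.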
Chelseajcole
2 years, 4 months ago
Maybe the question is testing whether you know that PySpark can do windowing functions. PySpark is also a Python script. In terms of cost-effectiveness, Glue is cheaper compared to EMR. So overall, I vote for D.
upvoted 1 times
...
silvaa360
2 years, 5 months ago
Selected Answer: A
The process is already tested with Python, so there is no concern that we might need a PySpark job to work with dataframes, etc. Also, it seems a bit overkill to set up a Spark job to process 2 MB files. Again, if the statement that "the script is tested with Python" were not present, I would choose PySpark or Scala. The Python mention might also be there to make us choose PySpark over Scala, but it is not an easy question and I think both answers are defensible. I think it must be A.
upvoted 3 times
...
thuyeinaung
2 years, 6 months ago
Selected Answer: A
A is far more cost-effective.
upvoted 2 times
...
cloudlearnerhere
2 years, 7 months ago
Go for A due to cost-effectiveness, as Glue is much more expensive compared to Lambda.
upvoted 3 times
...
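The cost argument can be made concrete with a rough back-of-the-envelope sketch. Every price and billing rule below is an assumed, illustrative on-demand figure rather than a quoted one, and the sketch ignores Lambda's per-request charge and Spark startup overhead; check the current AWS pricing pages for your Region before relying on any of it.

```python
# All constants are assumptions for illustration, not authoritative prices.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667  # assumed USD per GB-second
GLUE_PRICE_PER_DPU_HOUR = 0.44             # assumed USD per DPU-hour
GLUE_MIN_DPUS = 2                          # assumed minimum for a Spark job
GLUE_MIN_BILLED_SECONDS = 60               # assumed minimum billing duration


def lambda_cost(memory_gb, seconds):
    """Approximate compute cost of one Lambda run (request charge ignored)."""
    return memory_gb * seconds * LAMBDA_PRICE_PER_GB_SECOND


def glue_cost(seconds, dpus=GLUE_MIN_DPUS):
    """Approximate cost of one Glue Spark job run under the assumed rates."""
    billed_seconds = max(seconds, GLUE_MIN_BILLED_SECONDS)
    return dpus * (billed_seconds / 3600) * GLUE_PRICE_PER_DPU_HOUR


if __name__ == "__main__":
    # The question's figures: about 1 GB of memory for a couple of minutes.
    print(f"Lambda: ${lambda_cost(1, 120):.4f} per run")
    print(f"Glue:   ${glue_cost(120):.4f} per run")
```

Under these assumed rates, a 120-second run at 1 GB comes to roughly $0.002 on Lambda versus roughly $0.029 on a two-DPU Glue job, which is the gap the comments above are pointing at.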
Hussben
2 years, 7 months ago
Selected Answer: A
The data size for every bed is 2 MB. In this case, Lambda should be faster than a Glue job.
upvoted 1 times
...
Bansel
2 years, 7 months ago
A: This new feature introduces the concept of a tumbling window, which is a fixed-size, non-overlapping time interval of up to 15 minutes. To use this, you specify a tumbling window duration in the event-source mapping between the stream and the Lambda function. When you apply a tumbling window to a stream, items in the stream are grouped by window and sent to the processing Lambda function. The function returns a state value that is passed to the next invocation of the tumbling window. https://aws.amazon.com/blogs/compute/using-aws-lambda-for-streaming-analytics/
upvoted 1 times
...
rav009
2 years, 7 months ago
Selected Answer: D
D over A for tumbling window
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other