Exam AWS Certified Data Analytics - Specialty topic 1 question 155 discussion

A gaming company is building a serverless data lake. The company is ingesting streaming data into Amazon Kinesis Data Streams and is writing the data to
Amazon S3 through Amazon Kinesis Data Firehose. The company is using 10 MB as the S3 buffer size and is using 90 seconds as the buffer interval. The company runs an AWS Glue ETL job to merge and transform the data to a different format before writing the data back to Amazon S3.
Recently, the company has experienced substantial growth in its data volume. The AWS Glue ETL jobs frequently fail with an OutOfMemoryError.
Which solutions will resolve this issue without incurring additional costs? (Choose two.)

  • A. Place the small files into one S3 folder. Define one single table for the small S3 files in AWS Glue Data Catalog. Rerun the AWS Glue ETL jobs against this AWS Glue table.
  • B. Create an AWS Lambda function to merge small S3 files and invoke it periodically. Run the AWS Glue ETL jobs after successful completion of the Lambda function.
  • C. Run the S3DistCp utility in Amazon EMR to merge a large number of small S3 files before running the AWS Glue ETL jobs.
  • D. Use the groupFiles setting in the AWS Glue ETL job to merge small S3 files and rerun AWS Glue ETL jobs.
  • E. Update the Kinesis Data Firehose S3 buffer size to 128 MB. Update the buffer interval to 900 seconds.
Suggested Answer: DE

Comments

pk349
Highly Voted 2 years ago
DE: I passed the test
upvoted 6 times
...
Chelseajcole
Most Recent 2 years, 3 months ago
The buffer size hints range from 1 MB to 128 MB for Amazon S3 delivery. For Amazon OpenSearch Service (OpenSearch Service) delivery, they range from 1 MB to 100 MB. For AWS Lambda processing, you can set a buffering hint between 0.2 MB and 3 MB using the BufferSizeInMBs processor parameter. The size threshold is applied to the buffer before compression. These options are treated as hints, so Kinesis Data Firehose might choose to use different values when it is optimal. The buffer interval hints range from 60 seconds to 900 seconds.
upvoted 1 times
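As a rough sketch of option E, the buffering hints could be raised to the 128 MB / 900 second maximums with boto3. The delivery stream name and the use of the first destination are assumptions for illustration, not details from the question:

```python
# Minimal sketch (assumed stream name "game-events-stream"): raise the
# Firehose S3 buffering hints to 128 MB / 900 seconds.
import boto3

firehose = boto3.client("firehose")

# update_destination needs the current version ID and destination ID,
# so describe the delivery stream first.
desc = firehose.describe_delivery_stream(DeliveryStreamName="game-events-stream")
stream = desc["DeliveryStreamDescription"]

firehose.update_destination(
    DeliveryStreamName="game-events-stream",
    CurrentDeliveryStreamVersionId=stream["VersionId"],
    DestinationId=stream["Destinations"][0]["DestinationId"],
    ExtendedS3DestinationUpdate={
        "BufferingHints": {
            "SizeInMBs": 128,          # maximum S3 buffer size hint
            "IntervalInSeconds": 900,  # maximum buffer interval hint
        }
    },
)
```

These values are hints, so Firehose still delivers on whichever threshold is reached first.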
...
MultiCloudIronMan
2 years, 6 months ago
DE see extract from AWS "You can configure the values for Amazon S3 Buffer size (1–128 MB) or Buffer interval (60–900 seconds). The condition satisfied first triggers data delivery to Amazon S3. When data delivery to the destination falls behind data writing to the delivery stream, Kinesis Data Firehose raises the buffer size dynamically."
upvoted 1 times
JoellaLi
2 years, 6 months ago
Link: https://docs.aws.amazon.com/firehose/latest/dev/basic-deliver.html
upvoted 1 times
...
...
jazzok
2 years, 6 months ago
DE. A – Won’t resolve this issue. B – Lambda adds additional cost. C – S3DistCp on EMR adds additional cost. D – It’s covered by the Glue ETL cost. E – The setting change has no additional cost.
upvoted 3 times
...
dushmantha
2 years, 8 months ago
Selected Answer: BD
I doubt "E" is an answer, because max buffer time of KDF is 5 mins. I would go with BD
upvoted 1 times
mawsman
2 years, 1 month ago
DE - KDF max buffer time is 900 seconds (15 mins) https://aws.amazon.com/kinesis/data-firehose/faqs/#:~:text=Kinesis%20Data%20Firehose%20buffers%20incoming,data%20delivery%20to%20Amazon%20S3.
upvoted 2 times
...
...
rocky48
2 years, 9 months ago
Selected Answer: DE
upvoted 2 times
...
ru4aws
2 years, 9 months ago
Selected Answer: DE
Grouping files together reduces the memory footprint on the Spark driver and simplifies file split orchestration. Increasing the buffer size avoids creating many small files.
upvoted 2 times
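As a rough sketch of option D, the grouping options can be set when the Glue job reads the small S3 objects. The S3 path, format, and target group size here are assumptions for illustration:

```python
# Minimal PySpark sketch of the groupFiles/groupSize read options in a Glue job.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# groupFiles="inPartition" coalesces many small files into larger read groups;
# groupSize (in bytes) sets the target group size, here roughly 128 MB.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-bucket/raw/"],  # assumed input prefix
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "134217728",
    },
    format="json",  # assumed input format
)
```

With grouping enabled, the Spark driver tracks far fewer tasks, which is what relieves the driver memory pressure described in the AWS blog post linked below.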
...
Ramshizzle
2 years, 10 months ago
Another good article discussing the out-of-memory issue: https://aws.amazon.com/premiumsupport/knowledge-center/glue-oom-java-heap-space-error/
upvoted 1 times
...
khchan123
2 years, 11 months ago
Selected Answer: DE
DE. Grouping files together reduces the memory footprint on the Spark driver as well as simplifying file split orchestration. Without grouping, a Spark application must process each file using a different Spark task. Each task must then send a mapStatus object containing the location information to the Spark driver. https://aws.amazon.com/blogs/big-data/optimize-memory-management-in-aws-glue/
upvoted 2 times
...
jrheen
3 years ago
Answer: D, E
upvoted 1 times
...
Teraxs
3 years ago
Selected Answer: DE
Increasing the buffer increases file size: https://aws.amazon.com/kinesis/data-firehose/faqs/?nc1=h_ls. groupFiles helps with reading small files: https://docs.aws.amazon.com/glue/latest/dg/grouping-input-files.html
upvoted 3 times
...
CHRIS12722222
3 years ago
DE seems good
upvoted 1 times
...