Exam AWS Certified Data Analytics - Specialty topic 1 question 155 discussion

A gaming company is building a serverless data lake. The company is ingesting streaming data into Amazon Kinesis Data Streams and is writing the data to
Amazon S3 through Amazon Kinesis Data Firehose. The company is using 10 MB as the S3 buffer size and is using 90 seconds as the buffer interval. The company runs an AWS Glue ETL job to merge and transform the data to a different format before writing the data back to Amazon S3.
Recently, the company has experienced substantial growth in its data volume. The AWS Glue ETL jobs frequently fail with an OutOfMemoryError.
Which solutions will resolve this issue without incurring additional costs? (Choose two.)

  • A. Place the small files into one S3 folder. Define one single table for the small S3 files in AWS Glue Data Catalog. Rerun the AWS Glue ETL jobs against this AWS Glue table.
  • B. Create an AWS Lambda function to merge small S3 files and invoke it periodically. Run the AWS Glue ETL jobs after successful completion of the Lambda function.
  • C. Run the S3DistCp utility in Amazon EMR to merge a large number of small S3 files before running the AWS Glue ETL jobs.
  • D. Use the groupFiles setting in the AWS Glue ETL job to merge small S3 files and rerun AWS Glue ETL jobs.
  • E. Update the Kinesis Data Firehose S3 buffer size to 128 MB. Update the buffer interval to 900 seconds.
Suggested Answer: DE

Comments

pk349
Highly Voted 2 years ago
DE: I passed the test
upvoted 6 times
...
Chelseajcole
Most Recent 2 years, 3 months ago
The buffer size hints range from 1 MB to 128 MB for Amazon S3 delivery. For Amazon OpenSearch Service (OpenSearch Service) delivery, they range from 1 MB to 100 MB. For AWS Lambda processing, you can set a buffering hint between 0.2 MB and 3 MB using the BufferSizeInMBs processor parameter. The size threshold is applied to the buffer before compression. These options are treated as hints, so Kinesis Data Firehose might choose to use different values when it is optimal. The buffer interval hints range from 60 seconds to 900 seconds.
upvoted 1 times
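As a rough sketch of option E, the buffering hints could be raised to the 128 MB / 900 second maximums with boto3. The delivery stream name and the use of the first destination are assumptions for illustration, not details from the question:

```python
# Minimal sketch (assumed stream name "game-events-stream"): raise the
# Firehose S3 buffering hints to 128 MB / 900 seconds.
import boto3

firehose = boto3.client("firehose")

# update_destination needs the current version ID and destination ID,
# so describe the delivery stream first.
desc = firehose.describe_delivery_stream(DeliveryStreamName="game-events-stream")
stream = desc["DeliveryStreamDescription"]

firehose.update_destination(
    DeliveryStreamName="game-events-stream",
    CurrentDeliveryStreamVersionId=stream["VersionId"],
    DestinationId=stream["Destinations"][0]["DestinationId"],
    ExtendedS3DestinationUpdate={
        "BufferingHints": {
            "SizeInMBs": 128,          # maximum S3 buffer size hint
            "IntervalInSeconds": 900,  # maximum buffer interval hint
        }
    },
)
```

These values are hints, so Firehose still delivers on whichever threshold is reached first.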
...
MultiCloudIronMan
2 years, 6 months ago
DE see extract from AWS "You can configure the values for Amazon S3 Buffer size (1–128 MB) or Buffer interval (60–900 seconds). The condition satisfied first triggers data delivery to Amazon S3. When data delivery to the destination falls behind data writing to the delivery stream, Kinesis Data Firehose raises the buffer size dynamically."
upvoted 1 times
JoellaLi
2 years, 6 months ago
Link: https://docs.aws.amazon.com/firehose/latest/dev/basic-deliver.html
upvoted 1 times
...
...
jazzok
2 years, 6 months ago
DE. A – Won’t resolve this issue. B – Lambda adds additional cost. C – S3DistCp on EMR adds additional cost. D – It’s covered by the Glue ETL cost. E – The setting change has no additional cost.
upvoted 3 times
...
dushmantha
2 years, 8 months ago
Selected Answer: BD
I doubt "E" is an answer, because max buffer time of KDF is 5 mins. I would go with BD
upvoted 1 times
mawsman
2 years, 1 month ago
DE - KDF max buffer time is 900 seconds (15 mins) https://aws.amazon.com/kinesis/data-firehose/faqs/#:~:text=Kinesis%20Data%20Firehose%20buffers%20incoming,data%20delivery%20to%20Amazon%20S3.
upvoted 2 times
...
...
rocky48
2 years, 9 months ago
Selected Answer: DE
upvoted 2 times
...
ru4aws
2 years, 9 months ago
Selected Answer: DE
Grouping files together reduces the memory footprint on the Spark driver and simplifies file split orchestration. Increasing the buffer size avoids creating many small files.
upvoted 2 times
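As a rough sketch of option D, the grouping options can be set when the Glue job reads the small S3 objects. The S3 path, format, and target group size here are assumptions for illustration:

```python
# Minimal PySpark sketch of the groupFiles/groupSize read options in a Glue job.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# groupFiles="inPartition" coalesces many small files into larger read groups;
# groupSize (in bytes) sets the target group size, here roughly 128 MB.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-bucket/raw/"],  # assumed input prefix
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "134217728",
    },
    format="json",  # assumed input format
)
```

With grouping enabled, the Spark driver tracks far fewer tasks, which is what relieves the driver memory pressure described in the AWS blog post linked below.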
...
Ramshizzle
2 years, 10 months ago
Another good article discussing the out-of-memory issue: https://aws.amazon.com/premiumsupport/knowledge-center/glue-oom-java-heap-space-error/
upvoted 1 times
...
khchan123
2 years, 11 months ago
Selected Answer: DE
DE. Grouping files together reduces the memory footprint on the Spark driver as well as simplifying file split orchestration. Without grouping, a Spark application must process each file using a different Spark task. Each task must then send a mapStatus object containing the location information to the Spark driver. https://aws.amazon.com/blogs/big-data/optimize-memory-management-in-aws-glue/
upvoted 2 times
...
jrheen
3 years ago
Answer: D, E
upvoted 1 times
...
Teraxs
3 years ago
Selected Answer: DE
Increasing the buffer increases file size: https://aws.amazon.com/kinesis/data-firehose/faqs/?nc1=h_ls. groupFiles helps with reading small files: https://docs.aws.amazon.com/glue/latest/dg/grouping-input-files.html
upvoted 3 times
...
CHRIS12722222
3 years ago
DE seems good
upvoted 1 times
...