Exam AWS Certified Data Analytics - Specialty topic 1 question 55 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 55
Topic #: 1

A media company wants to perform machine learning and analytics on the data residing in its Amazon S3 data lake. There are two data transformation requirements that will enable the consumers within the company to create reports:
✑ Daily transformations of 300 GB of data with different file formats landing in Amazon S3 at a scheduled time.
✑ One-time transformations of terabytes of archived data residing in the S3 data lake.
Which combination of solutions cost-effectively meets the company's requirements for transforming the data? (Choose three.)

A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
B. For daily incoming data, use Amazon Athena to scan and identify the schema.
C. For daily incoming data, use Amazon Redshift to perform transformations.
D. For daily incoming data, use AWS Glue workflows with AWS Glue jobs to perform transformations.
E. For archived data, use Amazon EMR to perform data transformations.
F. For archived data, use Amazon SageMaker to perform data transformations.

Suggested Answer: ADE

by testtaker3434 at Aug. 9, 2020, 2:24 p.m.

Comments

testtaker3434

Highly Voted 3 years, 8 months ago

To me, ADE. Not B: Athena will use Glue (option A). Not C: it's an anti-pattern to use Redshift to do transformations. Not F: I would pick EMR instead of SageMaker to do one-time transformations.

upvoted 46 times

awssp12345

3 years, 8 months ago

Agreed

upvoted 3 times

...

zeronine

Highly Voted 3 years, 8 months ago

My answer is ADE.

upvoted 9 times

...

pk349

Most Recent 2 years, 1 month ago

ADE: I passed the test

upvoted 1 times

...

cloudlearnerhere

2 years, 6 months ago

Selected Answer: ADE

The correct answers are A, D, and E.
Options A and D: a Glue crawler plus Glue workflows provide ETL for the daily transformations.
Option E: EMR can perform the data transformation for the archived data.
Option B is wrong because Athena does not identify the schema; it uses the Glue Data Catalog.
Option C is wrong because Redshift would need to be persistent and is not cost-effective compared to Glue.
Option F is wrong because Amazon SageMaker is a fully managed service for building, training, and deploying machine learning (ML) models quickly. It does not provide ETL capability on large data.

upvoted 6 times
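
As a rough illustration of options A and D, here is a minimal sketch of the request parameters for a scheduled Glue crawler that hands off to a Glue job inside one workflow. It only builds plain dictionaries; the keys follow the boto3 Glue client calls `create_crawler`, `create_workflow`, and `create_trigger`. The bucket path, role ARN, database, job name, and cron schedule are all hypothetical, and the Glue job itself is assumed to exist already.

```python
def build_daily_workflow(bucket_path, role_arn, database, job_name,
                         cron="cron(0 2 * * ? *)"):
    """Return keyword arguments for boto3 Glue calls (all names hypothetical)."""
    crawler_name = f"{job_name}-crawler"
    workflow_name = f"{job_name}-workflow"
    return {
        # Option A: the crawler scans the landing prefix and records the schema.
        "create_crawler": {
            "Name": crawler_name,
            "Role": role_arn,
            "DatabaseName": database,
            "Targets": {"S3Targets": [{"Path": bucket_path}]},
        },
        # Option D: the workflow groups the crawler and the job.
        "create_workflow": {"Name": workflow_name},
        "create_triggers": [
            {
                # A scheduled trigger starts the crawler once a day.
                "Name": f"{job_name}-start",
                "WorkflowName": workflow_name,
                "Type": "SCHEDULED",
                "Schedule": cron,
                "Actions": [{"CrawlerName": crawler_name}],
                "StartOnCreation": True,
            },
            {
                # A conditional trigger runs the job after a successful crawl.
                "Name": f"{job_name}-after-crawl",
                "WorkflowName": workflow_name,
                "Type": "CONDITIONAL",
                "Predicate": {"Conditions": [{
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler_name,
                    "CrawlState": "SUCCEEDED",
                }]},
                "Actions": [{"JobName": job_name}],
                "StartOnCreation": True,
            },
        ],
    }


# Hypothetical values, for illustration only.
params = build_daily_workflow(
    bucket_path="s3://example-data-lake/daily/",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_db",
    job_name="daily-transform",
)

# With credentials configured, these would be passed to boto3, e.g.:
#   glue = boto3.client("glue")
#   glue.create_crawler(**params["create_crawler"])
#   glue.create_workflow(**params["create_workflow"])
#   for trigger in params["create_triggers"]:
#       glue.create_trigger(**trigger)
```

Keeping the crawl and the transformation in one workflow means the job only runs against a catalog that has just been refreshed, which matters when the daily files arrive in different formats.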

sevensquare

2 years, 6 months ago

What about SageMaker Data Wrangler?

upvoted 1 times

...

Arka_01

2 years, 8 months ago

Selected Answer: ADE

A cost-effective solution is required. So, A, D, and E.

upvoted 1 times

...

rocky48

2 years, 9 months ago

Selected Answer: ADE

upvoted 1 times

...

girish123456

2 years, 10 months ago

Selected Answer: ADE

A: For schema and new partitions of data for the incremental load.
D: Incremental transformation.
E: Historical data migration using EMR.

upvoted 2 times

...

GiveMeEz

2 years, 11 months ago

Sorry for the 41 upvotes, but answer B can't be it. Athena doesn't scan and identify the schema. Athena uses the Glue Data Catalog, which is populated by a Glue crawler. My answer: A, D, E.

upvoted 2 times

...

aws2019

3 years, 6 months ago

ADE is the answer.

upvoted 1 times

...

lostsoul07

3 years, 7 months ago

A, D, E is the right answer

upvoted 5 times

...

Draco31

3 years, 7 months ago

Yep, ADE also. I guess SageMaker would use the data more than once, for its learning processes.

upvoted 3 times

...

sanjaym

3 years, 7 months ago

100% ADE

upvoted 5 times

...

syu31svc

3 years, 7 months ago

Notice that the answers given are paired, so if you were to break it down:
Identify schema --> Glue
Transformations --> Glue jobs
Archived TBs worth of data --> EMR
So it is ADE.

upvoted 5 times
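
For the "archived TBs --> EMR" leg, one way to keep a one-time job cost-effective is a transient cluster that shuts down when its only step finishes. The sketch below builds a request dictionary shaped for the boto3 EMR client call `run_job_flow`; it does not call AWS. The release label, instance types, node count, script location, and S3 URIs are assumptions chosen for illustration, and the role names are the EMR default role names.

```python
def build_transient_emr_request(script_uri, input_uri, output_uri, core_nodes=10):
    """Return kwargs for emr.run_job_flow describing a run-once Spark cluster."""
    return {
        "Name": "one-time-archive-transform",
        "ReleaseLabel": "emr-6.15.0",            # assumption: any Spark-capable release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1, "Market": "ON_DEMAND"},
                {"InstanceRole": "CORE", "InstanceType": "m5.2xlarge",
                 "InstanceCount": core_nodes, "Market": "ON_DEMAND"},
            ],
            # Terminate the cluster once all steps are done, so nothing idles.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "transform-archive",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         script_uri, input_uri, output_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical S3 locations, for illustration only.
request = build_transient_emr_request(
    script_uri="s3://example-data-lake/scripts/transform.py",
    input_uri="s3://example-data-lake/archive/",
    output_uri="s3://example-data-lake/transformed/",
)

# With credentials configured, this would be submitted as:
#   boto3.client("emr").run_job_flow(**request)
```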

...

Paitan

3 years, 7 months ago

A, D and E.

upvoted 5 times

...

manish9363

3 years, 8 months ago

Can Glue handle 300 GB of data every day? It seems like too much for Glue.

upvoted 1 times

GiveMeEz

2 years, 11 months ago

Glue can. You can properly size the Glue cluster for the Glue job with one simple dial.

upvoted 1 times

...

Phoenyx89

3 years, 7 months ago

Absolutely! Glue can handle the same amount of data as EMR because, in the end, Glue is a simplified EMR cluster with Spark, HDFS, YARN, and the Glue dependencies, with the advantage of being serverless. By configuring the appropriate number and type of DPUs, you can handle 300 GB of data.

upvoted 8 times
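
To make the DPU sizing point concrete, here is a back-of-the-envelope cost sketch for a daily Glue job. The DPU count, the runtime, and the price per DPU-hour are assumptions for illustration (0.44 USD per DPU-hour is used as a placeholder; check the current pricing for your region), and the formula ignores any minimum billing duration.

```python
def glue_job_cost(dpus, runtime_minutes, price_per_dpu_hour=0.44):
    """Estimate the cost of one Glue job run as DPUs x hours x price."""
    return dpus * (runtime_minutes / 60) * price_per_dpu_hour


def monthly_cost(dpus, runtime_minutes, runs_per_month=30,
                 price_per_dpu_hour=0.44):
    """Estimate the cost of running the same job once a day for a month."""
    return runs_per_month * glue_job_cost(dpus, runtime_minutes,
                                          price_per_dpu_hour)


# Hypothetical sizing: 20 DPUs for 30 minutes to process the daily 300 GB.
per_run = glue_job_cost(dpus=20, runtime_minutes=30)
per_month = monthly_cost(dpus=20, runtime_minutes=30)
print(f"per run: {per_run:.2f} USD, per month: {per_month:.2f} USD")
```

Because the job is billed only while it runs, doubling the DPUs roughly halves the runtime for well-parallelized work and leaves the cost about the same, which is why the DPU count is mostly a latency dial.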

omar_bahrain

3 years, 7 months ago

Thanks for describing the origin of EMR.

upvoted 1 times

...