Exam AWS Certified Data Analytics - Specialty topic 1 question 55 discussion

A media company wants to perform machine learning and analytics on the data residing in its Amazon S3 data lake. There are two data transformation requirements that will enable the consumers within the company to create reports:
✑ Daily transformations of 300 GB of data with different file formats landing in Amazon S3 at a scheduled time.
✑ One-time transformations of terabytes of archived data residing in the S3 data lake.
Which combination of solutions cost-effectively meets the company's requirements for transforming the data? (Choose three.)

  • A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
  • B. For daily incoming data, use Amazon Athena to scan and identify the schema.
  • C. For daily incoming data, use Amazon Redshift to perform transformations.
  • D. For daily incoming data, use AWS Glue workflows with AWS Glue jobs to perform transformations.
  • E. For archived data, use Amazon EMR to perform data transformations.
  • F. For archived data, use Amazon SageMaker to perform data transformations.
Suggested Answer: ADE

Comments

testtaker3434
Highly Voted 3 years, 8 months ago
To me, it's ADE. Not B: Athena relies on Glue (option A). Not C: it's an antipattern to use Redshift for transformations. Not F: I would pick EMR instead of SageMaker for one-time transformations.
upvoted 46 times
awssp12345
3 years, 8 months ago
Agreed
upvoted 3 times
zeronine
Highly Voted 3 years, 8 months ago
My answer is ADE.
upvoted 9 times
pk349
Most Recent 2 years, 1 month ago
ADE: I passed the test
upvoted 1 times
cloudlearnerhere
2 years, 6 months ago
Selected Answer: ADE
Correct answers are A, D & E. Options A & D, using a Glue crawler and Glue workflows, provide ETL for the daily incoming data. Option E is correct because EMR can perform the data transformation for the archived data. Option B is wrong because Athena does not identify the schema; it uses the Glue Data Catalog. Option C is wrong because Redshift would need to be a persistent cluster, which is not cost-effective compared to Glue. Option F is wrong because Amazon SageMaker is a fully managed service that lets developers and data scientists quickly build, train, and deploy machine learning (ML) models; it is not designed for large-scale ETL.
upvoted 6 times
sevensquare
2 years, 6 months ago
What about SageMaker Data Wrangler?
upvoted 1 times
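The daily pipeline from options A and D can be sketched as request payloads for the AWS Glue API. This is a minimal sketch, assuming boto3 as the client library; the workflow, crawler, and job names and the 02:00 UTC schedule are hypothetical. A scheduled trigger starts the crawler, and a conditional trigger starts the transformation job only after the crawl succeeds.

```python
def build_daily_workflow(workflow, crawler, job, cron):
    """Build kwargs for glue.create_workflow and glue.create_trigger.

    All names passed in are hypothetical placeholders.
    """
    workflow_req = {"Name": workflow, "Description": "Daily crawl, then transform"}

    # Trigger 1: start the crawler on a fixed schedule (AWS cron syntax, UTC).
    start_trigger = {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"CrawlerName": crawler}],
        "StartOnCreation": True,
    }

    # Trigger 2: run the ETL job only after the crawler finishes successfully.
    transform_trigger = {
        "Name": f"{workflow}-transform",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Logical": "AND",
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ],
        },
        "Actions": [{"JobName": job}],
        "StartOnCreation": True,
    }
    return workflow_req, [start_trigger, transform_trigger]


if __name__ == "__main__":
    wf, triggers = build_daily_workflow(
        "daily-etl", "daily-crawler", "daily-transform", "cron(0 2 * * ? *)"
    )
    for t in triggers:
        print(t["Name"], t["Type"])
```

With a boto3 Glue client you would pass these as `glue.create_workflow(**wf)` and then `glue.create_trigger(**t)` for each trigger; the crawler and job are assumed to exist already.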
Arka_01
2 years, 8 months ago
Selected Answer: ADE
A cost-effective solution is required, so A, D and E.
upvoted 1 times
rocky48
2 years, 9 months ago
Selected Answer: ADE
upvoted 1 times
girish123456
2 years, 10 months ago
Selected Answer: ADE
A: Schema and new partitions for the incremental load
D: Incremental transformation
E: Historical data migration using EMR
upvoted 2 times
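For option A, the point about new partitions maps to the crawler's incremental-crawl setting. Below is a minimal sketch of the `glue.create_crawler` keyword arguments, assuming boto3; the role ARN, database, and S3 path are hypothetical. As I understand the Glue documentation, `CRAWL_NEW_FOLDERS_ONLY` requires the schema change policy to be `LOG` for both updates and deletes, so treat that pairing as something to verify.

```python
def build_daily_crawler_request(name, role_arn, database, s3_path):
    """Build kwargs for glue.create_crawler (all arguments are placeholders)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Glue Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Incremental crawl: only scan S3 folders added since the last run,
        # which picks up each day's new partition without rescanning old data.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        # Assumption based on the Glue docs: incremental crawls accept only
        # LOG for both behaviors. Verify against your API version.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
    }


if __name__ == "__main__":
    req = build_daily_crawler_request(
        "daily-crawler",
        "arn:aws:iam::123456789012:role/HypotheticalGlueRole",
        "media_lake",
        "s3://example-bucket/daily/",
    )
    print(req["RecrawlPolicy"])
```

No `Schedule` is set on the crawler itself, on the assumption that a Glue workflow trigger (option D) starts it.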
GiveMeEz
2 years, 11 months ago
Sorry, despite the 41 upvotes, answer B can't be it. Athena doesn't scan and identify the schema; it uses the Glue Data Catalog, which is populated by the Glue crawler. My answer: A, D, E.
upvoted 2 times
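To illustrate that Athena reads schemas from the Glue Data Catalog rather than discovering them, here is a minimal sketch of `athena.start_query_execution` keyword arguments, assuming boto3; the database, table, and output location are hypothetical. `AwsDataCatalog` is the name Athena uses for the Glue-backed catalog, so the query only succeeds if the table was already registered there, for example by a crawler.

```python
def build_athena_query(database, table, output_s3):
    """Build kwargs for athena.start_query_execution (placeholders only)."""
    return {
        # Athena does not infer the schema here: "database"."table" must
        # already exist in the Glue Data Catalog (e.g. created by a crawler).
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT 10',
        "QueryExecutionContext": {"Catalog": "AwsDataCatalog", "Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


if __name__ == "__main__":
    req = build_athena_query("media_lake", "daily", "s3://example-bucket/athena-results/")
    print(req["QueryString"])
```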
aws2019
3 years, 6 months ago
ADE is the answer.
upvoted 1 times
lostsoul07
3 years, 7 months ago
A, D, E is the right answer
upvoted 5 times
Draco31
3 years, 7 months ago
Yep, ADE as well. I guess SageMaker would use the data more than once for training.
upvoted 3 times
sanjaym
3 years, 7 months ago
100% ADE
upvoted 5 times
syu31svc
3 years, 7 months ago
Notice that the options come in pairs, so if you break it down: identify schema --> Glue crawlers; transformations --> Glue jobs; archived TBs of data --> EMR. So it's ADE.
upvoted 5 times
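For the one-time archive transformation (option E), the cost argument rests on running a transient cluster that shuts down when its work is done. Below is a minimal sketch of `emr.run_job_flow` keyword arguments, assuming boto3; the release label, instance types and counts, S3 paths, and role names are assumptions to adjust, and the Spark script itself is hypothetical.

```python
def build_transient_emr_request(name, script_s3, log_s3, core_nodes=4):
    """Build kwargs for emr.run_job_flow (all values are placeholders)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any recent release with Spark
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_s3,
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # Terminate the cluster after the last step, so the one-time
            # backfill is billed only while it runs.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "transform-archived-data",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3],
                },
            }
        ],
        # Default role names created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    req = build_transient_emr_request(
        "archive-backfill",
        "s3://example-bucket/scripts/transform_archive.py",
        "s3://example-bucket/emr-logs/",
        core_nodes=10,
    )
    print(req["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

Passing this to `boto3.client("emr").run_job_flow(**req)` would launch the cluster; with `KeepJobFlowAliveWhenNoSteps` set to `False`, EMR terminates it once the step finishes.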
Paitan
3 years, 7 months ago
A, D and E.
upvoted 5 times
manish9363
3 years, 8 months ago
Can Glue handle 300 GB of data every day? That seems like too much for Glue.
upvoted 1 times
GiveMeEz
2 years, 11 months ago
Glue can. You can size the capacity of the Glue job appropriately with one simple setting.
upvoted 1 times
Phoenyx89
3 years, 7 months ago
Absolutely! Glue can handle the same amount of data as EMR because, in the end, Glue is a simplified EMR cluster with Spark, HDFS, YARN, and the Glue dependencies, but it has the advantage of being serverless. By configuring the appropriate number and type of DPUs, you can handle 300 GB of data.
upvoted 8 times
omar_bahrain
3 years, 7 months ago
Thanks for describing the origin of EMR.
upvoted 1 times
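On the sizing question in this thread, the "dial" is the job's worker type and worker count. Below is a minimal sketch of `glue.create_job` keyword arguments plus a DPU total, assuming boto3; the job name, role ARN, script path, and worker count are hypothetical. Per the Glue documentation, a G.1X worker maps to 1 DPU and a G.2X worker to 2 DPUs; the right count for 300 GB per day is something to find by testing, not something derived here.

```python
# DPUs per worker for two standard Glue worker types.
DPU_PER_WORKER = {"G.1X": 1, "G.2X": 2}


def build_glue_job_request(name, role_arn, script_s3, worker_type="G.1X", workers=10):
    """Build kwargs for glue.create_job (all arguments are placeholders)."""
    if worker_type not in DPU_PER_WORKER:
        raise ValueError(f"unsupported worker type in this sketch: {worker_type}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_s3, "PythonVersion": "3"},
        "GlueVersion": "4.0",
        # The "dial": scale capacity by changing the worker type or count.
        "WorkerType": worker_type,
        "NumberOfWorkers": workers,
    }


def total_dpus(job_request):
    """Total DPUs allocated to the job, given its worker type and count."""
    return DPU_PER_WORKER[job_request["WorkerType"]] * job_request["NumberOfWorkers"]


if __name__ == "__main__":
    req = build_glue_job_request(
        "daily-transform",
        "arn:aws:iam::123456789012:role/HypotheticalGlueRole",
        "s3://example-bucket/scripts/daily_transform.py",
        worker_type="G.2X",
        workers=20,
    )
    print(total_dpus(req))
```

Scaling up or down is then a matter of changing `NumberOfWorkers` (or the worker type) rather than managing a cluster.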