Exam AWS Certified Machine Learning - Specialty topic 1 question 52 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 52
Topic #: 1

A financial services company is building a robust serverless data lake on Amazon S3. The data lake should be flexible and meet the following requirements:
✑ Support querying old and new data on Amazon S3 through Amazon Athena and Amazon Redshift Spectrum.
✑ Support event-driven ETL pipelines.
✑ Provide a quick and easy way to understand metadata.
Which approach meets these requirements?

A. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and an AWS Glue Data Catalog to search and discover metadata.
B. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Batch job, and an external Apache Hive metastore to search and discover metadata.
C. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Batch job, and an AWS Glue Data Catalog to search and discover metadata.
D. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Glue ETL job, and an external Apache Hive metastore to search and discover metadata.

Suggested Answer: A

by DonaldCMLIN at Nov. 17, 2019, 3:26 a.m.

Comments

DonaldCMLIN

Highly Voted 2 years, 9 months ago

Both A and B could work, but an external Apache Hive metastore might not be a serverless solution. The AWS Glue Data Catalog is your persistent metadata store. It is a managed service that lets you store, annotate, and share metadata in the AWS Cloud in the same way you would in an Apache Hive metastore. The Data Catalog is a drop-in replacement for the Apache Hive Metastore: https://docs.aws.amazon.com/zh_tw/glue/latest/dg/components-overview.html The best answer is A.

upvoted 45 times
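
To make the "quick and easy way to understand metadata" point concrete, here is a minimal sketch of reading table schemas back out of the Glue Data Catalog with boto3's `get_tables` call. The database name in the usage note and the helper name `describe_catalog` are hypothetical and do not come from the question; the client is passed in so the function can be tried without AWS credentials.

```python
def describe_catalog(glue_client, database_name):
    """Return {table_name: [(column_name, column_type), ...]} for one database.

    glue_client is expected to behave like boto3.client("glue").
    """
    tables = {}
    kwargs = {"DatabaseName": database_name}
    while True:
        page = glue_client.get_tables(**kwargs)
        for table in page.get("TableList", []):
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            partitions = table.get("PartitionKeys", [])
            # Regular columns first, then partition columns.
            tables[table["Name"]] = [
                (col["Name"], col.get("Type", "")) for col in columns + partitions
            ]
        token = page.get("NextToken")
        if not token:
            break
        # Follow pagination until the catalog has no more pages.
        kwargs["NextToken"] = token
    return tables
```

With real credentials this would be called as `describe_catalog(boto3.client("glue"), "example_datalake_db")`, where `example_datalake_db` is a placeholder for whatever database the crawler populates.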

rsimham

2 years, 9 months ago

I am thinking about answer C, because events can be triggered by CloudWatch, with the Glue Data Catalog as the metastore.

upvoted 1 times

qwerty456

2 years, 8 months ago

You can't schedule AWS Batch with CloudWatch.

upvoted 4 times

kalyanvarma

2 years, 7 months ago

We can schedule AWS Batch jobs with CloudWatch Events.

upvoted 1 times

qwerty456

2 years, 8 months ago

Sorry, it looks like you can, in addition to cron. The argument should be that AWS Batch isn't serverless.

upvoted 2 times

ComPah

2 years, 9 months ago

If we treat "flexible" as the keyword, using Lambda might be a constraint.

upvoted 4 times

cybe001

Highly Voted 2 years, 9 months ago

Answer is A. Lambda is the preferred way to implement an event-driven ETL job with S3: when new data arrives in S3, it notifies Lambda, which can start the ETL job.

upvoted 18 times
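
The flow described in this comment (S3 notifies Lambda, Lambda starts the Glue ETL job) can be sketched roughly as follows. This is only an illustration under assumptions: the job name `example-etl-job` and the `--source_bucket` / `--source_key` job arguments are hypothetical, and the optional `glue_client` parameter exists only so the handler can be exercised without AWS access. `start_job_run` is the standard boto3 Glue call.

```python
import urllib.parse

GLUE_JOB_NAME = "example-etl-job"  # hypothetical job name, not from the question


def extract_s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run per newly arrived S3 object."""
    if glue_client is None:
        import boto3  # imported lazily so the sketch loads without boto3
        glue_client = boto3.client("glue")

    run_ids = []
    for bucket, key in extract_s3_objects(event):
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Hypothetical argument names; the Glue job script would read these.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"job_run_ids": run_ids}
```

In a real deployment the bucket would need an event notification pointing at this function, and the function's role would need permission to call `glue:StartJobRun`.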

rb39

1 year, 9 months ago

Agree, event-driven means Lambda. CloudWatch alarms only fire when a monitored metric crosses a threshold.

upvoted 3 times

loict

Most Recent 9 months, 2 weeks ago

Selected Answer: A

A. YES - all integrated components
B. NO - missing a component to invoke the Lambda
C. NO - CloudWatch will not trigger when there is a new file to process
D. NO - CloudWatch will not trigger when there is a new file to process

upvoted 2 times

Mickey321

10 months, 1 week ago

Selected Answer: A

A for me

upvoted 1 times

kaike_reis

11 months ago

Selected Answer: A

Note that the question asks for a serverless system. Options B, C, and D are therefore wrong, because they include components that are not serverless: AWS Batch and an external Apache Hive metastore (even less so). For serverless event-driven ETL on AWS, triggering through a Lambda function is recommended, so the correct option is A. Note also that CloudWatch alarms only fire on metric thresholds, which the question does not mention.

upvoted 1 times

jackzhao

1 year, 3 months ago

I will choose A. I think C and D are wrong: you can use an Amazon CloudWatch Events rule to trigger Lambda, but not a CloudWatch alarm.

upvoted 1 times

Valcilio

1 year, 3 months ago

Selected Answer: A

AWS Batch is meant more for scheduled jobs, such as configuration tasks, than for event-driven ETL data processing, so the answer is A.

upvoted 1 times

Jeremy1

1 year, 7 months ago

Selected Answer: A

Found this supporting A: Lambda is used to trigger the ETL job after the crawler completes. The crawler starts on a schedule or on events (files arriving).

upvoted 1 times
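
One way to picture the "Lambda triggers the ETL job after the crawler completes" pattern is a Lambda function invoked by the "Glue Crawler State Change" event that Glue publishes to CloudWatch Events. The sketch below assumes that event shape (`source` of `aws.glue`, a `detail` block with `crawlerName` and `state`); the crawler and job names are hypothetical, and the `glue_client` parameter is there only to allow a dry run without AWS access.

```python
CRAWLER_NAME = "example-s3-crawler"  # hypothetical crawler name
GLUE_JOB_NAME = "example-etl-job"    # hypothetical job name


def is_crawler_success(event, crawler_name=CRAWLER_NAME):
    """Return True if the event reports that the given crawler succeeded."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.glue"
        and event.get("detail-type") == "Glue Crawler State Change"
        and detail.get("crawlerName") == crawler_name
        and detail.get("state") == "Succeeded"
    )


def lambda_handler(event, context, glue_client=None):
    """Start the ETL job only after the crawler has finished successfully."""
    if not is_crawler_success(event):
        return {"started": False}

    if glue_client is None:
        import boto3  # imported lazily so the sketch loads without boto3
        glue_client = boto3.client("glue")

    response = glue_client.start_job_run(JobName=GLUE_JOB_NAME)
    return {"started": True, "job_run_id": response["JobRunId"]}
```

Filtering inside the function keeps the sketch self-contained; the same filter could instead live in the event rule's pattern so the function is invoked only for the matching crawler.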

Skychaser

1 year, 11 months ago

Selected Answer: A

Based on the majority of the discussion.

upvoted 2 times

exam887

2 years, 1 month ago

Selected Answer: C

Quite confused between A and C, since both are workable solutions. The AWS blog post below even combines CloudWatch and Lambda to drive Glue. Given the keyword "event", I prefer CloudWatch. https://aws.amazon.com/blogs/big-data/build-and-automate-a-serverless-data-lake-using-an-aws-glue-trigger-for-the-data-catalog-and-etl-jobs/ https://docs.aws.amazon.com/glue/latest/dg/automating-awsglue-with-cloudwatch-events.html

upvoted 2 times

ZSun

1 year, 2 months ago

CloudWatch and a Lambda function can work together to trigger events, but AWS Batch cannot perform ETL on its own and requires other services. When it comes to ETL, Glue is a much easier choice than Batch.

upvoted 1 times