Exam AWS Certified Data Analytics - Specialty topic 1 question 90 discussion

A company hosts an on-premises PostgreSQL database that contains historical data. An internal legacy application uses the database for read-only activities. The company's business team wants to move the data to a data lake in Amazon S3 as soon as possible and enrich the data for analytics.
The company has set up an AWS Direct Connect connection between its VPC and its on-premises network. A data analytics specialist must design a solution that achieves the business team's goals with the least operational overhead.
Which solution meets these requirements?

  • A. Upload the data from the on-premises PostgreSQL database to Amazon S3 by using a customized batch upload process. Use the AWS Glue crawler to catalog the data in Amazon S3. Use an AWS Glue job to enrich and store the result in a separate S3 bucket in Apache Parquet format. Use Amazon Athena to query the data.
  • B. Create an Amazon RDS for PostgreSQL database and use AWS Database Migration Service (AWS DMS) to migrate the data into Amazon RDS. Use AWS Data Pipeline to copy and enrich the data from the Amazon RDS for PostgreSQL table and move the data to Amazon S3. Use Amazon Athena to query the data.
  • C. Configure an AWS Glue crawler to use a JDBC connection to catalog the data in the on-premises database. Use an AWS Glue job to enrich the data and save the result to Amazon S3 in Apache Parquet format. Create an Amazon Redshift cluster and use Amazon Redshift Spectrum to query the data.
  • D. Configure an AWS Glue crawler to use a JDBC connection to catalog the data in the on-premises database. Use an AWS Glue job to enrich the data and save the result to Amazon S3 in Apache Parquet format. Use Amazon Athena to query the data.
Suggested Answer: D
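To make the suggested answer concrete, here is a minimal boto3 sketch of the first step in option D: registering the on-premises PostgreSQL database as an AWS Glue JDBC connection reachable over the Direct Connect link. The connection name, endpoint, credentials, subnet, security group, and region below are hypothetical placeholders, not values from the question.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # hypothetical region

# Register the on-premises PostgreSQL database (reachable over Direct Connect)
# as a Glue JDBC connection. Host, database, credentials, subnet, and security
# group are placeholders; in practice the password would come from Secrets Manager.
glue.create_connection(
    ConnectionInput={
        "Name": "onprem-postgres",  # hypothetical connection name
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://10.10.1.25:5432/historical",
            "USERNAME": "glue_reader",
            "PASSWORD": "example-only",
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",  # subnet routed to on-prem via DX
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)
```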

Comments

asg76
Highly Voted 3 years, 8 months ago
The question says least operational overhead...so it should be D
upvoted 30 times
Donell
Highly Voted 3 years, 7 months ago
Answer is D. AWS Glue can communicate with an on-premises data store over VPN or DX connectivity. An AWS Glue crawler uses an S3 or JDBC connection to catalog the data source, and the AWS Glue ETL job uses S3 or JDBC connections as a source or target data store. https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
upvoted 18 times
Dr_Kiko
3 years, 7 months ago
that blog post literally says "In this post, I describe a solution for transforming and moving data from an on-premises data store to Amazon S3 using AWS Glue"
upvoted 5 times
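To make the crawler step in Donell's comment concrete, here is a minimal boto3 sketch that catalogs the on-premises source through the JDBC connection from the sketch above. The role ARN, catalog database name, and include path are also hypothetical placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # hypothetical region

# Crawl the on-premises PostgreSQL database through the JDBC connection so its
# tables appear in the Glue Data Catalog. All identifiers are placeholders.
glue.create_crawler(
    Name="onprem-postgres-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    DatabaseName="historical_db",  # catalog database to populate
    Targets={
        "JdbcTargets": [
            {
                "ConnectionName": "onprem-postgres",  # hypothetical Glue connection
                "Path": "historical/public/%",        # database/schema/table pattern
            }
        ]
    },
)
glue.start_crawler(Name="onprem-postgres-crawler")
```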
zzhangi0520
Most Recent 1 year, 5 months ago
Selected Answer: B
D is wrong, the requirement is "move the data to a data lake". D doesn't store the source data, but only "save the result to Amazon S3".
upvoted 2 times
MLCL
1 year, 10 months ago
Selected Answer: B
I would go with B because of the requirement for minimum overhead. D is also correct but needs more work and more services involved. We are using DMS for a data migration scenario and Data Pipeline for a transformation and movement scenario, so it makes sense.
upvoted 2 times
whenthan
1 year, 10 months ago
Selected Answer: D
AWS Glue can also connect to a variety of on-premises JDBC data stores such as PostgreSQL, MySQL, Oracle, Microsoft SQL Server, and MariaDB. AWS Glue ETL jobs can use Amazon S3, data stores in a VPC, or on-premises JDBC data stores as a source. AWS Glue jobs extract data, transform it, and load the resulting data back to S3, data stores in a VPC, or on-premises JDBC data stores as a target.
upvoted 1 times
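As a rough illustration of the ETL step whenthan describes, a Glue job script might look like the following PySpark sketch. The catalog database, table name, enrichment logic, and target bucket are assumptions for illustration only; this runs inside the AWS Glue job environment, not locally.

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

# Standard Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table the crawler cataloged from the on-premises PostgreSQL source.
# "historical_db" and the table name are hypothetical.
source = glue_context.create_dynamic_frame.from_catalog(
    database="historical_db",
    table_name="historical_public_orders",
)

# Placeholder enrichment: derive a year column from an assumed order_date string column.
df = source.toDF()
df = df.withColumn("order_year", df["order_date"].substr(1, 4))
enriched = DynamicFrame.fromDF(df, glue_context, "enriched")

# Write the enriched result to a separate S3 bucket in Apache Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=enriched,
    connection_type="s3",
    connection_options={"path": "s3://example-enriched-bucket/orders/"},
    format="parquet",
)

job.commit()
```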
ccpmad
1 year, 11 months ago
The pk349 user is stupid. On every question he posts the same comment: "I passed the test." Stupid.
upvoted 9 times
cloudlearnerhere
2 years, 7 months ago
Correct answer is D as AWS Glue can be used to extract from the on-premises data store using a JDBC connection, transform the data, and store the data in S3 with the least operational overhead. Option A is wrong as using a customized batch upload process would add to the operational overhead. Option B is wrong as creating an intermediate RDS instance with Data Pipeline jobs adds to the operational overhead. Option C is wrong as creating a Redshift cluster would add to the operational overhead.
upvoted 2 times
Hussben
2 years, 7 months ago
Selected Answer: D
I think B is not correct. If the goal were to migrate this database to RDS, then it would be B. But we want to move it to S3, and AWS Glue can do the ETL. It is less overhead.
upvoted 1 times
rocky48
2 years, 11 months ago
Selected Answer: D
Answer is D
upvoted 1 times
dushmantha
3 years ago
Selected Answer: B
I think most of us are missing the point that we should understand what each solution is designed for. Clearly this is a database migration task, so in that case I would definitely use DMS, because it's optimized for such tasks. Considering that, I would say this is not an ETL task, so Glue ETL isn't an option. Moreover, Glue ETL uses a serverless Spark platform and therefore will be expensive for sure. Therefore I go with option B.
upvoted 4 times
C is the correct answer. For any form of analytics, Redshift is a preferred choice. The Athena use case is for ad hoc queries.
upvoted 3 times
allanm
2 years, 9 months ago
Redshift and Redshift Spectrum add significantly more operational overhead!
upvoted 1 times
RSSRAO
3 years, 4 months ago
Selected Answer: D
Explanation: Correct answer is D as AWS Glue can be used to extract from the on-premises data store using a JDBC connection, transform the data, and store the data in S3 with the least operational overhead. Refer to the AWS documentation - Glue: Analyze On-premises Data Stores (https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/). AWS Glue ETL jobs can use Amazon S3, data stores in a VPC, or on-premises JDBC data stores as a source. AWS Glue jobs extract data, transform it, and load the resulting data back to S3, data stores in a VPC, or on-premises JDBC data stores as a target. Option A is wrong as using a customized batch upload process would add to the operational overhead. Option B is wrong as creating an intermediate RDS instance with Data Pipeline jobs adds to the operational overhead. Option C is wrong as creating a Redshift cluster would add to the operational overhead.
upvoted 4 times
penelop
3 years, 4 months ago
Selected Answer: D
We want to reduce operational costs and overhead. A - Sounds good, but the beginning is not right; we can do better than a customized batch process. B - That's too much work; we are not interested in keeping the PostgreSQL engine, so if we can remove that step from the equation, we are good to go. C - We get the table directly from the on-prem DB with Glue and move it directly to S3. That's good, but we are adding Spectrum to the mix, which is a more expensive solution. D - Yes, we are moving directly from on-prem to S3 and using Athena to query the data. This is the answer.
upvoted 4 times
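For the Athena step penelop mentions, a minimal boto3 sketch could look like this, assuming the enriched Parquet output has already been cataloged as a hypothetical enriched_db.orders table and that a results bucket exists; the query, column names, and bucket are placeholders.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # hypothetical region

# Query the enriched Parquet data. Database, table, column, and the results
# bucket are hypothetical placeholders.
response = athena.start_query_execution(
    QueryString=(
        "SELECT order_year, COUNT(*) AS order_count "
        "FROM orders GROUP BY order_year ORDER BY order_year"
    ),
    QueryExecutionContext={"Database": "enriched_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Started query:", response["QueryExecutionId"])
```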
TerrancePythonJava
3 years, 4 months ago
Selected Answer: B
I believe answer is 'B'
upvoted 1 times
aws2019
3 years, 7 months ago
It is close, but I am leaning towards "D"
upvoted 2 times
Donell
3 years, 7 months ago
I believe answer is D. AWS Glue can communicate with an on-premises data store over VPN or DX connectivity. An AWS Glue crawler uses an S3 or JDBC connection to catalog the data source, and the AWS Glue ETL job uses S3 or JDBC connections as a source or target data store. https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
upvoted 3 times
asg76
3 years, 8 months ago
The question asks to achieve the goals with the "least operational overhead". The answer is D. B cannot be the answer, as DMS requires an EC2 instance to be created for the replication task, which is operational overhead. The same goes for A: a customized batch process will be operational overhead. C is not an option, as you need to maintain a Redshift cluster.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other