Exam AWS Certified Data Analytics - Specialty topic 1 question 90 discussion

A company hosts an on-premises PostgreSQL database that contains historical data. An internal legacy application uses the database for read-only activities. The company's business team wants to move the data to a data lake in Amazon S3 as soon as possible and enrich the data for analytics.
The company has set up an AWS Direct Connect connection between its VPC and its on-premises network. A data analytics specialist must design a solution that achieves the business team's goals with the least operational overhead.
Which solution meets these requirements?

  • A. Upload the data from the on-premises PostgreSQL database to Amazon S3 by using a customized batch upload process. Use the AWS Glue crawler to catalog the data in Amazon S3. Use an AWS Glue job to enrich and store the result in a separate S3 bucket in Apache Parquet format. Use Amazon Athena to query the data.
  • B. Create an Amazon RDS for PostgreSQL database and use AWS Database Migration Service (AWS DMS) to migrate the data into Amazon RDS. Use AWS Data Pipeline to copy and enrich the data from the Amazon RDS for PostgreSQL table and move the data to Amazon S3. Use Amazon Athena to query the data.
  • C. Configure an AWS Glue crawler to use a JDBC connection to catalog the data in the on-premises database. Use an AWS Glue job to enrich the data and save the result to Amazon S3 in Apache Parquet format. Create an Amazon Redshift cluster and use Amazon Redshift Spectrum to query the data.
  • D. Configure an AWS Glue crawler to use a JDBC connection to catalog the data in the on-premises database. Use an AWS Glue job to enrich the data and save the result to Amazon S3 in Apache Parquet format. Use Amazon Athena to query the data.
Suggested Answer: D
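To make the suggested answer concrete, here is a minimal boto3 sketch of the first step in option D: registering the on-premises PostgreSQL database as an AWS Glue JDBC connection reachable over the Direct Connect link. The connection name, endpoint, credentials, subnet, security group, and region below are hypothetical placeholders, not values from the question.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # hypothetical region

# Register the on-premises PostgreSQL database (reachable over Direct Connect)
# as a Glue JDBC connection. Host, database, credentials, subnet, and security
# group are placeholders; in practice the password would come from Secrets Manager.
glue.create_connection(
    ConnectionInput={
        "Name": "onprem-postgres",  # hypothetical connection name
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://10.10.1.25:5432/historical",
            "USERNAME": "glue_reader",
            "PASSWORD": "example-only",
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",  # subnet routed to on-prem via DX
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)
```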

Comments

asg76
Highly Voted 3 years, 8 months ago
The question says least operational overhead...so it should be D
upvoted 30 times
Donell
Highly Voted 3 years, 7 months ago
Answer is D. AWS Glue can communicate with an on-premises data store over VPN or DX connectivity. An AWS Glue crawler uses an S3 or JDBC connection to catalog the data source, and the AWS Glue ETL job uses S3 or JDBC connections as a source or target data store. https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
upvoted 18 times
Dr_Kiko
3 years, 7 months ago
that blog post literally says "In this post, I describe a solution for transforming and moving data from an on-premises data store to Amazon S3 using AWS Glue"
upvoted 5 times
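To make the crawler step in Donell's comment concrete, here is a minimal boto3 sketch that catalogs the on-premises source through the JDBC connection from the sketch above. The role ARN, catalog database name, and include path are also hypothetical placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # hypothetical region

# Crawl the on-premises PostgreSQL database through the JDBC connection so its
# tables appear in the Glue Data Catalog. All identifiers are placeholders.
glue.create_crawler(
    Name="onprem-postgres-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    DatabaseName="historical_db",  # catalog database to populate
    Targets={
        "JdbcTargets": [
            {
                "ConnectionName": "onprem-postgres",  # hypothetical Glue connection
                "Path": "historical/public/%",        # database/schema/table pattern
            }
        ]
    },
)
glue.start_crawler(Name="onprem-postgres-crawler")
```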
zzhangi0520
Most Recent 1 year, 5 months ago
Selected Answer: B
D is wrong, the requirement is "move the data to a data lake". D doesn't store the source data, but only "save the result to Amazon S3".
upvoted 2 times
MLCL
1 year, 10 months ago
Selected Answer: B
I would go with B because of the requirement for minimum overhead. D is also correct but needs more work and more services involved. We are using DMS for a data migration scenario and Data Pipeline for a transformation and movement scenario, so it makes sense.
upvoted 2 times
whenthan
1 year, 10 months ago
Selected Answer: D
AWS Glue can also connect to a variety of on-premises JDBC data stores such as PostgreSQL, MySQL, Oracle, Microsoft SQL Server, and MariaDB. AWS Glue ETL jobs can use Amazon S3, data stores in a VPC, or on-premises JDBC data stores as a source. AWS Glue jobs extract data, transform it, and load the resulting data back to S3, data stores in a VPC, or on-premises JDBC data stores as a target.
upvoted 1 times
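As a rough illustration of the ETL step whenthan describes, a Glue job script might look like the following PySpark sketch. The catalog database, table name, enrichment logic, and target bucket are assumptions for illustration only; this runs inside the AWS Glue job environment, not locally.

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

# Standard Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table the crawler cataloged from the on-premises PostgreSQL source.
# "historical_db" and the table name are hypothetical.
source = glue_context.create_dynamic_frame.from_catalog(
    database="historical_db",
    table_name="historical_public_orders",
)

# Placeholder enrichment: derive a year column from an assumed order_date string column.
df = source.toDF()
df = df.withColumn("order_year", df["order_date"].substr(1, 4))
enriched = DynamicFrame.fromDF(df, glue_context, "enriched")

# Write the enriched result to a separate S3 bucket in Apache Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=enriched,
    connection_type="s3",
    connection_options={"path": "s3://example-enriched-bucket/orders/"},
    format="parquet",
)

job.commit()
```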
ccpmad
1 year, 11 months ago
The pk349 user is stupid. On every question he posts the same comment: "I passed the test." Stupid.
upvoted 9 times
cloudlearnerhere
2 years, 7 months ago
Correct answer is D as AWS Glue can be used to extract from the on-premises data store using a JDBC connection, transform the data, and store the data in S3 with the least operational overhead. Option A is wrong as using a customized batch upload process would add to the operational overhead. Option B is wrong as creating an intermediate RDS instance with Data Pipeline jobs adds to the operational overhead. Option C is wrong as creating a Redshift cluster would add to the operational overhead.
upvoted 2 times
Hussben
2 years, 7 months ago
Selected Answer: D
I think B is not correct. If the goal were to migrate this database to RDS, then it would be B. But we want to move it to S3, and AWS Glue can do the ETL. It is less overhead.
upvoted 1 times
rocky48
2 years, 11 months ago
Selected Answer: D
Answer is D
upvoted 1 times
dushmantha
3 years ago
Selected Answer: B
I think most of us are missing the point that we should understand what each solution is designed for. Clearly this is a database migration task, so in that case I would definitely use DMS, because it's optimized for such tasks. Considering that, I would say this is not an ETL task, so Glue ETL isn't an option. Moreover, Glue ETL uses a serverless Spark platform and therefore will be expensive for sure. Therefore I go with option B.
upvoted 4 times
C is the correct answer. For any form of analytics, Redshift is a preferred choice. The Athena use case is for ad hoc queries.
upvoted 3 times
allanm
2 years, 9 months ago
Redshift and Redshift Spectrum add significantly more operational overhead!
upvoted 1 times
RSSRAO
3 years, 4 months ago
Selected Answer: D
Explanation: Correct answer is D as AWS Glue can be used to extract from the on-premises data store using a JDBC connection, transform the data, and store the data in S3 with the least operational overhead. Refer to the AWS documentation - Glue: Analyze On-premises Data Stores (https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/). AWS Glue ETL jobs can use Amazon S3, data stores in a VPC, or on-premises JDBC data stores as a source. AWS Glue jobs extract data, transform it, and load the resulting data back to S3, data stores in a VPC, or on-premises JDBC data stores as a target. Option A is wrong as using a customized batch upload process would add to the operational overhead. Option B is wrong as creating an intermediate RDS instance with Data Pipeline jobs adds to the operational overhead. Option C is wrong as creating a Redshift cluster would add to the operational overhead.
upvoted 4 times
penelop
3 years, 4 months ago
Selected Answer: D
We want to reduce operational costs and overhead. A - Sounds good, but the beginning is not right; we can do better than a customized batch process. B - That's too much work; we are not interested in keeping the PostgreSQL engine, so if we can remove that step from the equation, we are good to go. C - We get the table directly from the on-prem DB with Glue and move it directly to S3. That's good, but we are adding Spectrum to the mix, which is a more expensive solution. D - Yes, we are moving directly from on-prem to S3 and using Athena to query the data. This is the answer.
upvoted 4 times
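For the Athena step penelop mentions, a minimal boto3 sketch could look like this, assuming the enriched Parquet output has already been cataloged as a hypothetical enriched_db.orders table and that a results bucket exists; the query, column names, and bucket are placeholders.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # hypothetical region

# Query the enriched Parquet data. Database, table, column, and the results
# bucket are hypothetical placeholders.
response = athena.start_query_execution(
    QueryString=(
        "SELECT order_year, COUNT(*) AS order_count "
        "FROM orders GROUP BY order_year ORDER BY order_year"
    ),
    QueryExecutionContext={"Database": "enriched_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Started query:", response["QueryExecutionId"])
```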
TerrancePythonJava
3 years, 4 months ago
Selected Answer: B
I believe answer is 'B'
upvoted 1 times
aws2019
3 years, 7 months ago
It is close, but I am leaning towards "D"
upvoted 2 times
Donell
3 years, 7 months ago
I believe answer is D. AWS Glue can communicate with an on-premises data store over VPN or DX connectivity. An AWS Glue crawler uses an S3 or JDBC connection to catalog the data source, and the AWS Glue ETL job uses S3 or JDBC connections as a source or target data store. https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
upvoted 3 times
asg76
3 years, 8 months ago
The question asks to achieve the goals with the "least operational overhead". The answer is D. B cannot be the answer, as DMS requires an EC2 instance to be created for the replication task, which is operational overhead. The same goes for A: a customized batch process will be operational overhead. C is not an option, as you need to maintain a Redshift cluster.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other