Exam AWS Certified Data Analytics - Specialty topic 1 question 22 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 22
Topic #: 1

A company is planning to create a data lake in Amazon S3. The company wants to create tiered storage based on access patterns and cost objectives. The solution must include support for JDBC connections from legacy clients, metadata management that allows federation for access control, and batch-based ETL using PySpark and Scala. Operational management should be limited.
Which combination of components can meet these requirements? (Choose three.)

A. AWS Glue Data Catalog for metadata management
B. Amazon EMR with Apache Spark for ETL
C. AWS Glue for Scala-based ETL
D. Amazon EMR with Apache Hive for JDBC clients
E. Amazon Athena for querying data in Amazon S3 using JDBC drivers
F. Amazon EMR with Apache Hive, using an Amazon RDS with MySQL-compatible backed metastore

Suggested Answer: ACE

by testtaker3434 at Aug. 9, 2020, 8:36 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

Prodip

Highly Voted 3 years, 9 months ago

I will go with A, C, E. Glue can do both PySpark- and Scala-based ETL. Glue handles the metadata, and JDBC drivers let clients outside AWS connect to Athena. It is all serverless, so operational management is limited.

upvoted 48 times
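
For readers wondering what connecting to Athena over JDBC looks like in practice, here is a minimal sketch that assembles a connection URL in the style documented for the Athena JDBC 2.x (Simba) driver. The region, the bucket name, and the helper name `athena_jdbc_url` are assumptions for illustration, and the exact URL properties depend on the driver version the legacy client uses, so check the driver documentation before relying on it.

```python
def athena_jdbc_url(region, s3_output):
    """Build a connection URL in the Athena JDBC 2.x (Simba) driver style.

    Assumption: the client uses the 2.x driver, which expects the
    AwsRegion and S3OutputLocation properties. Other driver versions
    use different property names.
    """
    if not s3_output.startswith("s3://"):
        raise ValueError("S3OutputLocation must be an s3:// URI")
    return f"jdbc:awsathena://AwsRegion={region};S3OutputLocation={s3_output};"


# Hypothetical region and query-results bucket, for illustration only.
url = athena_jdbc_url("us-east-1", "s3://example-athena-results/queries/")
print(url)
```

A legacy client would hand this string, together with its credentials, to the JDBC driver. Nothing in the sketch contacts AWS.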

abhineet

3 years, 9 months ago

Yeah, I thought so too. ACE for me.

upvoted 4 times

...

jack42

Highly Voted 3 years, 8 months ago

Each word has meaning, so I will go with ABD. A: metadata management that allows federation for access control. B: batch-based ETL using PySpark. D: JDBC connections from legacy clients. Not C, because the option mentions only Scala, while the question asks for more than Scala. Not E, because you need JDBC to connect the client, not Athena.

upvoted 6 times

Mahesh22

3 years, 8 months ago

Correct. ABD is right

upvoted 1 times

...

vanireddy

3 years, 8 months ago

I agree with this. Correct is ABD.

upvoted 1 times

...

shammous

2 years, 7 months ago

EMR = operational overhead. ETL does it all and it is a managed service. ACE is the better answer.

upvoted 1 times

shammous

2 years, 7 months ago

I mean AWS Glue (not ETL) is a serverless service and you don't need to provision it.

upvoted 1 times

...

abgz887

2 years, 8 months ago

If we select B and D (EMR Spark, Hive JDBC), doesn't it make more sense to use the EMR Hive metastore (F) instead of the Glue Data Catalog (A), to limit operational management? That would make BDF more appropriate.

upvoted 1 times

...

tsangckl

Most Recent 1 year, 3 months ago

Bing: Option A is correct because AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats, and it integrates with Amazon S3, Amazon RDS, Amazon Athena, Amazon Redshift, and others. Option B is correct because Amazon EMR with Apache Spark supports PySpark and Scala for batch-based ETL processing. Option E is correct because Amazon Athena supports SQL queries and can be integrated with JDBC drivers, allowing legacy clients to execute queries.

upvoted 1 times

...

NarenKA

1 year, 4 months ago

Selected Answer: ABE

I will go with A (AWS Glue Data Catalog), B (Amazon EMR with Apache Spark), and E (Amazon Athena). This combination aligns well with the company's requirements for a data lake architecture, offering a balance of performance, cost-efficiency, and ease of management. While C is also a viable option for ETL processes, it is more aligned with serverless ETL jobs and might not be as flexible for Scala as Amazon EMR with Apache Spark. D and F could provide JDBC connectivity and metadata management, but they are more operationally intensive and less integrated with S3 tiered storage strategies than Athena with the Glue Data Catalog.

upvoted 1 times

...

geekfrosty

1 year, 10 months ago

Why are we saying C? C just says "Scala" ETL. Even though Glue supports both PySpark and Scala and is AWS managed, the option specifically mentions "Scala-based". The requirement is for both Scala and PySpark, which points directly to EMR. The answer should be ABE. As for operational management, the question says "limited", and EMR can qualify for that; with Glue there is no operational overhead.

upvoted 1 times
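
As a rough illustration of the point that Glue runs both PySpark and Scala jobs, the sketch below builds two job definitions shaped like the keyword arguments of the Glue `create_job` API, differing mainly in the `--job-language` setting. The job names, role ARN, script paths, and the `GlueApp` class name are hypothetical, and the code only builds dictionaries; it does not call AWS.

```python
def glue_job_definition(name, role_arn, script_s3_path, language):
    """Return a dict shaped like the kwargs of the Glue create_job API.

    Assumption: a Spark ETL job (command name "glueetl"). All names,
    ARNs, and paths passed in below are placeholders.
    """
    if language not in ("python", "scala"):
        raise ValueError("Glue Spark jobs are written in python or scala")

    command = {"Name": "glueetl", "ScriptLocation": script_s3_path}
    default_args = {"--job-language": language}

    if language == "python":
        command["PythonVersion"] = "3"
    else:
        # Scala jobs also name the entry-point class of the script.
        default_args["--class"] = "GlueApp"  # hypothetical class name

    return {
        "Name": name,
        "Role": role_arn,
        "Command": command,
        "DefaultArguments": default_args,
    }


ROLE = "arn:aws:iam::123456789012:role/ExampleGlueRole"  # placeholder ARN
pyspark_job = glue_job_definition(
    "example-pyspark-etl", ROLE, "s3://example-scripts/etl.py", "python")
scala_job = glue_job_definition(
    "example-scala-etl", ROLE, "s3://example-scripts/Etl.scala", "scala")

print(pyspark_job["DefaultArguments"])
print(scala_job["DefaultArguments"])
```

Either dictionary could be passed to a Glue client as `create_job(**job)`. The same serverless job type covers both languages, which is the basis for the argument that option C does not rule out PySpark.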

...

NikkyDicky

1 year, 11 months ago

Selected Answer: ACE

ACE it

upvoted 1 times

...

pk349

2 years, 2 months ago

ACE: I passed the test

upvoted 3 times

...

cloudlearnerhere

2 years, 8 months ago

Selected Answer: ACE

Correct answers are A, C & E. Option A, as Glue Data Catalog provides metadata management with federation for access control. Option C, as AWS Glue supports both serverless PySpark and Scala-based ETL with the least operational overhead. Option E, as Athena can be used for querying S3 data, and external legacy clients can connect to Athena using JDBC drivers. Options B, D & F are wrong, as using EMR and RDS would increase the operational management and cost.

upvoted 5 times

...

Arka_01

2 years, 9 months ago

Selected Answer: ACE

Since operational management should be limited, all EMR-related options are invalid, because EMR requires managing the underlying EC2 instances.

upvoted 2 times

...

Abep

2 years, 10 months ago

Selected Answer: ACE

A. *Less* operational overhead compared to "F" (selected)
B. High operational overhead when compared to "C" AWS Glue based Scala
C. *Less* operational overhead compared to "B" EMR PySpark (selected)
D. Higher operational overhead when compared to "E" Athena. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/HiveJDBCDriver.html
E. *Less* operational overhead when compared to "D" EMR Hive JDBC (selected)
F. Higher operational overhead when compared to "A" Glue metadata

upvoted 2 times
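
The pairwise comparison in this comment can be sketched as a small selection routine: for each requirement, keep the option that covers it without cluster management. The requirement labels and the `serverless` flags below are assumptions that encode the majority view in this thread (Glue and Athena are serverless, the EMR-based options are not); they are not an official AWS classification.

```python
# Each option maps to the requirement it addresses and whether it is
# serverless. The flags encode this thread's majority view (assumption).
OPTIONS = {
    "A": {"covers": "metadata", "serverless": True},   # Glue Data Catalog
    "B": {"covers": "etl", "serverless": False},       # EMR with Spark
    "C": {"covers": "etl", "serverless": True},        # Glue ETL
    "D": {"covers": "jdbc", "serverless": False},      # EMR with Hive
    "E": {"covers": "jdbc", "serverless": True},       # Athena
    "F": {"covers": "metadata", "serverless": False},  # Hive metastore on RDS
}


def pick_low_overhead(options, requirements):
    """For each requirement, keep the options that are serverless."""
    chosen = [
        key
        for key, info in options.items()
        if info["covers"] in requirements and info["serverless"]
    ]
    return "".join(sorted(chosen))


answer = pick_low_overhead(OPTIONS, {"metadata", "etl", "jdbc"})
print(answer)  # ACE
```

Flipping the `serverless` flag for the EMR options, as the ABD and ABE commenters effectively argue, changes the result, so the disagreement in the thread comes down to how strictly "operational management should be limited" is read.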

...

rocky48

2 years, 11 months ago

Selected Answer: ACE

I will go with A,C,E

upvoted 2 times

...

GarfieldBin

3 years ago

Selected Answer: ABC

D and F are wrong because the question never mentions Hive. E is not right, since Athena doesn't need JDBC to query S3. C is right because AWS Glue can be used for Scala-based ETL. A is right because Glue can connect to an on-premises DB through JDBC: https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/. B is right because Apache Spark supports PySpark.

upvoted 1 times

...

Bik000

3 years, 1 month ago

Selected Answer: ACE

My Answer is A, C & E

upvoted 3 times

...

Japanese1

3 years, 4 months ago

D and F are clearly wrong. D: the JDBC connection is required from older clients, NOT from Athena. F: it is redundant to use RDS as a metastore, and there is no requirement for a MySQL-compatible backed metastore. I'm torn between B and C. B is predominant in terms of cost constraints, but I am doubtful.

upvoted 1 times

...

Donell

3 years, 8 months ago

Answer: A, C, E. EMR has operational overhead.

upvoted 2 times

...

Donell

3 years, 8 months ago

Answer: A, C, E. EMR has operational overhead.

upvoted 3 times

...

Shraddha

3 years, 8 months ago

Ans: ACE. Note: this is a free-score question. Anything EMR, compared to serverless Glue / Athena, is operational overhead. Also remember Glue can do PySpark and Scala, and Athena can do JDBC.

upvoted 4 times

...
