Exam AWS Certified Data Analytics - Specialty topic 1 question 22 discussion

A company is planning to create a data lake in Amazon S3. The company wants to create tiered storage based on access patterns and cost objectives. The solution must include support for JDBC connections from legacy clients, metadata management that allows federation for access control, and batch-based ETL using PySpark and Scala. Operational management should be limited.
Which combination of components can meet these requirements? (Choose three.)

  • A. AWS Glue Data Catalog for metadata management
  • B. Amazon EMR with Apache Spark for ETL
  • C. AWS Glue for Scala-based ETL
  • D. Amazon EMR with Apache Hive for JDBC clients
  • E. Amazon Athena for querying data in Amazon S3 using JDBC drivers
  • F. Amazon EMR with Apache Hive, using an Amazon RDS with MySQL-compatible backed metastore
Suggested Answer: ACE

Comments

Prodip
Highly Voted 3 years, 9 months ago
I will go with A, C, E. Glue can do both PySpark- and Scala-based ETL. Glue handles the metadata, and JDBC drivers let clients outside AWS connect to Athena. Everything is serverless, so operational management is limited.
upvoted 48 times
abhineet
3 years, 9 months ago
ya i thought so too, ACE for me
upvoted 4 times
...
...
jack42
Highly Voted 3 years, 8 months ago
Each word has meaning, so I will go with ABD. A: metadata management that allows federation for access control. B: batch-based ETL using PySpark. D: JDBC connections from legacy clients. Not C, because the option mentions only Scala, while the question asks for Scala with limited operational management. Not E, because JDBC is needed to connect the client, not Athena.
upvoted 6 times
Mahesh22
3 years, 8 months ago
Correct. ABD is right
upvoted 1 times
...
vanireddy
3 years, 8 months ago
I agree with this. Correct is ABD.
upvoted 1 times
...
shammous
2 years, 7 months ago
EMR = operational overhead. ETL does it all and it is a managed service. ACE is the better answer.
upvoted 1 times
shammous
2 years, 7 months ago
I mean AWS Glue (not ETL) is a serverless service and you don't need to provision it.
upvoted 1 times
...
...
abgz887
2 years, 8 months ago
If we select B and D (EMR Spark, Hive JDBC), doesn't it make more sense to use the EMR Hive data store (F) instead of the Glue Data Catalog (A) to limit operational management? That would make BDF more appropriate.
upvoted 1 times
...
...
tsangckl
Most Recent 1 year, 3 months ago
Bing: Option A is correct because AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats, and it integrates with Amazon S3, Amazon RDS, Amazon Athena, Amazon Redshift, and others. Option B is correct because Amazon EMR with Apache Spark supports PySpark and Scala for batch-based ETL processing. Option E is correct because Amazon Athena supports SQL queries and can be integrated with JDBC drivers, allowing legacy clients to execute queries.
upvoted 1 times
...
NarenKA
1 year, 4 months ago
Selected Answer: ABE
I will go with A (AWS Glue Data Catalog), B (Amazon EMR with Apache Spark), and E (Amazon Athena). This combination aligns well with the company's requirements for a data lake architecture, offering a balance of performance, cost-efficiency, and ease of management. While C is also a viable option for ETL processes, it is more aligned with serverless ETL jobs and might not be as flexible for Scala as Amazon EMR with Apache Spark. D and F could provide JDBC connectivity and metadata management but are more operationally intensive and less integrated with S3 tiered storage strategies than Athena with the Glue Data Catalog.
upvoted 1 times
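For readers unsure what "tiered storage based on access patterns" looks like in practice, here is a minimal sketch of an S3 lifecycle configuration built as a plain Python dict. The prefix, rule ID, and day thresholds are hypothetical; the dict follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`, but nothing here calls AWS.

```python
def tiering_lifecycle(prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration that moves objects to colder tiers.

    All values are illustrative assumptions, not taken from the question.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/')}",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    # Infrequently accessed data moves to Standard-IA first
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    # Rarely accessed data moves on to Glacier
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = tiering_lifecycle("raw/")
print(config["Rules"][0]["Transitions"])
```

With boto3 installed and credentials configured, the dict could be passed as `LifecycleConfiguration=config` to `put_bucket_lifecycle_configuration`; the bucket name would be your own.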
...
geekfrosty
1 year, 10 months ago
Why are we saying C? C just says "Scala" ETL. Even though Glue supports both PySpark and Scala and is AWS managed, the option specifically mentions "Scala-based". The requirement is for both Scala and PySpark, which points directly to EMR. The answer should be ABE. As for operational management, the question says "limited", and EMR can qualify for that. With Glue there is 'no' operational overhead.
upvoted 1 times
...
NikkyDicky
1 year, 11 months ago
Selected Answer: ACE
ACE it
upvoted 1 times
...
pk349
2 years, 2 months ago
ACE: I passed the test
upvoted 3 times
...
cloudlearnerhere
2 years, 8 months ago
Selected Answer: ACE
Correct answers are A, C & E. Option A, as Glue Data Catalog provides metadata management with federation for access control. Option C, as AWS Glue supports both serverless PySpark and Scala-based ETL with the least operational overhead. Option E, as Athena can be used for querying S3 data and can be reached through JDBC drivers from external legacy clients. Options B, D & F are wrong, as using EMR and RDS would increase operational management and cost.
upvoted 5 times
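To make the D versus E discussion concrete, here is a rough sketch of the connection strings a legacy JDBC client would use in each case. The Athena URL follows the `jdbc:awsathena://` form used by the Simba-based Athena JDBC 2.x driver as I understand it (newer driver versions use different property names, so check the driver documentation); the Hive URL is the standard HiveServer2 form. Region, bucket, and host names are hypothetical.

```python
def athena_jdbc_url(region, s3_output):
    """Connection string for Athena (option E); no cluster to run."""
    return f"jdbc:awsathena://AwsRegion={region};S3OutputLocation={s3_output};"


def hive_jdbc_url(master_dns, port=10000, database="default"):
    """Connection string for HiveServer2 on an EMR master node (option D)."""
    return f"jdbc:hive2://{master_dns}:{port}/{database}"


# Hypothetical values for illustration only
print(athena_jdbc_url("us-east-1", "s3://example-query-results/"))
print(hive_jdbc_url("ec2-203-0-113-10.compute-1.amazonaws.com"))
```

Both give a legacy client a JDBC endpoint; the difference the thread keeps pointing at is that the Hive endpoint only exists while an EMR cluster is running and maintained.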
...
Arka_01
2 years, 9 months ago
Selected Answer: ACE
As operational management should be limited, all EMR-related options are invalid, because EMR requires managing the underlying EC2 instances.
upvoted 2 times
...
Abep
2 years, 10 months ago
Selected Answer: ACE
A. *Less* operational overhead compared to F (selected)
B. High operational overhead when compared to "C" AWS Glue based Scala
C. *Less* operational overhead compared to "B" EMR PySpark (selected)
D. Higher operational overhead when compared to "E" Athena. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/HiveJDBCDriver.html
E. *Less* operational overhead when compared to "D" EMR Hive JDBC (selected)
F. Higher operational overhead when compared to "A" Glue metadata
upvoted 2 times
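The pairwise comparison above can be sketched as a small search: pick three options that together cover metadata, ETL, and JDBC while using as few EMR-based components as possible. The encoding of each option (what it covers, whether it needs an EMR cluster) is my own simplification of the thread's reasoning, not an official rubric.

```python
from itertools import combinations

# Hypothetical encoding of the six options: the requirement each one
# covers and whether it runs on an EMR cluster that must be managed.
OPTIONS = {
    "A": {"covers": {"metadata"}, "emr": False},  # Glue Data Catalog
    "B": {"covers": {"etl"}, "emr": True},        # EMR with Spark
    "C": {"covers": {"etl"}, "emr": False},       # Glue ETL
    "D": {"covers": {"jdbc"}, "emr": True},       # EMR with Hive
    "E": {"covers": {"jdbc"}, "emr": False},      # Athena
    "F": {"covers": {"metadata"}, "emr": True},   # Hive metastore on RDS
}
REQUIRED = {"metadata", "etl", "jdbc"}


def emr_count(combo):
    """Number of EMR-based options in a combination."""
    return sum(OPTIONS[key]["emr"] for key in combo)


def best_combinations():
    """Return the three-option sets that cover every requirement
    with the fewest EMR-based components."""
    valid = [
        combo
        for combo in combinations(sorted(OPTIONS), 3)
        if set().union(*(OPTIONS[key]["covers"] for key in combo)) == REQUIRED
    ]
    fewest = min(emr_count(combo) for combo in valid)
    return ["".join(combo) for combo in valid if emr_count(combo) == fewest]


print(best_combinations())
```

Under these assumptions the only combination with no EMR component is ACE. Commenters who prefer B would have to weight "option C only says Scala" more heavily than operational overhead, which this sketch does not model.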
...
rocky48
2 years, 11 months ago
Selected Answer: ACE
I will go with A,C,E
upvoted 2 times
...
GarfieldBin
3 years ago
Selected Answer: ABC
D and F are wrong because the question never mentions Hive. E is not right, since Athena doesn't need JDBC to query S3. C is right because AWS Glue can be used for Scala-based ETL. A is right because Glue can connect to on-premises DBs through JDBC: https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/ B is right because Apache Spark supports PySpark.
upvoted 1 times
...
Bik000
3 years, 1 month ago
Selected Answer: ACE
My Answer is A, C & E
upvoted 3 times
...
Japanese1
3 years, 4 months ago
D and F are clearly wrong. D: a JDBC connection from older clients is required, NOT from Athena. F: it is redundant to use RDS as a metastore, and there is no requirement for a MySQL-compatible backed metastore. I'm torn between B and C. B is predominant in terms of cost constraints, but I am doubtful.
upvoted 1 times
...
Donell
3 years, 8 months ago
Answer: A, C, E. EMR has operational overhead.
upvoted 2 times
...
Shraddha
3 years, 8 months ago
Ans: ACE. Note: this is a free-score question. Anything on EMR, compared to serverless Glue / Athena, is operational overhead. Also remember that Glue can do PySpark and Scala, and Athena can do JDBC.
upvoted 4 times
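On the point that Glue can run both PySpark and Scala jobs, here is a hedged sketch of how the two job definitions differ. It builds the keyword arguments one could pass to boto3's `create_job` on a Glue client, without calling AWS. `--job-language` and `--class` are Glue special job parameters; the job names, role ARN, script paths, Glue version, and the `GlueApp` class name are placeholders I chose for illustration.

```python
def glue_job_definition(name, role_arn, script_s3_path, language):
    """Build create_job keyword arguments for a PySpark or Scala Glue job."""
    if language not in ("python", "scala"):
        raise ValueError("language must be 'python' or 'scala'")

    command = {"Name": "glueetl", "ScriptLocation": script_s3_path}
    default_args = {"--job-language": language}

    if language == "python":
        command["PythonVersion"] = "3"
    else:
        # Scala jobs need the entry-point class; "GlueApp" is an assumed name
        default_args["--class"] = "GlueApp"

    return {
        "Name": name,
        "Role": role_arn,
        "Command": command,
        "DefaultArguments": default_args,
        "GlueVersion": "4.0",  # assumed version; adjust to what you use
    }


# Hypothetical names and paths
role = "arn:aws:iam::123456789012:role/ExampleGlueRole"
pyspark_job = glue_job_definition("etl-py", role, "s3://example-scripts/job.py", "python")
scala_job = glue_job_definition("etl-scala", role, "s3://example-scripts/Job.scala", "scala")
print(pyspark_job["DefaultArguments"], scala_job["DefaultArguments"])
```

Either dict could then be unpacked into `boto3.client("glue").create_job(**job)`. The same service and job type cover both languages, which is the basis for reading option C as satisfying the PySpark and Scala requirement.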
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other