Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 20 discussion

A company is migrating on-premises workloads to AWS. The company wants to reduce overall operational overhead. The company also wants to explore serverless options.
The company's current workloads use Apache Pig, Apache Oozie, Apache Spark, Apache HBase, and Apache Flink. The on-premises workloads process petabytes of data in seconds. The company must maintain similar or better performance after the migration to AWS.
Which extract, transform, and load (ETL) service will meet these requirements?

  • A. AWS Glue
  • B. Amazon EMR
  • C. AWS Lambda
  • D. Amazon Redshift
Suggested Answer: B

Comments

milofficial
Highly Voted 10 months, 2 weeks ago
Selected Answer: B
Glue is like the better-looking but weaker brother of EMR. So when it comes to petabyte scale, let EMR do the work and have Glue stay away from the action.
upvoted 16 times
...
heavenlypearl
Most Recent 1 month ago
Selected Answer: B
Amazon EMR Serverless is a deployment option for Amazon EMR that provides a serverless runtime environment. This simplifies the operation of analytics applications that use the latest open-source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks. https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html
upvoted 1 times
...
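As a rough illustration of the EMR Serverless option quoted in the comment above, the sketch below assembles the arguments for submitting a Spark job through boto3's `emr-serverless` client (`start_job_run`). The application ID, role ARN, and S3 path are hypothetical placeholders, and the actual AWS call is left commented out so the snippet runs without credentials.

```python
def build_spark_job_request(application_id, role_arn, script_uri, args=None):
    """Assemble keyword arguments for the emr-serverless start_job_run call."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_uri,
                "entryPointArguments": list(args or []),
            }
        },
    }


# All identifiers below are hypothetical placeholders, not real resources.
request = build_spark_job_request(
    application_id="00example123",
    role_arn="arn:aws:iam::123456789012:role/ExampleEmrServerlessRole",
    script_uri="s3://example-bucket/scripts/etl_job.py",
    args=["--run-date", "2024-01-01"],
)

# With boto3 installed and credentials configured, the request could be sent as:
# import boto3
# client = boto3.client("emr-serverless")
# response = client.start_job_run(**request)
# print(response["jobRunId"])
```

No cluster is described anywhere in the request; only the application, an execution role, and the job itself are named, which is the operational saving the comment refers to.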
87ebc7d
1 month, 1 week ago
Discarded, not 'discarted'. 'Discarted' isn't a word.
upvoted 2 times
...
leotoras
2 months, 3 weeks ago
B. Amazon EMR Serverless is a deployment option for Amazon EMR that provides a serverless runtime environment. This simplifies the operation of analytics applications that use the latest open-source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks.
upvoted 1 times
...
Eleftheriia
3 months, 1 week ago
Selected Answer: A
I think it is A, Glue.
  • Amazon EMR is used for petabyte-scale data collection and data processing.
  • AWS Glue is used as a serverless and managed ETL service, and also for managing data quality with AWS Glue Data Quality.
upvoted 2 times
...
San_Juan
3 months, 1 week ago
Selected Answer: A
Glue. The question mentions "serverless", so EMR is discarted. The mention of Spark, HBase, etc. is there to confuse you, because the question doesn't say the company wants to keep using them. Glue can run Spark using "glueContext" (similar to a SparkContext) for reading tables and files and creating frames.
upvoted 1 times
...
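For readers unfamiliar with the "glueContext" mentioned in the comment above, here is a minimal sketch of how a Glue job script might read a Data Catalog table into a Spark DataFrame. The function name and its arguments are hypothetical; the `awsglue` library is only available inside the AWS Glue runtime, so the imports sit inside the function and nothing is executed at load time.

```python
def read_catalog_table(database, table_name):
    """Read a Glue Data Catalog table and return it as a Spark DataFrame.

    Intended to run inside an AWS Glue job; the imports are deferred so the
    module can still be loaded on a machine without awsglue or pyspark.
    """
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    # GlueContext wraps a SparkContext, which is why the comment calls them similar.
    glue_context = GlueContext(SparkContext.getOrCreate())

    # A DynamicFrame is Glue's own frame type; toDF() converts it to a Spark DataFrame.
    dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )
    return dynamic_frame.toDF()


# Inside a Glue job, usage might look like this (names are placeholders):
# df = read_catalog_table("example_db", "example_table")
# df.printSchema()
```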
sachin
3 months, 3 weeks ago
"The company also wants to explore serverless options." So Glue (A), or EMR Serverless?
upvoted 1 times
...
V0811
4 months ago
Selected Answer: A
Serverless: AWS Glue is a fully managed, serverless ETL service that automates the process of data discovery, preparation, and transformation, helping minimize operational overhead.
Integration with Big Data Tools: It integrates well with various AWS services and supports Spark jobs for ETL purposes, which aligns well with Apache Spark workloads.
Performance: AWS Glue can handle large-scale ETL workloads, and it is designed to manage petabytes of data efficiently, comparable to the performance of on-premises solutions.
While B. Amazon EMR could also be considered for its flexibility in handling big data workloads using tools like Apache Spark, it requires more management and doesn't fit the serverless requirement as closely as AWS Glue. Therefore, AWS Glue is the most suitable choice given the constraints and requirements.
upvoted 1 times
...
pypelyncar
6 months ago
Selected Answer: B
EMR provides a managed Hadoop framework that natively supports Apache Pig, Oozie, Spark, and Flink. This allows the company to migrate its existing workloads with minimal code changes, reducing development effort.
upvoted 3 times
...
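To make the point above concrete, the sketch below builds a request for boto3's EMR `run_job_flow` call that asks for all five frameworks named in the question. The cluster name, release label, instance types, and counts are assumptions chosen for illustration (the role names are the EMR defaults), and the AWS call itself is commented out so the snippet runs without credentials.

```python
# Frameworks listed in the exam question.
FRAMEWORKS = ["Pig", "Oozie", "Spark", "HBase", "Flink"]


def build_cluster_request(name, release_label, frameworks):
    """Assemble keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Each framework is requested by name; EMR installs and configures it.
        "Applications": [{"Name": fw} for fw in frameworks],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# "emr-6.15.0" is an assumed release label; check which applications a release ships.
request = build_cluster_request("example-migration-cluster", "emr-6.15.0", FRAMEWORKS)

# With boto3 installed and credentials configured, the request could be sent as:
# import boto3
# response = boto3.client("emr").run_job_flow(**request)
# print(response["JobFlowId"])
```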
tgv
6 months, 1 week ago
Selected Answer: B
That's exactly the purpose of EMR. "Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto." https://aws.amazon.com/emr/
upvoted 2 times
...
Just_Ninja
7 months ago
Selected Answer: A
Glue is Serverless :)
upvoted 2 times
...
wa212
8 months ago
Selected Answer: B
https://docs.aws.amazon.com/ja_jp/emr/latest/ManagementGuide/emr-what-is-emr.html
upvoted 2 times
...
certplan
8 months, 2 weeks ago
- While AWS Glue is a fully managed ETL service and offers serverless capabilities, it might not provide the same level of performance and flexibility as Amazon EMR for handling petabyte-scale workloads with complex processing requirements.
- AWS Glue is optimized for data integration, cataloging, and ETL jobs but may not be as well-suited for heavy-duty processing tasks that require frameworks like Apache Spark, Apache Flink, etc., which are commonly used for large-scale data processing.
- Documentation on AWS Glue can be found in the AWS Glue Developer Guide https://docs.aws.amazon.com/glue/index.html.
upvoted 2 times
...
certplan
8 months, 2 weeks ago
A. AWS Glue: AWS Glue is a fully managed extract, transform, and load (ETL) service provided by Amazon Web Services (AWS). It allows users to prepare and load data for analytics purposes.
B. Amazon EMR: Amazon Elastic MapReduce (EMR) is a cloud-based big data platform provided by AWS. It allows users to process and analyze large amounts of data using popular frameworks such as Apache Hadoop, Apache Spark, Apache Hive, Apache HBase, and more.
https://docs.aws.amazon.com/emr/index.html
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-best-practices.html
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage.html
https://docs.aws.amazon.com/emr/latest/DeveloperGuide/emr-developer-guide.html
As per the AWS docs, option B specifically calls out the features and frameworks that the question asks about.
upvoted 2 times
...
GiorgioGss
9 months ago
Selected Answer: B
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html
upvoted 1 times
...
TonyStark0122
10 months, 1 week ago
A. AWS Glue
upvoted 1 times
...
[Removed]
10 months, 2 weeks ago
Selected Answer: B
https://aws.amazon.com/emr/features/
upvoted 1 times
...