Exam AWS Certified Data Analytics - Specialty topic 1 question 13 discussion

A company is planning to do a proof of concept for a machine learning (ML) project using Amazon SageMaker with a subset of existing on-premises data hosted in the company's 3 TB data warehouse. As part of the project, AWS Direct Connect is established and tested. To prepare the data for ML, data analysts are performing data curation. The data analysts want to perform multiple steps, including mapping, dropping null fields, resolving choice, and splitting fields. The company needs the fastest solution to curate the data for this project.
Which solution meets these requirements?

  • A. Ingest data into Amazon S3 using AWS DataSync and use Apache Spark scripts to curate the data in an Amazon EMR cluster. Store the curated data in Amazon S3 for ML processing.
  • B. Create custom ETL jobs on-premises to curate the data. Use AWS DMS to ingest data into Amazon S3 for ML processing.
  • C. Ingest data into Amazon S3 using AWS DMS. Use AWS Glue to perform data curation and store the data in Amazon S3 for ML processing.
  • D. Take a full backup of the data store and ship the backup files using AWS Snowball. Upload Snowball data into Amazon S3 and schedule data curation jobs using AWS Batch to prepare the data for ML.
Suggested Answer: C
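
The four curation steps named in the question (mapping, dropping null fields, resolving choice, splitting fields) share their names with AWS Glue's built-in transforms ApplyMapping, DropNullFields, ResolveChoice, and SplitFields. The sketch below is not Glue code; it is a plain-Python illustration of what each step does to a small batch of records. The field names, sample rows, and helper functions are hypothetical, and the mapping step only renames fields for brevity.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Record = Dict[str, Any]


def apply_mapping(record: Record, mapping: Dict[str, str]) -> Record:
    """Mapping: keep only the mapped source fields and rename them."""
    return {dst: record[src] for src, dst in mapping.items() if src in record}


def drop_null_fields(records: List[Record]) -> List[Record]:
    """Dropping null fields: remove fields that are null in every record."""
    keys = {k for r in records for k in r}
    always_null = {k for k in keys if all(r.get(k) is None for r in records)}
    return [{k: v for k, v in r.items() if k not in always_null} for r in records]


def resolve_choice(records: List[Record], field: str,
                   cast: Callable[[Any], Any]) -> List[Record]:
    """Resolving choice: force a mixed-type field into one type."""
    resolved = []
    for r in records:
        r = dict(r)  # copy so the input records are left untouched
        if r.get(field) is not None:
            r[field] = cast(r[field])
        resolved.append(r)
    return resolved


def split_fields(records: List[Record],
                 paths: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Splitting fields: one collection with the named fields, one with the rest."""
    selected = [{k: v for k, v in r.items() if k in paths} for r in records]
    remainder = [{k: v for k, v in r.items() if k not in paths} for r in records]
    return selected, remainder


# Hypothetical rows: "amt" arrives as both int and str, "notes" is always null.
raw = [
    {"cust_id": "1", "amt": 10, "notes": None, "region": "EU"},
    {"cust_id": "2", "amt": "12.5", "notes": None, "region": "US"},
]
mapping = {"cust_id": "customer_id", "amt": "amount",
           "notes": "notes", "region": "region"}

mapped = [apply_mapping(r, mapping) for r in raw]
non_null = drop_null_fields(mapped)
resolved = resolve_choice(non_null, "amount", float)
ids, features = split_fields(resolved, {"customer_id"})
print(ids)
print(features)
```

In a Glue job the same pipeline would be expressed with the service's own transforms on a DynamicFrame, which is the basis for the argument that Glue covers these steps without custom code.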

Comments

abhineet
Highly Voted 3 years, 10 months ago
C is correct; S3 is a valid target for DMS: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html
upvoted 28 times
GauravM17
3 years, 10 months ago
I guess it should be A. DMS cannot do the data preprocessing, and Spark is the best option for large datasets.
upvoted 2 times
Brijeshkrishna
3 years, 9 months ago
C is correct, as AWS Glue uses the Spark engine.
upvoted 2 times
...
...
...
zeronine
Highly Voted 3 years, 11 months ago
C. DMS supports S3 as a target.
upvoted 6 times
...
Frazy
Most Recent 1 year, 9 months ago
C: Option A, using AWS DataSync and Apache Spark scripts, involves maintaining an Amazon EMR cluster, which adds complexity and management overhead. Option B, creating custom ETL jobs on-premises, requires significant development effort and may not be as efficient as using AWS Glue. Option D, using AWS Snowball for data transfer and AWS Batch for data curation, is less efficient and more time-consuming than the direct ingestion and curation approach.
upvoted 1 times
...
jerkane
1 year, 9 months ago
Selected Answer: C
C is correct; using Glue would be faster than using EMR.
upvoted 1 times
...
monkeydba
1 year, 9 months ago
This is the differentiator: DMS can read a database source, and DataSync cannot. The question says "hosted in the company's 3 TB data warehouse". DataSync can read NFS, SMB, HDFS, and S3. https://docs.aws.amazon.com/datasync/latest/userguide/how-datasync-transfer-works.html#onprem-aws
upvoted 2 times
...
monkeydba
1 year, 9 months ago
DataSync can indeed pull a subset of data. https://docs.aws.amazon.com/datasync/latest/userguide/filtering.html
upvoted 1 times
...
monkeydba
1 year, 9 months ago
The question mentions "subset" of data. Can DataSync do that? DMS can.
upvoted 1 times
...
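
On the "subset" point: a DMS replication task takes a JSON table-mapping document, and selection rules in that document decide which schemas and tables are migrated. The sketch below only builds such a document; the schema name `sales_dw`, the table names, and the helper `selection_rules` are hypothetical, and nothing here calls AWS.

```python
import json
from typing import Any, Dict, List


def selection_rules(schema: str, tables: List[str]) -> Dict[str, Any]:
    """Build a DMS table-mapping document that includes only the given tables."""
    rules = []
    for i, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(i),                # must be unique within the document
            "rule-name": f"include-{table}",  # must also be unique
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return {"rules": rules}


# Hypothetical subset: two tables from an assumed "sales_dw" schema.
table_mappings = json.dumps(selection_rules("sales_dw", ["orders", "customers"]), indent=2)
print(table_mappings)
```

The resulting JSON string is the kind of value supplied as the task's table mappings, so only the listed tables would be copied to the S3 target.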
gofavad926
1 year, 10 months ago
Selected Answer: A
A. I don't understand why everyone agrees on C. DMS stands for Database Migration Service, and the question mentions a data warehouse, not a database, so this is not a DMS-compatible source: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.html. A is the valid option because DataSync can migrate the data to S3, and then we can process it with EMR (more efficient than Glue).
upvoted 1 times
...
debasishg
1 year, 10 months ago
Selected Answer: C
C, because: 1. DataSync is used for file migration, DMS for data. 2. Glue ETL is required to transform the data after migration.
upvoted 1 times
...
NikkyDicky
2 years ago
Selected Answer: C
C for sure
upvoted 1 times
...
pk349
2 years, 3 months ago
C: I passed the test
upvoted 1 times
...
cloudlearnerhere
2 years, 9 months ago
Correct answer is C, as DMS can be used for data migration to S3 and AWS Glue can be used for preprocessing and data curation. Option A is wrong as DataSync is usually for storage migration, and using Spark might not be as operationally efficient as Glue. Option B is wrong as using on-premises custom ETL jobs might not be time-efficient. Option D is wrong as the data migration using Snowball will take time.
upvoted 4 times
...
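
The "Snowball will take time" argument can be checked with rough arithmetic. The question does not state the Direct Connect bandwidth, so the link speeds and the 80% utilization figure below are assumptions, and 3 TB is an upper bound because only a subset of the warehouse is moved.

```python
def transfer_hours(size_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Hours needed to move size_tb (decimal terabytes) over a link of link_gbps."""
    bits = size_tb * 1e12 * 8                        # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


# Assumed link speeds; the real Direct Connect capacity is not given in the question.
for gbps in (1, 10):
    print(f"{gbps} Gbps at 80% utilization: {transfer_hours(3, gbps, 0.8):.1f} h")
```

Under these assumptions the full 3 TB moves in roughly 8 hours at 1 Gbps and under an hour at 10 Gbps, whereas a Snowball job also has to wait for the device to be shipped in both directions.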
Arka_01
2 years, 11 months ago
Selected Answer: C
Glue is the answer, as all the mentioned data operations are readily available with Glue.
upvoted 1 times
...
rocky48
3 years ago
Selected Answer: C
C is correct
upvoted 1 times
...
Thiya
3 years, 8 months ago
C is the correct answer: use DMS to ingest the data from the on-premises data warehouse to S3 and use Glue DataBrew for data curation.
upvoted 1 times
...
Donell
3 years, 9 months ago
Answer C. Ingest data into Amazon S3 using AWS DMS. Use AWS Glue to perform data curation and store the data in Amazon S3 for ML processing.
upvoted 1 times
...
Shraddha
3 years, 9 months ago
A = wrong, DataSync is for storage migration, not a data warehouse. B = wrong, an on-premises ETL job is not fast. D = wrong, too slow.
upvoted 3 times
...