Exam AWS Certified Machine Learning - Specialty topic 1 question 141 discussion

A retail company wants to combine its customer orders with the product description data from its product catalog. The structure and format of the records in each dataset are different. A data analyst tried to use a spreadsheet to combine the datasets, but the effort resulted in duplicate records and records that were not properly combined. The company needs a solution that it can use to combine similar records from the two datasets and remove any duplicates.
Which solution will meet these requirements?

  • A. Use an AWS Lambda function to process the data. Use two arrays to compare equal strings in the fields from the two datasets and remove any duplicates.
  • B. Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog. Call the AWS Glue SearchTables API operation to perform a fuzzy-matching search on the two datasets, and cleanse the data accordingly.
  • C. Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog. Use the FindMatches transform to cleanse the data.
  • D. Create an AWS Lake Formation custom transform. Run a transformation for matching products from the Lake Formation console to cleanse the data automatically.
Suggested Answer: C

Comments

spaceexplorer
Highly Voted 2 years ago
Selected Answer: C
C. Glue can use the FindMatches transform to find duplicates.
upvoted 20 times
KlaudYu
1 year, 11 months ago
It says "Each dataset contains records with a unique structure and format.", so C would not be correct.
upvoted 3 times
f4bi4n
1 year, 11 months ago
But that's exactly what FindMatches is for: "The FindMatches transform enables you to identify duplicate or matching records in your dataset, even when the records do not have a common unique identifier and no fields match exactly."
upvoted 4 times
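To see why option A's exact string comparison falls short, and what "no fields match exactly" means in practice, here is a minimal sketch using Python's standard-library difflib. It is not FindMatches: FindMatches learns its matching criteria from record pairs you label, whereas this sketch hard-codes a similarity threshold of 0.8 (an arbitrary, hypothetical value). The product names are also hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def exact_match(a: str, b: str) -> bool:
    """Option A's idea: two records match only if the strings are identical."""
    return a == b


def fuzzy_score(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two records as the same item if similarity clears a fixed (hypothetical) threshold."""
    return fuzzy_score(a, b) >= threshold


def dedupe(records: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first record of each group of near-duplicates (O(n^2) comparisons)."""
    kept: list[str] = []
    for record in records:
        if not any(fuzzy_match(record, seen, threshold) for seen in kept):
            kept.append(record)
    return kept


if __name__ == "__main__":
    catalog_name = "Stainless Steel Water Bottle, 750 ml"  # hypothetical catalog entry
    order_name = "stainless steel water bottle 750ml"      # hypothetical order line
    print(exact_match(catalog_name, order_name))            # False
    print(round(fuzzy_score(catalog_name, order_name), 2))
    print(dedupe([catalog_name, order_name, "Ceramic Coffee Mug"]))
    # ['Stainless Steel Water Bottle, 750 ml', 'Ceramic Coffee Mug']
```

The exact comparison rejects the pair because of case, a comma, and a missing space, while the fuzzy score treats them as one product. A fixed threshold like this is brittle across real catalogs, which is the gap a learned transform such as FindMatches is meant to close.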
uninit
Highly Voted 1 year, 3 months ago
Selected Answer: C
It is C, as described in the tutorial: https://docs.aws.amazon.com/glue/latest/dg/machine-learning-transform-tutorial.html. Lake Formation can also invoke a FindMatches algorithm (because it manages data ingestion through Glue), but we don't have a data lake in this example. No one would build a whole data lake, a process that takes days, only to find some matching records.
upvoted 6 times
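For reference, the setup in option C boils down to a Glue ML transform of type FIND_MATCHES that reads a table the crawler registered in the Data Catalog. The sketch below only builds the keyword arguments for the Glue CreateMLTransform API (boto3's `create_ml_transform`) so it runs without AWS credentials. The transform name, database, table, primary-key column, and role ARN are all hypothetical, and it assumes the records to be matched sit in a single catalog table. The labeling and tuning steps from the tutorial are not shown.

```python
import json


def build_find_matches_request(
    name: str,
    database: str,
    table: str,
    primary_key: str,
    role_arn: str,
) -> dict:
    """Keyword arguments for the AWS Glue CreateMLTransform API (boto3: create_ml_transform)."""
    return {
        "Name": name,
        "Role": role_arn,
        # Table populated by the Glue crawler; the transform reads its records from here.
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                # Column that uniquely identifies each record in the input table.
                "PrimaryKeyColumnName": primary_key,
                # 0.5 on both tradeoffs expresses no preference either way.
                "PrecisionRecallTradeoff": 0.5,
                "AccuracyCostTradeoff": 0.5,
                "EnforceProvidedLabels": False,
            },
        },
    }


if __name__ == "__main__":
    request = build_find_matches_request(
        name="orders-catalog-matcher",   # hypothetical
        database="retail_db",            # hypothetical
        table="orders_with_products",    # hypothetical
        primary_key="record_id",         # hypothetical
        role_arn="arn:aws:iam::123456789012:role/GlueFindMatchesRole",  # hypothetical
    )
    print(json.dumps(request, indent=2))
    # With credentials configured, the request could be sent with:
    #   import boto3
    #   boto3.client("glue").create_ml_transform(**request)
```

The call returns a `TransformId`. The transform is then taught with labeled pairs (for example via `start_ml_labeling_set_generation_task_run`) before a Glue ETL job applies it to cleanse the data.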
Mickey321
Most Recent 8 months, 3 weeks ago
Selected Answer: C
Option C
upvoted 1 times
adisabeba
1 year, 5 months ago
Selected Answer: D
Lake Formation helps clean and prepare your data for analysis by providing a Machine Learning (ML) Transform called FindMatches for deduplication and finding matching records. For example, use FindMatches to find duplicate records in your database of restaurants, such as when one record lists “Joe's Pizza” at “121 Main St.” and another shows “Joseph's Pizzeria” at “121 Main.” You don't need to know anything about ML to do this. FindMatches will simply ask you to label sets of records as either “matching” or “not matching.” The system will then learn your criteria for calling a pair of records a match and will build an ML Transform that you can use to find duplicate records within a database or matching records across two databases. https://aws.amazon.com/lake-formation/features/
upvoted 1 times
ogm1
1 year, 11 months ago
AWS Lake Formation FindMatches is a new machine learning (ML) transform that enables you to match records across different datasets as well as identify and remove duplicate records, with little to no human intervention. The answer is D.
upvoted 2 times
ovokpus
1 year, 10 months ago
The thing is, FindMatches is not a custom transform in Lake Formation, and Lake Formation transforms are actually Glue jobs.
upvoted 1 times
[Removed]
1 year, 11 months ago
D is correct
upvoted 2 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other