An ML engineer needs to merge and transform data from two sources to retrain an existing ML model. One data source consists of .csv files that are stored in an Amazon S3 bucket. Each .csv file contains millions of records. The other data source is an Amazon Aurora DB cluster.
The result of the merge process must be written to a second S3 bucket. The ML engineer needs to perform this merge-and-transform task every week.
Which solution will meet these requirements with the LEAST operational overhead?
AgboolaKun
2 weeks, 3 days ago