A Data Scientist needs to migrate an existing on-premises ETL process to the cloud. The current process runs at regular time intervals and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing.
The Data Scientist has been given the following requirements for the cloud solution:
✑ Combine multiple data sources.
✑ Reuse existing PySpark logic.
✑ Run the solution on the existing schedule.
✑ Minimize the number of servers that will need to be managed.
Which architecture should the Data Scientist use to build this solution?