A Data Scientist needs to migrate an existing on-premises ETL process to the cloud. The current process runs at regular time intervals and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing.
The Data Scientist has been given the following requirements for the cloud solution:
✑ Combine multiple data sources.
✑ Reuse existing PySpark logic.
✑ Run the solution on the existing schedule.
✑ Minimize the number of servers that will need to be managed.
Which architecture should the Data Scientist use to build this solution?