A smart home automation company must efficiently ingest and process messages from various connected devices and sensors. Most of these messages arrive as a large number of small files. The messages are ingested using Amazon Kinesis Data Streams and sent to Amazon S3 by a Kinesis data stream consumer application. The message data in Amazon S3 is then passed through a processing pipeline built on Amazon EMR running scheduled PySpark jobs.
The data platform team manages data processing and is concerned about the efficiency and cost of downstream data processing. They want to continue to use PySpark.
Which solution improves the efficiency of the data processing jobs and is well architected?
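To make the small-files concern concrete, here is a rough back-of-the-envelope sketch in Python. All figures (data volume, object sizes) are hypothetical and are not taken from the question; the point is only that the same volume of data can mean millions of S3 objects or a few hundred, and every object read costs at least one request plus per-file overhead in the job.

```python
import math


def object_count(total_bytes: int, object_size_bytes: int) -> int:
    """Return how many objects are needed to hold total_bytes at a given object size."""
    return math.ceil(total_bytes / object_size_bytes)


# Hypothetical figures for illustration only.
TOTAL_BYTES = 100 * 1024**3       # 100 GiB of message data per scheduled run
SMALL_OBJECT = 10 * 1024          # 10 KiB per object, as written by the consumer
BATCHED_OBJECT = 128 * 1024**2    # 128 MiB per object if messages were batched

small_files = object_count(TOTAL_BYTES, SMALL_OBJECT)
batched_files = object_count(TOTAL_BYTES, BATCHED_OBJECT)

print(f"small objects:   {small_files:,}")
print(f"batched objects: {batched_files:,}")
print(f"reduction:       {small_files / batched_files:,.0f}x fewer objects to list and read")
```

Under these assumed numbers, batching reduces the object count by roughly four orders of magnitude, which is the kind of difference the data platform team is worried about.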