A global pharmaceutical company receives test results for new drugs from various testing facilities worldwide. The results are sent in millions of 1 KB-sized JSON objects to an Amazon S3 bucket owned by the company. The data engineering team needs to process those files, convert them into Apache Parquet format, and load them into Amazon Redshift for data analysts to perform dashboard reporting. The engineering team uses AWS Glue to process the objects, AWS Step Functions for process orchestration, and Amazon CloudWatch for job scheduling.
More testing facilities were recently added, and the time to process files is increasing.
What will MOST efficiently decrease the data processing time?
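
To see the scale behind the question, a rough back-of-the-envelope calculation can help. The sketch below is only an illustration under stated assumptions: the 1 KB object size comes from the question, but the object count (5 million) and the 128 MiB target file size are hypothetical figures picked for the arithmetic, and the helper names are made up here. It shows that the total data volume is small while the number of individual objects is very large, and how many files would remain if the objects were grouped into larger ones.

```python
# Back-of-the-envelope sizing for the scenario in the question.
# Only the 1 KB object size is from the question; the rest is assumed.
OBJECT_SIZE_BYTES = 1024              # 1 KB per JSON object (from the question)
OBJECT_COUNT = 5_000_000              # hypothetical: "millions" is not quantified
TARGET_FILE_BYTES = 128 * 1024 ** 2   # hypothetical 128 MiB target per grouped file


def total_bytes(count: int, size: int) -> int:
    """Total data volume in bytes for `count` objects of `size` bytes each."""
    return count * size


def grouped_file_count(count: int, size: int, target: int) -> int:
    """Number of files left if objects are packed into files of about `target` bytes."""
    # Ceiling division without floating point.
    return -(-total_bytes(count, size) // target)


if __name__ == "__main__":
    volume = total_bytes(OBJECT_COUNT, OBJECT_SIZE_BYTES)
    files = grouped_file_count(OBJECT_COUNT, OBJECT_SIZE_BYTES, TARGET_FILE_BYTES)
    print(f"objects to read individually: {OBJECT_COUNT:,}")
    print(f"total volume: {volume / 1024 ** 3:.2f} GiB")
    print(f"files after grouping: {files}")
```

Under these assumed numbers, the data itself is only a few GiB, so the cost of handling each tiny object separately is likely to matter more than the raw data volume.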