A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake.
The company analyzes the last 24 hours of data flow logs to detect anomalies and to troubleshoot and resolve user issues. It also analyzes historical logs dating back 2 years to discover patterns and identify improvement opportunities.
The data flow logs contain many metrics, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day.
How should this data be stored for optimal performance?
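Because queries target either the last 24 hours or a 2-year history, a common approach for this kind of S3 data lake is to store the logs in a columnar format (such as Apache Parquet) under Hive-style date/hour prefixes, so a query engine like Amazon Athena can prune partitions and scan only the relevant slice. The sketch below illustrates the key layout; the `flow-logs` prefix and the timestamp are hypothetical, and this is one plausible layout rather than the only valid answer.

```python
from datetime import datetime, timezone

def partition_key(event_ts: datetime, prefix: str = "flow-logs") -> str:
    """Build a Hive-style S3 key prefix (dt=YYYY-MM-DD/hour=HH) so that a
    query over the last 24 hours touches at most ~24 partitions instead of
    scanning 2 years of history."""
    return f"{prefix}/dt={event_ts:%Y-%m-%d}/hour={event_ts:%H}/"

# Example: an event collected at 13:00 UTC on 2023-05-01 (hypothetical values)
print(partition_key(datetime(2023, 5, 1, 13, 0, tzinfo=timezone.utc)))
# → flow-logs/dt=2023-05-01/hour=13/
```

Objects written under these prefixes in a compressed columnar format keep per-partition file counts manageable at ~10 billion events per day, and date-range predicates map directly onto the `dt=` and `hour=` partition columns.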