A company hosts more than 300 global websites and applications. The company requires a platform to analyze more than 30 TB of clickstream data each day. What should a solutions architect do to transmit and process the clickstream data?
A. Design an AWS Data Pipeline to archive the data to an Amazon S3 bucket and run an Amazon EMR cluster with the data to generate analytics.
B. Create an Auto Scaling group of Amazon EC2 instances to process the data and send it to an Amazon S3 data lake for Amazon Redshift to use for analysis.
C. Cache the data to Amazon CloudFront. Store the data in an Amazon S3 bucket. When an object is added to the S3 bucket, run an AWS Lambda function to process the data for analysis.
D. Collect the data from Amazon Kinesis Data Streams. Use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake. Load the data in Amazon Redshift for analysis.
This is a real-life use case that Amazon has converted into a question. See the link below:
https://aws.amazon.com/solutions/case-studies/hearst-data-analytics/
Answer will be D.
Key takeaways from the case study:
Built a clickstream analytics platform that transmits and processes more than 30 terabytes of clickstream data a day, streamed from more than 300 Hearst websites worldwide.
Amazon Kinesis Firehose automatically moves buffered data from Amazon Kinesis Data Streams into persistent storage on Amazon Simple Storage Service (Amazon S3). This replaces an Amazon Elastic Compute Cloud (Amazon EC2) instance the team previously had to manage.
The transformed clickstream data is pulled from a Hearst data lake and sent to Amazon Redshift for analytical queries and complex data science work.
From Amazon Redshift, the data gets pushed to end users through an API to the company’s content management system.
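To get a feel for what "more than 30 terabytes a day" means for the streaming layer, here is a rough back-of-the-envelope sketch. It assumes decimal units, a perfectly even arrival rate, and the 1 MB/s per-shard write limit of provisioned Kinesis Data Streams; the function names and the `peak_factor` parameter are made up for illustration, and real traffic will be burstier than this.

```python
import math

# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
# and a write limit of 1 MB/s per provisioned Kinesis Data Streams shard.
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60
SHARD_WRITE_LIMIT_MB_PER_S = 1.0


def average_mb_per_second(tb_per_day: float) -> float:
    """Average ingest rate in MB/s if the data arrived evenly over the day."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


def min_shards(tb_per_day: float, peak_factor: float = 1.0) -> int:
    """Lower bound on shards; peak_factor is a hypothetical burst multiplier."""
    needed = average_mb_per_second(tb_per_day) * peak_factor
    return math.ceil(needed / SHARD_WRITE_LIMIT_MB_PER_S)


if __name__ == "__main__":
    print(f"average rate: {average_mb_per_second(30):.1f} MB/s")
    print(f"minimum shards at average load: {min_shards(30)}")
    print(f"minimum shards at 2x peak: {min_shards(30, peak_factor=2.0)}")
```

Even at the average rate this is several hundred shards' worth of writes, which is why a managed streaming service is a more natural fit here than a batch scheduler.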
AWS Data Pipeline handles the administrative tasks of scheduling, execution, and retry logic. It tracks dependencies for all steps and does not execute a task until all of its dependencies are met. It's not real time.
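The "does not run a task until its dependencies are met" behaviour can be illustrated with a small sketch. This is not the Data Pipeline API, just a toy scheduler built on Python's standard `graphlib` module (Python 3.9+); the step names and the `run_pipeline` function are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(dependencies, actions):
    """Run each step only after every step it depends on has finished.

    dependencies maps a step name to the set of steps it depends on.
    actions maps a step name to a zero-argument callable.
    Returns the order in which the steps were executed.
    """
    executed = []
    for step in TopologicalSorter(dependencies).static_order():
        actions[step]()
        executed.append(step)
    return executed


if __name__ == "__main__":
    # Hypothetical batch flow resembling option A.
    deps = {
        "archive_to_s3": set(),
        "run_emr_job": {"archive_to_s3"},
        "publish_report": {"run_emr_job"},
    }
    acts = {name: (lambda n=name: print("running", n)) for name in deps}
    print(run_pipeline(deps, acts))
```

The point is that each step waits for the previous one to finish completely, which is a batch model rather than a streaming one.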
Should be D.
D: Kinesis makes it easy to collect, process, and analyze streaming data in real time.
KDF (Kinesis Data Firehose) accepts KDS (Kinesis Data Streams) as one of its inputs (the options being clients, the SDK, KPL, Kinesis Agent, KDS, Amazon CloudWatch, and AWS IoT).
Further, the most important AWS destinations for KDF are S3, Redshift (copy via S3), and Elasticsearch.
Redshift is used for data warehousing/analytics.
(Note: KDS is real time and KDF is near real time.)
A: Not very sure (EMR is used to deploy big data/Hadoop clusters for analytics).
B: EC2/ASG may be a choice if the aim is to develop the website, but here the target is to handle transmission and processing of stream data.
C: CloudFront/caching will not help; we are not doing content delivery. We are ingesting content from points across the globe.
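On the real-time versus near-real-time note: Firehose is near real time because it buffers records and delivers them in batches once a size or time threshold is reached. Below is a toy sketch of that idea, not the Firehose implementation. The class name and thresholds are made up, and unlike the real service this version only checks the timer when a new record arrives.

```python
class BufferedDelivery:
    """Toy buffer that flushes on a size threshold or a time threshold."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._records = []
        self._size = 0
        self._started_at = None  # arrival time of the first buffered record

    def add(self, record: bytes, now: float):
        """Buffer a record; return the flushed batch, or None if still buffering."""
        if self._started_at is None:
            self._started_at = now
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = now - self._started_at >= self.max_seconds
        if too_big or too_old:
            return self._flush()
        return None

    def _flush(self):
        batch = self._records
        self._records, self._size, self._started_at = [], 0, None
        return batch


if __name__ == "__main__":
    buf = BufferedDelivery(max_bytes=10, max_seconds=60)
    for t, rec in enumerate([b"abcd", b"efgh", b"ij"]):
        print(t, buf.add(rec, now=float(t)))
```

A record can therefore sit in the buffer for up to the configured interval before it lands in S3, which is fine for analytics but is why the delivery step is described as near real time.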
D is very tempting except for one requirement: the company hosts 300 global websites. Amazon Kinesis Data Streams is a regional service, so D is not a complete answer. C meets all requirements. See https://aws.amazon.com/kinesis/data-streams/ and read the "Durable" item under Benefits.
Head here:
https://aws.amazon.com/en/datapipeline/details/
There is exactly this example at the bottom of the page. The answer involves Data Pipeline and EMR because we are talking about terabytes per day.
In that use case, Data Pipeline takes the data from S3 to Redshift, not from the clickstream to S3. So I'm going with D: clickstream via Kinesis Data Streams.
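For the "clickstream into Kinesis Data Streams" part, a producer on each website could look roughly like the sketch below. It uses the boto3 Kinesis client's `put_record` call; the stream name, the event fields, and the choice of `session_id` as the partition key are assumptions for illustration only. boto3 is imported lazily so the request-building part can be tried without AWS credentials.

```python
import json


def build_put_record_request(stream_name: str, event: dict) -> dict:
    """Build keyword arguments for the Kinesis put_record call.

    Assumption: events carry a 'session_id' field, used here as the partition
    key so that clicks from one session land on the same shard.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
        "PartitionKey": str(event["session_id"]),
    }


def send_click(event: dict, stream_name: str = "clickstream-events"):
    """Send one click event; the stream name is a hypothetical placeholder."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("kinesis")
    return client.put_record(**build_put_record_request(stream_name, event))


if __name__ == "__main__":
    sample = {"session_id": "abc123", "site": "example.com", "path": "/home"}
    print(build_put_record_request("clickstream-events", sample))
```

At this volume a real producer would batch records (for example with `put_records` or the KPL) rather than send them one at a time.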
Both Data Pipeline and EMR process data (ETL) and move it. EMR requires other tools for analysis. https://docs.aws.amazon.com/whitepapers/latest/big-data-analytics-options/amazon-emr.html
This will help you determine that D is correct: https://aws.amazon.com/blogs/big-data/running-amazon-payments-analytics-on-amazon-redshift-with-750tb-of-data/
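For the last hop in option D, data that Firehose has written to S3 is usually loaded into Redshift with a `COPY` statement. Here is a minimal sketch that only builds the SQL text. The table name, S3 prefix, and IAM role ARN in the usage example are placeholders, and it assumes the objects are gzip-compressed JSON.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for gzip-compressed JSON objects in S3.

    The values are interpolated directly, so only pass trusted,
    configuration-controlled strings (never user input).
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto'\n"
        "GZIP;"
    )


if __name__ == "__main__":
    # All three arguments below are hypothetical placeholders.
    print(
        build_copy_statement(
            table="clickstream_events",
            s3_prefix="s3://example-data-lake/clickstream/",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
        )
    )
```

When Redshift is configured as a Firehose destination, Firehose issues a similar `COPY` on your behalf after staging the data in S3.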