
Exam Professional Data Engineer topic 1 question 54 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 54
Topic #: 1

Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

  • A. Create a file on a shared file and have the application servers write all bid events to that file. Process the file with Apache Hadoop to identify which user bid first.
  • B. Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.
  • C. Set up a MySQL database for each application server to write bid events into. Periodically query each of those distributed MySQL databases and update a master MySQL database with bid event information.
  • D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.
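Whichever transport the answer choices use, the collation step itself reduces to picking, per item, the bid with the earliest event timestamp. Below is a minimal pure-Python sketch of that logic (the class, function, and data are invented for illustration; in practice this would run inside a Dataflow pipeline or the custom endpoint):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidEvent:
    item: str
    amount: float
    user: str
    timestamp: float  # event time stamped by the application server


def first_bid_per_item(events):
    """Collate bid events (arriving in any processing order) and return,
    for each item, the bid with the earliest event timestamp."""
    winners = {}
    for bid in events:
        current = winners.get(bid.item)
        if current is None or bid.timestamp < current.timestamp:
            winners[bid.item] = bid
    return winners


# Bids arrive out of event-time order, as they would from independent servers.
arrived = [
    BidEvent("vase", 100.0, "bob", timestamp=2.0),
    BidEvent("vase", 100.0, "alice", timestamp=1.0),  # earlier event, later arrival
]
print(first_bid_per_item(arrived)["vase"].user)  # alice
```

Note that the winner is chosen by the timestamp in the event, not by arrival order; this is the crux of the B-vs-D debate in the comments below.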
Suggested Answer: C

Comments

jvg637
Highly Voted 4 years, 2 months ago
I'd go with B: real-time is requested, and the only scenario for real time (in the 4 presented) is the use of pub/sub with push.
upvoted 61 times
Tanzu
2 years, 3 months ago
B. For real time, Pub/Sub push is critical; pull adds latency (eliminates D). Bids should also be processed by event time, not processing time (which also eliminates D).
upvoted 4 times
godot
2 years, 1 month ago
No push is available; Dataflow consumes Pub/Sub via streaming pull: https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub#streaming-pull-migration
upvoted 1 times
...
jin0
1 year, 2 months ago
Dataflow is designed for real-time processing, and this case needs it because there is no way to order the data without Dataflow. So I think D is the answer.
upvoted 1 times
...
...
AzureDP900
1 year, 4 months ago
Agree with B
upvoted 1 times
...
[Removed]
3 years, 1 month ago
I would go with option B, because option D states "Give the bid for each item to the user in the bid event that is processed first." The requirement is to find the first bid by event time, not the first event processed in Dataflow.
upvoted 25 times
...
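The distinction drawn above (event time vs. processing time) can be made concrete with a tiny sketch; the users, item, and timestamps are invented:

```python
# Two bids on the same item arrive out of order:
# the bid placed later is processed first.
bids_in_processing_order = [
    {"user": "carol", "item": "clock", "ts": 10.005},  # arrived first
    {"user": "dave", "item": "clock", "ts": 10.001},   # bid first
]

# "Processed first" picks whichever event happened to arrive first.
first_processed = bids_in_processing_order[0]

# Event-time ordering picks the lowest timestamp regardless of arrival order.
first_by_event_time = min(bids_in_processing_order, key=lambda b: b["ts"])

print(first_processed["user"])      # carol
print(first_by_event_time["user"])  # dave
```

A pipeline that awards the item to the first event it happens to process would pick carol; ordering by the event timestamp picks dave, which is what the question actually asks for.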
donbigi
1 year, 3 months ago
This approach is not ideal because it requires a custom endpoint to write the bid event information into Cloud SQL. This adds additional complexity and potential points of failure to the architecture, as well as adding latency to the processing of bid events, since the data must be written to both Pub/Sub and Cloud SQL. Additionally, it can be more challenging to ensure that bid events are processed in the order they were received, since the data is being written to multiple databases. Finally, using a single database to store bid events could limit scalability and availability, and can also result in slow query performance.
upvoted 3 times
...
...
Ganshank
Highly Voted 4 years, 1 month ago
D. The need is to collate the messages in real time. We need to de-dupe the messages based on the timestamp of when the event occurred. This can be done by publishing to Pub/Sub and consuming via Dataflow.
upvoted 34 times
Tanzu
2 years, 3 months ago
Yes, that's why B is the right one: it has Pub/Sub push, which is more real-time than Pub/Sub pull. Be aware that at some point something has to be pulled, which adds latency.
upvoted 1 times
...
unnamed12355
1 year, 1 month ago
D isn't correct. Pub/Sub can deliver messages out of order; there is no guarantee that the event with the lowest timestamp will be processed first. B is correct.
upvoted 3 times
...
...
yassoraa88
Most Recent 1 week, 5 days ago
Selected Answer: D
This is the most suitable solution for the requirements. Google Cloud Pub/Sub can handle high throughput and low-latency data ingestion. Coupled with Google Cloud Dataflow, which can process data streams in real time, this setup allows for immediate processing of bid events. Dataflow can also handle ordering and timestamp extraction, crucial for determining which bid came first. This architecture supports scalability and real-time analytics, which are essential for a global auction system.
upvoted 1 times
...
teka112233
1 week, 6 days ago
Selected Answer: D
The answer should be D, for these reasons: real-time processing, centralized processing, and winner determination. B is unsuitable: while Pub/Sub can ingest data, Cloud SQL is a relational database not designed for real-time processing at this scale, and maintaining a custom endpoint adds complexity.
upvoted 1 times
...
I__SHA1234567
2 months ago
Selected Answer: D
Google Cloud Pub/Sub is a scalable and reliable messaging service that can handle high volumes of data and deliver messages in real-time. By having each application server publish bid events to Cloud Pub/Sub, you ensure that all bid events are collected centrally. Using Cloud Dataflow with a pull subscription allows you to process the bid events in real-time. Cloud Dataflow provides a managed service for stream and batch processing, and it can handle the real-time processing requirements efficiently. By processing the bid events with Cloud Dataflow, you can determine which user bid first by applying the appropriate logic within your Dataflow pipeline. This approach ensures scalability, reliability, and real-time processing capabilities, making it suitable for handling bid events from multiple application servers.
upvoted 1 times
...
philli1011
3 months, 2 weeks ago
B should be the answer, because it writes the bids from a distributed system into Cloud SQL. This way the customer knows immediately whether they got the bid. Also, push requests are faster than pull requests, so they are better for a real-time experience.
upvoted 1 times
...
arpana_naa
4 months, 3 weeks ago
Selected Answer: D
Pub/Sub for the entry timestamp plus event time, Dataflow for processing; and Dataflow is better for real time.
upvoted 1 times
...
Nandababy
5 months, 1 week ago
To accurately determine who bid first in a globally distributed auction application, utilizing a push mechanism instead of a pull mechanism is generally considered the more reliable approach. B should be correct answer.
upvoted 1 times
...
Zepopo
5 months, 4 weeks ago
Selected Answer: B
The key phrase is "single location in real time".
upvoted 2 times
...
rocky48
6 months, 1 week ago
Selected Answer: D
Answer: D. We need to de-dupe the messages based on the timestamp of when the event occurred. This can be done by publishing to Pub/Sub and consuming via Dataflow. D sounds like a complete answer; B does not.
upvoted 2 times
...
Nivea007
7 months, 1 week ago
D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur, and use a pull subscription to pull the bid events using Google Cloud Dataflow. This approach leverages Google Cloud Pub/Sub for real-time data ingestion and Google Cloud Dataflow for real-time data processing, ensuring that bids are processed as they occur, which aligns with the real-time requirements. It's not B because that option involves a custom endpoint that writes data into Cloud SQL. This additional step could introduce latency, and you would have to ensure that the custom endpoint and the Cloud SQL database can handle the real-time load.
upvoted 1 times
patiwwb
7 months ago
But D treats the bids according to processing time. We need to consider event time; that's why B is the right answer.
upvoted 1 times
...
...
imran79
7 months, 2 weeks ago
D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.
upvoted 1 times
...
Nirca
7 months, 2 weeks ago
Selected Answer: B
B. Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL. is correct
upvoted 2 times
...
DeepakVenkatachalam
7 months, 4 weeks ago
The correct answer is B. Option D is based on which event is processed first, not which event occurred first, so option D cannot be the right answer.
upvoted 1 times
...
np717
8 months, 3 weeks ago
Selected Answer: D
D is the best solution because it is both real-time and scalable. Google Cloud Dataflow can process the bid events in the order in which they occurred and give the bid for each item to the user in the bid event that is processed first.
upvoted 1 times
...
NeoNitin
9 months, 2 weeks ago
B. Here is why: Option B is like using special messaging balloons that quickly carry all the bids to one spot, where a super-fast friend checks them and tells us who bid first. Option D is like having all the bids sent to a magic box that forwards them to a smart computer friend, who looks at the bids right away and decides who should get the toy based on who bid first. In conclusion, Option B (Cloud Pub/Sub and Cloud SQL) and Option D (Google Cloud Pub/Sub and Cloud Dataflow) are the most suitable choices for real-time processing of bids and determining the first bidder. Both offer efficient, scalable, real-time solutions for handling bid events in a globally distributed auction application. The final decision would depend on factors such as the specific requirements, infrastructure, and expertise of the development team.
upvoted 1 times
NeoNitin
9 months, 2 weeks ago
Option B (Cloud Pub/Sub and Cloud SQL): Advantage: Cloud Pub/Sub can handle real-time data streaming, making it a good choice for quickly receiving bids. Storing bid events in Cloud SQL ensures they are easily accessible and can be analyzed in real-time. Disadvantage: It might require some additional setup and configuration to connect Cloud Pub/Sub to a custom endpoint and Cloud SQL. Option D (Google Cloud Pub/Sub and Cloud Dataflow): Advantage: Cloud Pub/Sub can handle real-time data streaming, similar to Option B. Using Cloud Dataflow can process the bid events quickly and efficiently. Disadvantage: It might require some additional setup and configuration for Cloud Dataflow.
upvoted 1 times
...
...
knith66
10 months ago
B is correct, as push provides near-real-time delivery.
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other