Exam Professional Data Engineer topic 1 question 133 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 133
Topic #: 1

A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions: SELECT predicted_label, user_id FROM ML.PREDICT(MODEL 'dataset.model', TABLE user_features). How should you create the ML pipeline?

  • A. Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.
  • B. Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.
  • C. Create a Dataflow pipeline using BigQueryIO to read results from the query. Grant the Dataflow Worker role to the application service account.
  • D. Create a Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Bigtable.
Suggested Answer: D
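The serving pattern behind option D can be sketched in plain Python. This is a minimal illustration, not the real pipeline: an in-memory dict stands in for Bigtable, and all row data is hypothetical. In production, a Dataflow job would read the ML.PREDICT results via BigQueryIO and write them to Bigtable via BigtableIO; the point is that serving then becomes a single-row key lookup rather than a per-request BigQuery query.

```python
# Sketch of option D: batch-precompute predictions for all users, then
# serve each request with an O(1) point lookup keyed by user ID.
# A dict stands in for Bigtable; names and rows are illustrative.

def batch_write_predictions(prediction_rows):
    """Batch step: materialize every user's prediction into a key-value store."""
    store = {}
    for row in prediction_rows:
        # The row key is the user ID, so the API can do a single-row read,
        # analogous to a Bigtable row lookup.
        store[row["user_id"]] = row["predicted_label"]
    return store

def serve_prediction(store, user_id):
    """Serving step: constant-time point read, no query execution per request."""
    return store.get(user_id)

# Example: hypothetical ML.PREDICT output for all users.
rows = [
    {"user_id": "u1", "predicted_label": "churn"},
    {"user_id": "u2", "predicted_label": "retain"},
]
store = batch_write_predictions(rows)
print(serve_prediction(store, "u1"))  # churn
```

The design choice this illustrates: the expensive work (running ML.PREDICT over the whole table) happens once in batch, so per-request latency is bounded by a key-value read, which is what makes the sub-100 ms target realistic.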

Comments

rickywck
Highly Voted 4 years, 7 months ago
I think the key reason for picking D is the 100 ms requirement.
upvoted 30 times
AzureDP900
1 year, 10 months ago
D. Create a Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Bigtable.
upvoted 2 times
[Removed]
Highly Voted 4 years, 7 months ago
Answer: D. Bigtable provides the lowest latency.
upvoted 14 times
MaxNRG
Most Recent 10 months, 3 weeks ago
Selected Answer: D
The key requirement is serving predictions for individual user IDs with sub-100 ms latency. Option D meets this: batch-predict for all users with BigQuery ML, use Dataflow to write the predictions to Bigtable, and let the application read individual rows directly from Bigtable for low-latency lookups. Granting the Bigtable Reader role lets the application retrieve the prediction for a specific user ID.
upvoted 6 times
MaxNRG
10 months, 3 weeks ago
The other options either require changing the query for each user ID (option A) or read directly from BigQuery, which has higher latency than Bigtable for single-row lookups (options A, B, and C). Option A would not work well because the WHERE clause would need to change for each user ID, and every request would run a BigQuery query. Option B's Authorized View still reads from BigQuery at request time. Option C reads the predictions through Dataflow but never materializes them in a store with fast single-row access. Option D provides the best pipeline: predict for all users in BigQuery ML, batch-write the results to Bigtable for low-latency reads, and grant the application permission to retrieve them. This meets the sub-100 ms requirement for individual user predictions. https://cloud.google.com/dataflow/docs/concepts/access-control
upvoted 2 times
Nirca
1 year ago
Selected Answer: D
I think the key reason for picking D is the 100 ms requirement. Me too.
upvoted 1 times
barnac1es
1 year, 1 month ago
Selected Answer: B
To create an ML pipeline for serving predictions to individual user IDs with latency under 100 milliseconds using the given BigQuery ML query, the most suitable approach is: B. Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.
upvoted 2 times
Lanro
1 year, 3 months ago
Selected Answer: D
Always use Bigtable as an endpoint for client-facing applications (low latency, high throughput).
upvoted 2 times
midgoo
1 year, 7 months ago
Selected Answer: D
One way to improve the efficiency of an ML pipeline is to cache (store) predictions. In this question, only D does that.
upvoted 3 times
musumusu
1 year, 8 months ago
What is wrong with B? A view can be precomputed and cached, and it could satisfy the 100 millisecond requirement. Isn't creating a pipeline to send data to Bigtable overkill for a simple prediction query?
upvoted 2 times
cheos71
1 year, 2 months ago
I think that with too many concurrent requests, the 100 ms latency will definitely not hold when reading from BigQuery.
upvoted 1 times
vaga1
1 year, 4 months ago
And the view has to run a query in real time, which adds latency.
upvoted 1 times
vaga1
1 year, 4 months ago
I would say that Bigtable is simply better suited to serving applications.
upvoted 1 times
hiromi
1 year, 11 months ago
Selected Answer: D
Vote for D
upvoted 1 times
HarshKothari21
2 years, 1 month ago
Selected Answer: D
Option D
upvoted 1 times
sumanshu
3 years, 4 months ago
Vote for D; the requirement is to serve predictions within 100 ms.
upvoted 6 times
Tanmoyk
4 years, 1 month ago
D is correct; 100 ms is the most critical factor here.
upvoted 6 times
sh2020
4 years, 2 months ago
Writing to Bigtable and then allowing application access will introduce more delay. I think the answer should be C.
upvoted 1 times
f839
3 years, 9 months ago
Predictions are computed in advance for all users and written to Bigtable for low-latency serving.
upvoted 3 times
haroldbenites
4 years, 2 months ago
D is correct
upvoted 4 times
Rajokkiyam
4 years, 7 months ago
Answer D.
upvoted 6 times
[Removed]
4 years, 7 months ago
Should be D
upvoted 7 times