Exam Professional Data Engineer All Questions

View all questions & answers for the Professional Data Engineer exam

Exam Professional Data Engineer topic 1 question 133 discussion

Actual exam question from Google's Professional Data Engineer

Question #: 133
Topic #: 1

[All Professional Data Engineer Questions]

A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions: SELECT predicted_label, user_id FROM ML.PREDICT (MODEL 'dataset.model', table user_features). How should you create the ML pipeline?

A. Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.
B. Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.
C. Create a Dataflow pipeline using BigQueryIO to read results from the query. Grant the Dataflow Worker role to the application service account.
D. Create a Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Bigtable.

Show Suggested Answer

Suggested Answer: D 🗳️

by rickywck at March 20, 2020, 3:49 a.m.

Comments

Submit Cancel

rickywck

Highly Voted 4 years, 9 months ago

I think the key reason for pick D is the 100ms requirement.

upvoted 30 times

AzureDP900

2 years ago

D. Create a Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Bigtable.

upvoted 2 times

...

MaxNRG

Highly Voted 1 year ago

Selected Answer: D

The key requirements are serving predictions for individual user IDs with low (sub-100ms) latency. Option D meets this by batch predicting for all users in BigQuery ML, writing predictions to Bigtable for fast reads, and allowing the application access to query Bigtable directly for low latency reads. Since the application needs to serve low-latency predictions for individual user IDs, using Dataflow to batch predict for all users and write to Bigtable allows low-latency reads. Granting the Bigtable Reader role allows the application to retrieve predictions for a specific user ID from Bigtable.

upvoted 6 times

MaxNRG

1 year ago

The other options either require changing the query for each user ID (higher latency, option A), reading directly from higher latency services like BigQuery (option B), or writing predictions somewhere without fast single row access (options A, B, C). Option A would not work well because the WHERE clause would need to be changed for each user ID, increasing latency. Option B using an Authorized View would still read from BigQuery which has higher latency than Bigtable for individual rows. Option C writes predictions to BigQuery which has higher read latency compared to Bigtable for individual rows. So option D provides the best pipeline by predicting for all users in BigQueryML, batch writing to Bigtable for low latency reads, and granting permissions for the application to retrieve predictions. This meets the requirements of sub-100ms latency for individual user predictions. https://cloud.google.com/dataflow/docs/concepts/access-control

upvoted 2 times

...

Nirca

Most Recent 1 year, 3 months ago

Selected Answer: D

I think the key reason for pick D is the 100ms requirement/ me too

upvoted 1 times

...

barnac1es

1 year, 3 months ago

Selected Answer: B

To create an ML pipeline for serving predictions to individual user IDs with latency under 100 milliseconds using the given BigQuery ML query, the most suitable approach is: B. Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.

upvoted 2 times

...

Lanro

1 year, 5 months ago

Selected Answer: D

Always use Bigtable as an endpoint for client-facing applications (Low latency - high throughput)

upvoted 2 times

...

midgoo

1 year, 9 months ago

Selected Answer: D

One of the way to improve the efficient of ML pipeline is to generate cache (store predictions). In this question, only D is doing that.

upvoted 3 times

...

musumusu

1 year, 10 months ago

what is wrong with B ? View can be precomputed and cached and it can definitely satisfy the 100 miliseconds request. create a pipeline to send data to bigtable .. don't you think its too much to run a simple prediction query ?

upvoted 2 times