Exam Professional Data Engineer topic 1 question 67 discussion

Actual exam question from Google's Professional Data Engineer

Question #: 67
Topic #: 1

You are developing an application that uses a recommendation engine on Google Cloud. Your solution should display new videos to customers based on past views. Your solution needs to generate labels for the entities in videos that the customer has viewed. Your design must be able to provide very fast filtering suggestions based on data from other customer preferences on several TB of data. What should you do?

A. Build and train a complex classification model with Spark MLlib to generate labels and filter the results. Deploy the models using Cloud Dataproc. Call the model from your application.
B. Build and train a classification model with Spark MLlib to generate labels. Build and train a second classification model with Spark MLlib to filter results to match customer preferences. Deploy the models using Cloud Dataproc. Call the models from your application.
C. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud Bigtable, and filter the predicted labels to match the user's viewing history to generate preferences.
D. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud SQL, and join and filter the predicted labels to match the user's viewing history to generate preferences.

Suggested Answer: C

by [deleted] at March 21, 2020, 4:25 p.m.

Comments

[Removed]

Highly Voted 4 years, 2 months ago

Answer: C. A and B require building your own model, so they are discarded. Both C and D can do the job using the Cloud Video Intelligence API, but Bigtable is the better storage option, so C is correct.

upvoted 36 times

jin0

1 year, 3 months ago

I don't understand why the Vision API should be the answer for labeling. There is no information about the input data, is there?

upvoted 1 times

...

jin0

1 year, 3 months ago

Is there anything in the question that says we have to rule out building our own model?

upvoted 1 times

...

[Removed]

Highly Voted 4 years, 2 months ago

Answer: C. Description: Why build your own model? The Video Intelligence API with Bigtable is the best solution.

upvoted 14 times

...

Mathew106

Most Recent 11 months ago

Selected Answer: C

I don't even know if MLlib has out-of-the-box computer vision models. Developing this in Dataproc would be a nightmare. Using the Video Intelligence API, on the other hand, makes perfect sense. The fact that the filtering must happen very fast and that this is a customer-facing application points to Bigtable, which offers very low latency and high availability. Bigtable is only eventually consistent across replicated clusters (reads within a single cluster are strongly consistent), but that doesn't really matter for this application. Cloud SQL ensures strong consistency, which we don't really need, but it is slower and supports a maximum of 64 TB. The question mentions several TB. I'm not really sure what "several" means here, but Cloud SQL doesn't have a high cap.

upvoted 2 times

...

euro202

11 months, 2 weeks ago

Selected Answer: C

We need a model that extracts labels from videos, so the Video Intelligence API could be used. Then we need a very fast DB that can handle several TB of data, so Bigtable is the best choice. Answer is C.

upvoted 1 times

...

samdhimal

1 year, 4 months ago

Option C is the correct choice because it utilizes the Cloud Video Intelligence API to generate labels for the entities in the videos, which would save time and resources compared to building and training a model from scratch. Additionally, by storing the data in Cloud Bigtable, it allows for fast and efficient filtering of the predicted labels based on the user's viewing history and preferences. This is a more efficient and cost-effective approach than storing the data in Cloud SQL and performing joins and filters.
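To make the label-generation step concrete, here is a minimal sketch of what the call could look like with the `google-cloud-videointelligence` Python client. It assumes the library is installed, credentials are configured, and the video sits in Cloud Storage. The helper names `detect_labels` and `confident_labels` and the 0.7 threshold are illustrative assumptions, not something from the question.

```python
from typing import Iterable, List, Tuple


def detect_labels(gcs_uri: str, timeout_s: int = 300) -> List[Tuple[str, float]]:
    """Request segment-level label detection for one video in Cloud Storage.

    Assumption: google-cloud-videointelligence is installed and
    application default credentials are available.
    """
    # Imported lazily so the pure helper below works without the client library.
    from google.cloud import videointelligence

    client = videointelligence.VideoIntelligenceServiceClient()
    operation = client.annotate_video(
        request={
            "features": [videointelligence.Feature.LABEL_DETECTION],
            "input_uri": gcs_uri,
        }
    )
    # annotate_video returns a long-running operation; block until it finishes.
    result = operation.result(timeout=timeout_s)
    annotations = result.annotation_results[0].segment_label_annotations
    # Flatten to (label, confidence) pairs, one pair per detected segment.
    return [
        (annotation.entity.description, segment.confidence)
        for annotation in annotations
        for segment in annotation.segments
    ]


def confident_labels(
    pairs: Iterable[Tuple[str, float]], threshold: float = 0.7
) -> List[str]:
    """Keep labels whose best confidence reaches the threshold, sorted by name."""
    best = {}
    for label, confidence in pairs:
        best[label] = max(confidence, best.get(label, 0.0))
    return sorted(label for label, conf in best.items() if conf >= threshold)
```

The filtered labels are what would then be written to the store (Bigtable in option C) alongside the user's viewing history.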

upvoted 2 times

...

AzureDP900

1 year, 5 months ago

Answer is C: Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud Bigtable, and filter the predicted labels to match the user's viewing history to generate preferences.
1. Rather than building a new model, it is better to use a Google-provided API, here the Video Intelligence API. So options A and B are ruled out.
2. Between Cloud SQL and Bigtable, Bigtable is the better option because it supports filtering by row key. No join is required for the filtering.
Reference: https://cloud.google.com/video-intelligence/docs/feature-label-detection

upvoted 1 times

...

MaxNRG

2 years, 5 months ago

Selected Answer: C

C. The Cloud Video Intelligence API does the label generation without the need to build any model, so A and B are excluded. The most suitable database for this is Bigtable and not Cloud SQL (joins this big would be anything but fast). https://cloud.google.com/video-intelligence/docs/feature-label-detection

upvoted 2 times

...

sumanshu

2 years, 11 months ago

Vote for C

upvoted 4 times

...

timolo

3 years, 2 months ago

Answer: C Reference https://cloud.google.com/video-intelligence/docs/feature-label-detection

upvoted 2 times

...

daghayeghi

3 years, 3 months ago

Answer C: If we presume that the video label is used as the row key, Bigtable is the best option because it can store several TB, whereas Cloud SQL is limited to 30 TB.
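A rough illustration of why a label-first row key makes filtering cheap: Bigtable keeps rows sorted by key, so all rows for one label form a contiguous range that can be read with a prefix scan. The sketch below only simulates that behavior in plain Python. `SortedRowStore` and the `label#video_id` key layout are hypothetical, and it is not the Bigtable client API.

```python
from bisect import bisect_left
from typing import Dict, List


class SortedRowStore:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        # Insert new keys at their sorted position; overwrite existing rows.
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = value

    def scan_prefix(self, prefix: str) -> List[str]:
        """Return keys starting with prefix by reading one contiguous range."""
        start = bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so the matching range has ended
            matches.append(key)
        return matches


def row_key(label: str, video_id: str) -> str:
    # Hypothetical layout: label first, so one label maps to one key range.
    return f"{label}#{video_id}"


store = SortedRowStore()
store.put(row_key("cooking", "v1"), {"title": "Pasta basics"})
store.put(row_key("travel", "v2"), {"title": "Lisbon in a day"})
store.put(row_key("cooking", "v3"), {"title": "Knife skills"})

cooking = store.scan_prefix("cooking#")
print(cooking)
```

One caveat on the design: a key that starts with a very popular label can concentrate traffic on a narrow key range, so a real schema might need an additional key component to spread the load.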

upvoted 7 times

...

NamitSehgal

3 years, 6 months ago

Answer: C

upvoted 3 times

...

Alasmindas

3 years, 7 months ago

Option C is the correct answer. 1. Rather than building a new model, it is better to use a Google-provided API, here the Video Intelligence API. So options A and B are ruled out. 2. Between Cloud SQL and Bigtable, Bigtable is the better option because it supports filtering by row key. No join is required for the filtering.

upvoted 7 times

...

SureshKotla

3 years, 8 months ago

Answer is D: Bigtable doesn't support JOINs and is not built for transactions - https://cloud.google.com/bigtable/docs/overview

upvoted 2 times

Surjit24

3 years, 7 months ago

There are no joins but filtering based on condition.

upvoted 4 times

karthik89

3 years, 4 months ago

But the requirement involves a join as well; it is stated in the problem.

upvoted 2 times

sumanshu

2 years, 11 months ago

Where? The question does mention "very fast filtering suggestions", which suggests something like a Python dictionary, i.e. a key-value lookup (which is what Bigtable provides).

upvoted 1 times

sraakesh95

2 years, 5 months ago

I think "based on data from other customer preferences" in the question requires a join before a filter is applied for collaborative filtering.

upvoted 1 times

Deepakd

2 years, 2 months ago

Recommendations based on other customers' views cannot be achieved through simple joins. A class of machine learning algorithms called collaborative filtering is required for that. You need Bigtable to run these algorithms.
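For readers unfamiliar with the term, a very small sketch of one simple form of collaborative filtering (item-based co-occurrence counting) is shown below. The user names, labels, and function names are made up for illustration, and a production recommender would be considerably more involved.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Set


def build_cooccurrence(views: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Count, for each label, how often other labels share a user's history."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for labels in views.values():
        # Every ordered pair of distinct labels seen by the same user.
        for a, b in permutations(sorted(labels), 2):
            co[a][b] += 1
    return co


def recommend(
    user: str, views: Dict[str, Set[str]], co: Dict[str, Counter], k: int = 3
) -> List[str]:
    """Suggest up to k unseen labels that co-occur most with the user's labels."""
    seen = views[user]
    scores: Counter = Counter()
    for label in seen:
        for other, count in co.get(label, Counter()).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by name so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Hypothetical viewing histories, already reduced to video labels.
views = {
    "ana": {"cooking", "travel"},
    "ben": {"cooking", "travel", "music"},
    "cy": {"cooking", "music"},
    "dee": {"cooking"},
    "eli": {"cooking", "travel"},
}
co = build_cooccurrence(views)
print(recommend("dee", views, co))
```

Here "dee" has only watched cooking videos, so the suggestions come entirely from what other cooking viewers also watched.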

upvoted 1 times

...

haroldbenites

3 years, 10 months ago

Correct C

upvoted 2 times

...

dg63

3 years, 11 months ago

I doubt C can be the answer. Will Bigtable allow filtering on labels?

upvoted 2 times

tprashanth

3 years, 11 months ago

Yes, if it's part of the row key.

upvoted 3 times

...

Rajuuu

3 years, 11 months ago

Answer is C.

upvoted 4 times

...

Ganshank

4 years, 2 months ago

C. The recommendation requires filtering across several TB of data, so Bigtable is the recommended option over Cloud SQL, which is limited to 10 TB.

upvoted 7 times

...