Exam Associate Data Practitioner topic 1 question 30 discussion

Actual exam question from Google's Associate Data Practitioner

Question #: 30
Topic #: 1

You are predicting customer churn for a subscription-based service. You have a 50 PB historical customer dataset in BigQuery that includes demographics, subscription information, and engagement metrics. You want to build a churn prediction model with minimal overhead. You want to follow the Google-recommended approach. What should you do?

A. Export the data from BigQuery to a local machine. Use scikit-learn in a Jupyter notebook to build the churn prediction model.
B. Use Dataproc to create a Spark cluster. Use the Spark MLlib within the cluster to build the churn prediction model.
C. Create a Looker dashboard that is connected to BigQuery. Use LookML to predict churn.
D. Use the BigQuery Python client library in a Jupyter notebook to query and preprocess the data in BigQuery. Use the CREATE MODEL statement in BigQuery ML to train the churn prediction model.

Suggested Answer: D

by n2183712847 at Feb. 27, 2025, 5:46 p.m.

Comments

n2183712847

4 months, 3 weeks ago

Selected Answer: D

D is the Google-recommended approach for building a churn model on a 50 PB BigQuery dataset with minimal overhead: use the BigQuery Python client library together with BigQuery ML. BigQuery ML trains the model inside BigQuery through a CREATE MODEL statement, so the data never leaves the warehouse and there is no separate training infrastructure to manage. Option A (local scikit-learn) is impractical because 50 PB cannot realistically be exported to, or processed on, a local machine. Option B (Dataproc/Spark MLlib) adds unnecessary data movement and cluster management overhead. Option C (Looker) is a BI tool; LookML defines data models for reporting and does not train ML models.
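To make option D concrete, here is a minimal sketch of what the workflow might look like from a notebook. The model ID, the source table, the label column name `churned`, and the choice of logistic regression are all assumptions made for illustration; the question does not specify them. The sketch also assumes the features have already been prepared in a single table.

```python
def build_create_model_sql(model_id: str, source_table: str,
                           label_col: str = "churned") -> str:
    """Build a BigQuery ML CREATE MODEL statement for a churn classifier.

    All identifiers passed in are hypothetical placeholders.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_id}`\n"
        "OPTIONS (\n"
        "  model_type = 'LOGISTIC_REG',\n"  # assumed model type for binary churn
        f"  input_label_cols = ['{label_col}']\n"
        ") AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def train_churn_model(model_id: str, source_table: str) -> None:
    """Submit the CREATE MODEL statement; training runs inside BigQuery."""
    # Imported here so the SQL builder above works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on application default credentials
    job = client.query(build_create_model_sql(model_id, source_table))
    job.result()  # block until the training job finishes


if __name__ == "__main__":
    # Hypothetical names; replace with a real project, dataset, and table.
    print(build_create_model_sql(
        "my_project.my_dataset.churn_model",
        "my_project.my_dataset.customer_features",
    ))
```

Only the SQL text travels from the notebook to BigQuery, which is why this approach avoids moving the 50 PB dataset.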

upvoted 1 time
