Exam Associate Data Practitioner topic 1 question 21 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 21
Topic #: 1
You need to create a weekly aggregated sales report based on a large volume of data. You want to use Python to design an efficient process for generating this report. What should you do?

  • A. Create a Cloud Run function that uses NumPy. Use Cloud Scheduler to schedule the function to run once a week.
  • B. Create a Colab Enterprise notebook and use the bigframes.pandas library. Schedule the notebook to execute once a week.
  • C. Create a Cloud Data Fusion and Wrangler flow. Schedule the flow to run once a week.
  • D. Create a Dataflow directed acyclic graph (DAG) coded in Python. Use Cloud Scheduler to schedule the code to run once a week.
Suggested Answer: D

Comments

Rio55
1 month, 1 week ago
Selected Answer: D
Option D (a Dataflow DAG coded in Python) and Option B (bigframes.pandas in Colab Enterprise) seem the most suitable. Dataflow is designed for parallel processing of large datasets, which makes Option D efficient; Python is the required language, and Cloud Scheduler handles the weekly schedule. Option B is also a possibility for combining Python with BigQuery, but Dataflow is generally more robust and scalable for production ETL pipelines over large datasets. Option A (Cloud Run with NumPy) may hit memory and execution-time limits with large data, and Option C (Data Fusion with Wrangler) does not use Python code to design the process. I would choose D.
upvoted 2 times
n2183712847
2 months ago
Selected Answer: B
Option B (Colab Enterprise + bigframes.pandas) and Option D (Dataflow DAG in Python) are the most efficient options for handling large data volumes with Python. Option B is likely more straightforward and faster to implement for generating a report, especially for an analyst familiar with pandas and notebooks: bigframes.pandas simplifies working with BigQuery data from a Python environment for reporting purposes. Option D is more robust and scalable for general data pipelines and potentially more complex transformations, but may be overkill for a weekly reporting task compared with the ease of use of bigframes.pandas in Colab Enterprise. Given the need for an "efficient process for generating this report" and the requirement to use Python, Option B provides an efficient, relatively simple path: a weekly aggregated sales report built directly from BigQuery using familiar pandas-like syntax in a scheduled Colab Enterprise notebook. Final answer: B.
upvoted 1 times
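The weekly aggregation that Option B describes can be sketched with plain pandas syntax, which bigframes.pandas deliberately mirrors; in a Colab Enterprise notebook you would import `bigframes.pandas` and read from a BigQuery table instead of building an in-memory frame, but the groupby logic is the same. The table contents and column names (`order_ts`, `region`, `amount`) below are hypothetical, purely for illustration:

```python
# Minimal sketch of a weekly aggregated sales report, assuming hypothetical
# columns order_ts / region / amount. In Colab Enterprise, bigframes.pandas
# exposes this same API while pushing the computation down to BigQuery.
import pandas as pd

sales = pd.DataFrame(
    {
        "order_ts": pd.to_datetime(
            ["2024-01-01", "2024-01-03", "2024-01-08", "2024-01-10"]
        ),
        "region": ["EMEA", "EMEA", "APAC", "APAC"],
        "amount": [100.0, 50.0, 75.0, 25.0],
    }
)

# Bucket each sale into its week, then sum amounts per week and region.
weekly = (
    sales.assign(week=sales["order_ts"].dt.to_period("W").dt.start_time)
    .groupby(["week", "region"], as_index=False)["amount"]
    .sum()
)
print(weekly)
```

With bigframes.pandas the heavy lifting runs inside BigQuery, so the notebook stays lightweight even for large tables; the notebook itself is then scheduled to run weekly.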
Community vote distribution: A (35%), C (25%), B (20%), Other