exam questions

Exam Associate Data Practitioner All Questions

View all questions & answers for the Associate Data Practitioner exam

Exam Associate Data Practitioner topic 1 question 46 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 46
Topic #: 1
[All Associate Data Practitioner Questions]

You work for an online retail company. Your company collects customer purchase data in CSV files and pushes them to Cloud Storage every 10 minutes. The data needs to be transformed and loaded into BigQuery for analysis. The transformation involves cleaning the data, removing duplicates, and enriching it with product information from a separate table in BigQuery. You need to implement a low-overhead solution that initiates data processing as soon as the files are loaded into Cloud Storage. What should you do?

  • A. Use Cloud Composer sensors to detect files loading in Cloud Storage. Create a Dataproc cluster, and use a Composer task to execute a job on the cluster to process and load the data into BigQuery.
  • B. Schedule a direct acyclic graph (DAG) in Cloud Composer to run hourly to batch load the data from Cloud Storage to BigQuery, and process the data in BigQuery using SQL.
  • C. Use Dataflow to implement a streaming pipeline using an OBJECT_FINALIZE notification from Pub/Sub to read the data from Cloud Storage, perform the transformations, and write the data to BigQuery.
  • D. Create a Cloud Data Fusion job to process and load the data from Cloud Storage into BigQuery. Create an OBJECT_FINALI ZE notification in Pub/Sub, and trigger a Cloud Run function to start the Cloud Data Fusion job as soon as new files are loaded.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
n2183712847
1 month, 3 weeks ago
Selected Answer: C
dataflow with pub/sub is lowest overhead for real-time
upvoted 1 times
...
n2183712847
2 months ago
Selected Answer: C
The best solution is C. Dataflow with Pub/Sub notifications. Dataflow offers low-overhead, real-time streaming, ideal for immediate processing of files as they land in Cloud Storage, and handles the required transformations efficiently before loading into BigQuery. Option A (Composer/Dataproc) is too heavyweight and costly for frequent, small batches. Option B (Composer Hourly Batch) is too slow, introducing delays and data staleness. Option D (Data Fusion/Cloud Run) is more complex and potentially higher overhead than a simpler Dataflow pipeline for this streaming use case.
upvoted 2 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago