Exam Associate Data Practitioner topic 1 question 4 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 4
Topic #: 1

You want to process and load a daily sales CSV file stored in Cloud Storage into BigQuery for downstream reporting. You need to quickly build a scalable data pipeline that transforms the data while providing insights into data quality issues. What should you do?

  • A. Create a batch pipeline in Cloud Data Fusion by using a Cloud Storage source and a BigQuery sink.
  • B. Load the CSV file as a table in BigQuery, and use scheduled queries to run SQL transformation scripts.
  • C. Load the CSV file as a table in BigQuery. Create a batch pipeline in Cloud Data Fusion by using a BigQuery source and sink.
  • D. Create a batch pipeline in Dataflow by using the Cloud Storage CSV file to BigQuery batch template.
Suggested Answer: A

Comments

n2183712847
1 month, 4 weeks ago
Selected Answer: A
The best option is A: a Cloud Data Fusion pipeline from Cloud Storage to BigQuery. Cloud Data Fusion is visual and fast for pipeline building, is scalable, handles transformations visually, and provides data quality insights within the pipeline. Option B (BigQuery load + SQL) is incorrect because scheduled queries are less of a pipeline and offer fewer built-in data quality features. Option C (BigQuery load + Data Fusion BigQuery-to-BigQuery pipeline) is incorrect because loading the data into BigQuery before running Data Fusion is inefficient and redundant. Option D (Dataflow template) is incorrect because, although Dataflow is scalable, Data Fusion is often quicker to build visually for simpler pipelines. Option A is therefore the best balance of speed, scalability, and data quality for this task.
upvoted 2 times
jatinbhatia2055
2 months, 1 week ago
Selected Answer: D
The question should include more detail, since both Dataflow and Data Fusion can be used. Data Fusion is more suitable if you don't want to write code and prefer to let Google do the work. If the daily analysis of the CSV is more complex, Dataflow is the best approach, as it provides both built-in templates and custom template creation.
upvoted 2 times
rich_maverick
2 months ago
Answer D says Dataflow, not Data Fusion. We all agree that Data Fusion is the "quick and scalable" option. Also, Dataflow does not give insights into data quality issues. The answer is A.
upvoted 2 times
trashbox
3 months, 1 week ago
Selected Answer: A
Cloud Data Fusion enables us to build a scalable data pipeline from Cloud Storage to BigQuery. In addition, the service provides end-to-end data lineage for root cause and impact analysis.
upvoted 4 times
Community vote distribution: A (35%, most voted), C (25%), B (20%), Other