Exam Associate Data Practitioner topic 1 question 25 discussion

Actual exam question from Google's Associate Data Practitioner

Question #: 25
Topic #: 1

Your team is building several data pipelines that contain a collection of complex tasks and dependencies that you want to execute on a schedule, in a specific order. The tasks and dependencies consist of files in Cloud Storage, Apache Spark jobs, and data in BigQuery. You need to design a system that can schedule and automate these data processing tasks using a fully managed approach. What should you do?

A. Use Cloud Scheduler to schedule the jobs to run.
B. Use Cloud Tasks to schedule and run the jobs asynchronously.
C. Create directed acyclic graphs (DAGs) in Cloud Composer. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.
D. Create directed acyclic graphs (DAGs) in Apache Airflow deployed on Google Kubernetes Engine. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.

Suggested Answer: C

by n2183712847 at Feb. 27, 2025, 6:07 p.m.

Comments

n2183712847

5 months, 1 week ago

Selected Answer: C

The best fully managed option is C: build DAGs in Cloud Composer and use the appropriate operators. Cloud Composer is Google Cloud's fully managed Apache Airflow service. It is designed for orchestrating workflows with task dependencies and provides operators for Cloud Storage, Spark (via Dataproc), and BigQuery. Option D (Airflow on GKE) is not fully managed: your team would have to deploy, upgrade, and scale Airflow itself. Option A (Cloud Scheduler) triggers jobs on a cron-style schedule but does not track dependencies between tasks, and option B (Cloud Tasks) is a queue for asynchronous task execution, not a workflow orchestrator.
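
To make the "specific order" requirement concrete, here is a minimal sketch of how a DAG turns declared dependencies into a run order. It uses only Python's standard-library graphlib, not the Airflow or Cloud Composer API, and the task names are hypothetical stand-ins for the Cloud Storage, Spark, and BigQuery steps in the question.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (not from the question). Each key maps to the
# set of tasks that must finish before it may start.
PIPELINE = {
    "wait_for_gcs_file": set(),
    "run_spark_job": {"wait_for_gcs_file"},
    "load_to_bigquery": {"run_spark_job"},
    "publish_report": {"load_to_bigquery"},
}


def execution_order(dependencies):
    """Return one valid run order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in execution_order(PIPELINE):
        print(task)
```

An orchestrator such as Airflow does this ordering for you and adds scheduling, retries, and monitoring on top. Cloud Scheduler and Cloud Tasks have no equivalent of the dependency map, which is why options A and B fall short here.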

upvoted 1 time
