Exam Associate Data Practitioner topic 1 question 18 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 18
Topic #: 1

You are working with a large dataset of customer reviews stored in Cloud Storage. The dataset contains several inconsistencies, such as missing values, incorrect data types, and duplicate entries. You need to clean the data to ensure that it is accurate and consistent before using it for analysis. What should you do?

  • A. Use the PythonOperator in Cloud Composer to clean the data and load it into BigQuery. Use SQL for analysis.
  • B. Use BigQuery to batch load the data into BigQuery. Use SQL for cleaning and analysis.
  • C. Use Storage Transfer Service to move the data to a different Cloud Storage bucket. Use event triggers to invoke Cloud Run functions to load the data into BigQuery. Use SQL for analysis.
  • D. Use Cloud Run functions to clean the data and load it into BigQuery. Use SQL for analysis.
Suggested Answer: B

Comments

n2183712847
2 months ago
Selected Answer: B
The best option is B: batch load the data into BigQuery and use SQL for both cleaning and analysis. Loading directly into BigQuery and transforming with SQL gives the best balance of efficiency and simplicity, since BigQuery's scalable processing handles both the load and the transformation.

  • Option A (Cloud Composer + PythonOperator) adds unnecessary workflow orchestration and external processing before loading, reducing efficiency.
  • Option C (Storage Transfer Service + Cloud Run) overcomplicates the process with extra data movement and event-driven functions, making it less direct for data cleaning.
  • Option D (Cloud Run functions) is less efficient for large-scale cleaning than BigQuery SQL's parallel processing and adds complexity before the data is in BigQuery for analysis.

Therefore, loading into BigQuery and cleaning with SQL is the most efficient and straightforward approach for this scenario.
upvoted 1 times
...
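[Editor's note] A minimal sketch of the kind of SQL cleaning answer B describes, demonstrated locally with Python's stdlib sqlite3 rather than BigQuery. The table and column names are hypothetical, and BigQuery would typically use SAFE_CAST (which returns NULL on unparseable input) where this sketch uses standard CAST plus a range check.

```python
import sqlite3

# Hypothetical raw reviews table; names and sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_reviews (review_id TEXT, customer_id TEXT, rating TEXT, review_text TEXT);
INSERT INTO raw_reviews VALUES
  ('r1', 'c1', '5',   'Great product'),
  ('r1', 'c1', '5',   'Great product'),  -- duplicate entry
  ('r2', 'c2', 'bad', 'Awful'),          -- incorrect data type for rating
  ('r3', NULL, '3',   'Okay'),           -- missing customer_id
  ('r4', 'c4', '4',   'Good');
""")

# Clean in SQL: deduplicate with ROW_NUMBER, cast rating to an integer,
# and drop rows with missing keys or out-of-range ratings.
conn.execute("""
CREATE TABLE clean_reviews AS
SELECT review_id, customer_id, CAST(rating AS INTEGER) AS rating, review_text
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY review_id ORDER BY review_id) AS rn
  FROM raw_reviews
)
WHERE rn = 1
  AND customer_id IS NOT NULL
  AND CAST(rating AS INTEGER) BETWEEN 1 AND 5
""")

rows = conn.execute(
    "SELECT review_id, rating FROM clean_reviews ORDER BY review_id"
).fetchall()
print(rows)  # [('r1', 5), ('r4', 4)] -- duplicates, bad types, NULL keys removed
```

In BigQuery the same pattern (a staging table batch-loaded from Cloud Storage, then a `CREATE TABLE AS SELECT` with `ROW_NUMBER()` and `SAFE_CAST`) keeps both cleaning and analysis inside one service, which is the efficiency argument the comment makes.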
SaquibHerman
2 months, 2 weeks ago
Selected Answer: A
PythonOperator allows leveraging Python libraries (e.g., Pandas, PySpark) to perform robust data cleaning tasks:
  • Handle missing values (e.g., imputation, filtering).
  • Fix incorrect data types (e.g., string-to-date conversions).
  • Remove duplicates (e.g., using deduplication logic).
upvoted 2 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other