
Exam Professional Data Engineer topic 1 question 129 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 129
Topic #: 1
[All Professional Data Engineer Questions]

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

  • A. Organize your data in a single table, export, and compress and store the BigQuery data in Cloud Storage.
  • B. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.
  • C. Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.
  • D. Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.
Suggested Answer: D
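For reference, the snapshot approach in answer D can be sketched with the bq CLI. Project, dataset, table names, and the expiration value below are placeholders, not from the question:

```shell
# Snapshot a monthly table, kept for 90 days (7776000 seconds).
# BigQuery bills only the bytes that differ from the base table.
bq cp --snapshot --no_clobber \
  --expiration=7776000 \
  myproject:sales.transactions_2023_06 \
  myproject:backups.transactions_2023_06_snap

# If the ETL later corrupts the base table, restore the snapshot
# into a new table and point consumers at it.
bq cp --restore --no_clobber \
  myproject:backups.transactions_2023_06_snap \
  myproject:sales.transactions_2023_06_restored
```

Unlike time travel, a snapshot taken before each ETL run remains restorable well past the 7-day window, which matters here because errors can surface after 2 weeks.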

Comments

Chosen Answer:
[Removed]
Highly Voted 4 years, 1 month ago
Should be B
upvoted 22 times
...
Ganshank
Highly Voted 4 years ago
B. The question is specifically about organizing the data in BigQuery and storing backups.
upvoted 12 times
...
zevexWM
Most Recent 1 week, 5 days ago
Selected Answer: D
Answer is D. Snapshots are different from time travel: they can hold data as long as we want. Furthermore, "BigQuery only stores bytes that are different between a snapshot and its base table," so it is pretty cost-effective as well. https://cloud.google.com/bigquery/docs/table-snapshots-intro#table_snapshots
upvoted 1 times
...
Farah_007
3 weeks, 4 days ago
Selected Answer: B
From https://cloud.google.com/architecture/dr-scenarios-for-data#BigQuery it can't be D: "If the corruption is caught within 7 days, query the table to a point in time in the past to recover the table prior to the corruption using snapshot decorators." Here the errors may be detected only after 2 weeks. Instead: "Store the original data on Cloud Storage. This allows you to create a new table and reload the uncorrupted data. From there, you can adjust your applications to point to the new table." => B
upvoted 1 times
...
Nirca
6 months, 2 weeks ago
Selected Answer: D
D - this solution is integrated into BigQuery; no extra copy is needed.
upvoted 4 times
...
Bahubali1988
7 months, 1 week ago
90% of the questions have multiple proposed answers, and it's very hard to get into every discussion when there is no conclusion.
upvoted 5 times
...
ckanaar
7 months, 2 weeks ago
Selected Answer: B
The answer is B. Why not D? Because snapshot costs can become high if a lot of small changes are made to the base table: https://cloud.google.com/bigquery/docs/table-snapshots-intro#:~:text=Because%20BigQuery%20storage%20is%20column%2Dbased%2C%20small%20changes%20to%20the%20data%20in%20a%20base%20table%20can%20result%20in%20large%20increases%20in%20storage%20cost%20for%20its%20table%20snapshot. Since the question states that the ETL pipeline is regularly modified, lots of small changes are likely. Combined with the requirement to optimize for storage costs, option B is the way to go.
upvoted 5 times
...
arien_chen
8 months, 2 weeks ago
Selected Answer: D
Keyword: detected after 2 weeks. Only snapshots can solve that problem.
upvoted 1 times
...
Lanro
9 months, 1 week ago
Selected Answer: D
From the BigQuery documentation, benefits of using table snapshots include the following:
  • Keep a record for longer than seven days. With BigQuery time travel, you can only access a table's data from seven days ago or more recently. With table snapshots, you can preserve a table's data from a specified point in time for as long as you want.
  • Minimize storage cost. BigQuery only stores bytes that are different between a snapshot and its base table, so a table snapshot typically uses less storage than a full copy of the table.
Storing data in GCS would make a full copy of the data for each table. Table snapshots are more optimal in this scenario.
upvoted 5 times
...
vamgcp
9 months, 2 weeks ago
Selected Answer: B
Organizing your data in separate tables for each month will make it easier to identify the affected data and restore it. Exporting and compressing the data will reduce storage costs, as you will only need to store the compressed data in Cloud Storage. Storing your backups in Cloud Storage will also make restores easier, as you can restore the data from Cloud Storage directly.
upvoted 1 times
...
phidelics
11 months ago
Selected Answer: B
Organize in separate tables and store in GCS
upvoted 1 times
cetanx
10 months, 4 weeks ago
Just some additional info! Here is an example of an export job:
$ bq extract --destination_format CSV --compression GZIP 'your_project:your_dataset.your_new_table' 'gs://your_bucket/your_object.csv.gz'
upvoted 1 times
cetanx
10 months ago
I will update my answer to D. Consider a scenario where it is the last week of June and an error occurred 3 weeks ago (so still in June); you do not have an export of the June table yet, so you cannot recover the data simply because the export doesn't exist yet. So snapshots are the way to go!
upvoted 2 times
...
...
...
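To complement the export command in the thread above, recovering with an option-B style backup means reloading the compressed file from Cloud Storage into a fresh table and repointing applications. The bucket, project, and table names below are placeholders, and `--autodetect` assumes no explicit schema is supplied:

```shell
# Export one monthly table as compressed CSV (as in the comment above).
bq extract --destination_format=CSV --compression=GZIP \
  'your_project:your_dataset.sales_2023_06' \
  'gs://your_bucket/sales_2023_06.csv.gz'

# Recover by loading the backup into a new table; BigQuery reads
# gzip-compressed CSV from Cloud Storage directly.
bq load --source_format=CSV --autodetect \
  'your_project:your_dataset.sales_2023_06_restored' \
  'gs://your_bucket/sales_2023_06.csv.gz'
```

Compressed CSV in Cloud Storage is cheaper at rest than an active BigQuery copy, which is the storage-cost argument for option B.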
sdi_studiers
11 months ago
Selected Answer: D
D "With BigQuery time travel, you can only access a table's data from seven days ago or more recently. With table snapshots, you can preserve a table's data from a specified point in time for as long as you want." [source: https://cloud.google.com/bigquery/docs/table-snapshots-intro]
upvoted 2 times
...
WillemHendr
11 months ago
"Store your data in different tables for specific time periods. This method ensures that you need to restore only a subset of data to a new table, rather than a whole dataset." "Store the original data on Cloud Storage. This allows you to create a new table and reload the uncorrupted data. From there, you can adjust your applications to point to the new table." B
upvoted 2 times
...
lucaluca1982
1 year, 1 month ago
Why not D?
upvoted 3 times
...
zellck
1 year, 5 months ago
Selected Answer: B
B is the answer.
upvoted 1 times
...
John_Pongthorn
1 year, 7 months ago
Selected Answer: B
B https://cloud.google.com/architecture/dr-scenarios-for-data#BigQuery
upvoted 2 times
...
MaxNRG
2 years, 3 months ago
Selected Answer: B
B seems the best solution (though C is also a good candidate). D is incorrect: table decorators allow time travel back only up to 7 days (see https://cloud.google.com/bigquery/table-decorators); if you want to keep older snapshots, you have to save them into a separate table yourself (and pay for the storage).
upvoted 7 times
MaxNRG
2 years, 3 months ago
BigQuery. If you want to archive data, you can take advantage of BigQuery's long term storage. If a table is not edited for 90 consecutive days, the price of storage for that table automatically drops by 50 percent. There is no degradation of performance, durability, availability, or any other functionality when a table is considered long term storage. If the table is edited, though, it reverts back to the regular storage pricing and the 90 day countdown starts again.
upvoted 3 times
MaxNRG
2 years, 3 months ago
BigQuery is replicated, but this won't help with corruption in your tables. Therefore, you need to have a plan to be able to recover from that scenario. For example, you can do the following:
  • If the corruption is caught within 7 days, query the table to a point in time in the past to recover the table prior to the corruption using snapshot decorators.
  • Export the data from BigQuery, and create a new table that contains the exported data but excludes the corrupted data.
  • Store your data in different tables for specific time periods. This method ensures that you will need to restore only a subset of data to a new table, rather than a whole dataset.
  • Store the original data on Cloud Storage. This allows you to create a new table and reload the uncorrupted data. From there, you can adjust your applications to point to the new table.
https://cloud.google.com/solutions/dr-scenarios-for-data#BigQuery
upvoted 3 times
...
...
...
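To illustrate the 7-day limit raised in the thread above: time travel only reaches back within that window, which is exactly why it fails when an error surfaces after 2 weeks. A time-travel query (project, dataset, and table names are placeholders) looks like:

```shell
# Works only while the timestamp is inside the time-travel window
# (up to 7 days); an error detected after 2 weeks is outside it.
bq query --use_legacy_sql=false '
SELECT *
FROM `myproject.sales.transactions`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 DAY)'
```

Past that window the data is simply gone, so a durable backup (snapshot or Cloud Storage export) is required either way.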
Community vote distribution: A (35%), C (25%), B (20%), Other
