Exam Professional Data Engineer topic 1 question 94 discussion

Actual exam question from Google's Professional Data Engineer

Question #: 94
Topic #: 1

You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?

A. Batch job, PubSubIO, side-inputs
B. Streaming job, PubSubIO, JdbcIO, side-outputs
C. Streaming job, PubSubIO, BigQueryIO, side-inputs
D. Streaming job, PubSubIO, BigQueryIO, side-outputs

Suggested Answer: C

by rickywck at March 17, 2020, 9:34 a.m.

Comments

rickywck

Highly Voted 4 years, 9 months ago

Why not C? Without BigQueryIO how can data be written back to BigQuery?

upvoted 31 times

xq

4 years, 9 months ago

C should be right

upvoted 8 times

...

[Removed]

Highly Voted 4 years, 8 months ago

Answer: C. Description: use a side input for the BigQuery data.

upvoted 16 times

...

JOKKUNO

Most Recent 11 months, 4 weeks ago

Side inputs: In addition to the main input PCollection, you can provide additional inputs to a ParDo transform in the form of side inputs. A side input is an additional input that your DoFn can access each time it processes an element in the input PCollection. When you specify a side input, you create a view of some other data that can be read from within the ParDo transform’s DoFn while processing each element. Side inputs are useful if your ParDo needs to inject additional data when processing each element in the input PCollection, but the additional data needs to be determined at runtime (and not hard-coded). Such values might be determined by the input data, or depend on a different branch of your pipeline.
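To make the "determined at runtime, not hard-coded" point concrete, here is a minimal plain-Python sketch of the idea. It is not Beam API code: `par_do` and `tag_long_word` are hypothetical names invented for illustration, and the word list is made-up sample data. In a real Beam pipeline the side value would be passed to the ParDo as a view (for example `beam.pvalue.AsSingleton` in the Python SDK) rather than as a plain function argument.

```python
from statistics import mean


def par_do(main_input, fn, side_input):
    """Toy stand-in for a ParDo: call fn once per main-input element,
    handing it the same side-input value every time."""
    return [fn(element, side_input) for element in main_input]


def tag_long_word(word, avg_len):
    # The threshold comes from the side input, not from a hard-coded constant.
    return (word, len(word) > avg_len)


words = ["beam", "pipeline", "io", "enrichment"]

# The side value is computed at runtime from another "branch" of the data.
avg_word_len = mean(len(w) for w in words)

tagged = par_do(words, tag_long_word, avg_word_len)
```

The per-element function never recomputes the average; it only reads the value it is given, which is the role a side input plays for a DoFn.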

upvoted 2 times

JOKKUNO

11 months, 4 weeks ago

https://beam.apache.org/documentation/programming-guide/#side-inputs

upvoted 2 times

...

piyush7777

1 year, 4 months ago

Why not side-output?

upvoted 1 times

...

TQM__9MD

1 year, 4 months ago

Selected Answer: B

B. Use multi-cluster routing to add a second cluster to the existing instance, utilizing a live traffic app profile for the regular workload and a batch analytics profile for the analytical workload.

upvoted 1 times

...

Mathew106

1 year, 4 months ago

Selected Answer: C

The answer is C. It's a trap so that you answer A because of batch vs streaming but you need BigQueryIO. On the other hand, streaming is absolutely redundant here and will incur extra costs. C is right but would be better with batch.

upvoted 2 times

...

Siadd

1 year, 11 months ago

A is the answer: Batch job, PubSubIO, side-inputs.

upvoted 1 times

...

zellck

2 years ago

Selected Answer: C

C is the answer. https://cloud.google.com/dataflow/docs/tutorials/ecommerce-java#side-input-pattern

In streaming analytics applications, data is often enriched with additional information that might be useful for further analysis. For example, if you have the store ID for a transaction, you might want to add information about the store location. This additional information is often added by taking an element and bringing in information from a lookup table.
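The store-ID example can be sketched in a few lines of plain Python. This is only an illustration of the lookup-table step, not code from the linked tutorial: the `STORES` table, the field names, and `enrich_with_store` are all assumptions made up for this sketch.

```python
def enrich_with_store(transaction, stores):
    """Return a copy of the transaction with the store's city attached.

    `stores` plays the role of the in-memory lookup table (the side input).
    Unknown store IDs get None instead of raising, so one bad record does
    not stop the rest of the stream.
    """
    store = stores.get(transaction["store_id"])
    enriched = dict(transaction)  # copy, so the input element is not mutated
    enriched["store_city"] = store["city"] if store else None
    return enriched


# Hypothetical reference data, small enough to hold in memory.
STORES = {101: {"city": "Austin"}, 102: {"city": "Denver"}}

transactions = [
    {"txn_id": "t1", "store_id": 101, "amount": 12.5},
    {"txn_id": "t2", "store_id": 999, "amount": 3.0},  # store not in the table
]

enriched = [enrich_with_store(t, STORES) for t in transactions]
```

Returning a default for missing keys is one possible design choice; a pipeline could just as reasonably route such records to a separate output for inspection.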

upvoted 4 times

...

sedado77

2 years, 3 months ago

Selected Answer: C

I got this question in September 2022. The answer is C.

upvoted 3 times

chrismayola

2 years, 1 month ago

Dear, can you please help? I have some questions about how to prepare for the certification exam using this questionnaire. This is my email [email protected]; ping me so we can talk.

upvoted 1 times

...

alex12441

2 years, 10 months ago

Selected Answer: C

Answer: C

upvoted 1 times

...

medeis_jar

2 years, 11 months ago

Selected Answer: C

I vote for C: the data comes from Pub/Sub, so the job should be streaming. We need PubSubIO to read from Pub/Sub and BigQueryIO to write to BigQuery, and the side-inputs pattern lets us enrich the data.

upvoted 5 times

...

MaxNRG

2 years, 11 months ago

Selected Answer: C

The static reference data from BigQuery goes in as a side input, the data from Pub/Sub arrives as a stream through PubSubIO, and BigQueryIO is required to write the final data to BigQuery.
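The three stages described here (stream in, enrich against reference data loaded once, write out) can be modeled with plain Python generators. This is a rough sketch of the pipeline's shape only, not Beam or Google Cloud client code: `load_reference`, `message_source`, `enrich`, `run`, the SKU data, and the list used as a sink are all hypothetical stand-ins.

```python
import json
from typing import Dict, Iterable, Iterator, List


def load_reference() -> Dict[str, str]:
    # Stand-in for a one-time read of the static BigQuery reference table.
    return {"sku-1": "Widget", "sku-2": "Gadget"}


def message_source(payloads: Iterable[bytes]) -> Iterator[dict]:
    # Stand-in for a Pub/Sub subscription: yields one decoded message at a time.
    for payload in payloads:
        yield json.loads(payload.decode("utf-8"))


def enrich(messages: Iterable[dict], reference: Dict[str, str]) -> Iterator[dict]:
    # Each message is enriched as it arrives, using the in-memory reference data.
    for msg in messages:
        yield {**msg, "product_name": reference.get(msg["sku"], "unknown")}


def run(payloads: Iterable[bytes], sink: List[dict]) -> None:
    reference = load_reference()  # loaded once, reused for every message
    for row in enrich(message_source(payloads), reference):
        sink.append(row)  # stand-in for a BigQuery write


sink: List[dict] = []
run([b'{"sku": "sku-1", "qty": 2}', b'{"sku": "sku-9", "qty": 1}'], sink)
```

Because the source is consumed lazily, each row reaches the sink as soon as its message is read, which is the behavior that makes a streaming job the natural fit for a Pub/Sub input.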

upvoted 4 times

...

JG123

3 years ago

Ans: C

upvoted 1 times

...

pals_muthu

3 years, 3 months ago

The answer is C. You need PubSubIO and BigQueryIO to stream the data in and write the enriched data back to BigQuery. Side inputs are a way to enrich the data: https://cloud.google.com/architecture/e-commerce/patterns/slow-updating-side-inputs

upvoted 6 times

...

Meuter

3 years, 4 months ago

I choose C: the data comes from Pub/Sub, so the job should be streaming. We need PubSubIO to read from Pub/Sub and BigQueryIO to write to BigQuery, and the side-inputs pattern lets us enrich the data. https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html https://cloud.google.com/architecture/e-commerce/patterns/slow-updating-side-inputs https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html

upvoted 3 times

...

daghayeghi

3 years, 9 months ago

C: we have to use a streaming job because of Pub/Sub, and a side input because the reference data is static. We also have to use BigQueryIO, since the enriched data is ultimately written to BigQuery. So C is the correct answer.

upvoted 2 times

...

someshsehgal

3 years, 10 months ago

Correct answer: A. Batch is cost-effective and there is no need for streaming.

upvoted 1 times

funtoosh

3 years, 10 months ago

How are you going to write back to BQ?

upvoted 1 times

...
