
Exam Certified Data Engineer Associate topic 1 question 70 discussion

Actual exam question from Databricks's Certified Data Engineer Associate
Question #: 70
Topic #: 1
[All Certified Data Engineer Associate Questions]

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Code block image not reproduced]
If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

  • A. processingTime(1)
  • B. trigger(availableNow=True)
  • C. trigger(parallelBatch=True)
  • D. trigger(processingTime="once")
  • E. trigger(continuous="once")
Suggested Answer: B
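
For reference, a minimal sketch of what the completed code block might look like with the blank filled in. The source table, target table, and checkpoint path below are hypothetical, and spark is assumed to be the active SparkSession (as in a Databricks notebook); only the trigger call reflects the suggested answer.

from pyspark.sql import functions as F

(spark.readStream
    .table("source_table")                               # hypothetical source table
    .withColumn("ingested_at", F.current_timestamp())    # example transformation
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/new_table")  # hypothetical path
    .trigger(availableNow=True)    # process all available data in as many batches as required, then stop
    .toTable("new_table"))         # hypothetical target table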

Comments

benni_ale
1 year ago
Selected Answer: B
B is correct.
upvoted 1 times
...
fifirifi
1 year, 1 month ago
Selected Answer: B
Correct answer: B. Explanation: in Structured Streaming, if a data engineer wants to process all of the available data in as many batches as required, without a fixed trigger interval, they can use trigger(availableNow=True). The availableNow trigger tells the query to process all data that is available at the moment and then stop, rather than wait for more data to arrive.
upvoted 4 times
...
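To make the distinction concrete, here is a short sketch (the table names, checkpoint path, and spark session are assumptions, not taken from the question): the same streaming write either runs indefinitely on an interval or drains the current backlog and stops, depending only on the trigger.

writer = (spark.readStream.table("bronze_events")        # hypothetical source table
          .writeStream
          .option("checkpointLocation", "/tmp/checkpoints/silver_events"))

# trigger(processingTime="1 minute") would run indefinitely, one micro-batch per interval.
# trigger(availableNow=True) drains everything currently available, possibly across
# several micro-batches, and then the query terminates -- the behaviour asked for here.
query = writer.trigger(availableNow=True).toTable("silver_events")   # hypothetical target table
query.awaitTermination()   # returns once the backlog has been processed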
AndreFR
1 year, 4 months ago
Selected Answer: B
It's the only answer with correct syntax.
upvoted 1 times
...
55f31c8
1 year, 5 months ago
Selected Answer: B
https://spark.apache.org/docs/latest/api/python/reference/pyspark.ss/api/pyspark.sql.streaming.DataStreamWriter.trigger.html
upvoted 2 times
...
kbaba101
1 year, 6 months ago
B. From the trigger docs: availableNow (bool, optional): if set to True, sets a trigger that processes all available data in multiple batches and then terminates the query. Only one trigger can be set.
upvoted 4 times
...
meow_akk
1 year, 6 months ago
Sorry, the answer is B. For batch-style processing we use availableNow: https://stackoverflow.com/questions/71061809/trigger-availablenow-for-delta-source-streaming-queries-in-pyspark-databricks
upvoted 4 times
...
meow_akk
1 year, 6 months ago
Correct answer is D:
%python
(spark.readStream.format("delta").load("<delta_table_path>")
    .writeStream
    .format("delta")
    .trigger(processingTime='5 seconds')  # added line of code that defines the trigger processing time
    .outputMode("append")
    .option("checkpointLocation", "<checkpoint_path>")
    .options(**writeConfig)
    .start())
https://kb.databricks.com/streaming/optimize-streaming-transactions-with-trigger
upvoted 1 times
Souvik_79
9 months, 1 week ago
Nope! Use trigger(availableNow=True).
upvoted 1 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other