
Exam Certified Data Engineer Associate topic 1 question 70 discussion

Actual exam question from Databricks's Certified Data Engineer Associate
Question #: 70
Topic #: 1
[All Certified Data Engineer Associate Questions]

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Code block image not reproduced]
If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

  • A. processingTime(1)
  • B. trigger(availableNow=True)
  • C. trigger(parallelBatch=True)
  • D. trigger(processingTime="once")
  • E. trigger(continuous="once")
Suggested Answer: B
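
For reference, a minimal sketch of what the completed code block might look like with the blank filled in. The source table, target table, and checkpoint path below are hypothetical, and spark is assumed to be the active SparkSession (as in a Databricks notebook); only the trigger call reflects the suggested answer.

from pyspark.sql import functions as F

(spark.readStream
    .table("source_table")                               # hypothetical source table
    .withColumn("ingested_at", F.current_timestamp())    # example transformation
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/new_table")  # hypothetical path
    .trigger(availableNow=True)    # process all available data in as many batches as required, then stop
    .toTable("new_table"))         # hypothetical target table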

Comments

benni_ale
1 year ago
Selected Answer: B
B is correct.
upvoted 1 times
...
fifirifi
1 year, 1 month ago
Selected Answer: B
Correct answer: B. Explanation: in Structured Streaming, if a data engineer wants to process all of the available data in as many batches as required, without a fixed trigger interval, they can use trigger(availableNow=True). The availableNow trigger tells the query to process all data that is available at the moment and then stop, rather than wait for more data to arrive.
upvoted 4 times
...
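To make the distinction concrete, here is a short sketch (the table names, checkpoint path, and spark session are assumptions, not taken from the question): the same streaming write either runs indefinitely on an interval or drains the current backlog and stops, depending only on the trigger.

writer = (spark.readStream.table("bronze_events")        # hypothetical source table
          .writeStream
          .option("checkpointLocation", "/tmp/checkpoints/silver_events"))

# trigger(processingTime="1 minute") would run indefinitely, one micro-batch per interval.
# trigger(availableNow=True) drains everything currently available, possibly across
# several micro-batches, and then the query terminates -- the behaviour asked for here.
query = writer.trigger(availableNow=True).toTable("silver_events")   # hypothetical target table
query.awaitTermination()   # returns once the backlog has been processed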
AndreFR
1 year, 4 months ago
Selected Answer: B
It's the only answer with correct syntax.
upvoted 1 times
...
55f31c8
1 year, 5 months ago
Selected Answer: B
https://spark.apache.org/docs/latest/api/python/reference/pyspark.ss/api/pyspark.sql.streaming.DataStreamWriter.trigger.html
upvoted 2 times
...
kbaba101
1 year, 6 months ago
B. From the trigger docs: availableNow (bool, optional): if set to True, sets a trigger that processes all available data in multiple batches and then terminates the query. Only one trigger can be set.
upvoted 4 times
...
meow_akk
1 year, 6 months ago
Sorry, the answer is B. For batch-style processing we use availableNow: https://stackoverflow.com/questions/71061809/trigger-availablenow-for-delta-source-streaming-queries-in-pyspark-databricks
upvoted 4 times
...
meow_akk
1 year, 6 months ago
Correct answer is D:
%python
(spark.readStream.format("delta").load("<delta_table_path>")
    .writeStream
    .format("delta")
    .trigger(processingTime='5 seconds')  # added line of code that defines the trigger processing time
    .outputMode("append")
    .option("checkpointLocation", "<checkpoint_path>")
    .options(**writeConfig)
    .start())
https://kb.databricks.com/streaming/optimize-streaming-transactions-with-trigger
upvoted 1 times
Souvik_79
9 months, 1 week ago
Nope! Use trigger(availableNow=True).
upvoted 1 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other