Certified Data Engineer Professional exam, Topic 1, Question 10 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 10
Topic #: 1

A Delta table of weather records is partitioned by date and has the below schema:

    date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT

To find all the records from within the Arctic Circle, you execute a query with the below filter:

    latitude > 66.3
Which statement describes how the Delta engine identifies which files to load?

  • A. All records are cached to an operational database and then the filter is applied
  • B. The Parquet file footers are scanned for min and max statistics for the latitude column
  • C. All records are cached to attached storage and then the filter is applied
  • D. The Delta log is scanned for min and max statistics for the latitude column
  • E. The Hive metastore is scanned for min and max statistics for the latitude column
Suggested Answer: D
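
For reference, a minimal sketch of the scenario, assuming a Spark session with Delta Lake support (e.g. Databricks or the delta-spark package); the table name weather_records is illustrative:

    # Sketch of the question's scenario (table name is illustrative; assumes `spark`
    # is an active SparkSession with Delta Lake support, as on Databricks).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS weather_records (
            date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT
        ) USING DELTA
        PARTITIONED BY (date)
    """)

    # The selective filter from the question. Before any Parquet data is read,
    # the Delta engine consults per-file min/max statistics recorded in the
    # transaction log (_delta_log) to decide which files can be skipped.
    arctic = spark.sql("SELECT * FROM weather_records WHERE latitude > 66.3")
    arctic.explain()  # physical plan shows the pushed-down latitude filter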

Comments

taif12340
Highly Voted 1 year, 8 months ago
Answer D: In the transaction log, Delta Lake captures statistics for each data file of the table. Per file, these statistics record:
  • Total number of records
  • Minimum value in each of the first 32 columns of the table
  • Maximum value in each of the first 32 columns of the table
  • Null value counts in each of the first 32 columns of the table
When a query with a selective filter is executed against the table, the query optimizer uses these statistics to identify the data files that may contain records matching the filter. For the SELECT query in the question, the transaction log is scanned for min and max statistics for the latitude column.
upvoted 22 times
...
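
To see the statistics the top comment describes, you can read the commit JSON files under _delta_log directly. A rough sketch; the path is a placeholder and `spark` is assumed to be an active session with Delta support:

    import json

    # Each commit file in _delta_log holds one JSON action per line; "add" actions
    # carry a "stats" string with numRecords, minValues, maxValues and nullCount
    # for (by default) the first 32 columns. The path below is hypothetical.
    log = spark.read.json("/path/to/weather_records/_delta_log/00000000000000000000.json")
    adds = log.where("add IS NOT NULL").select("add.path", "add.stats").collect()
    for row in adds:
        stats = json.loads(row["stats"])
        print(row["path"], stats["minValues"]["latitude"], stats["maxValues"]["latitude"])
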
JoG1221
Most Recent 2 weeks ago
Selected Answer: D
Delta extracts stats from the Parquet footers at write time and stores them in _delta_log. At query time, Delta reads the stats from the log instead of scanning file footers, which is faster. This is what enables efficient data skipping and query optimization.
upvoted 1 times
...
johnserafim
1 month, 4 weeks ago
Selected Answer: B
B is correct! Delta Lake stores min/max statistics for each column in the Parquet file footers. The engine scans these footers to determine if a file contains any data that satisfies the latitude > 66.3 condition. If the minimum latitude in a file is greater than 66.3, the file is loaded. If the maximum latitude is less than or equal to 66.3, the file is skipped.
upvoted 3 times
...
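
Wherever the statistics are read from, the skipping decision itself is simple. A toy illustration with made-up per-file values, not real Delta output:

    # Toy data-skipping decision for the filter latitude > 66.3, using made-up
    # per-file min/max statistics.
    file_stats = {
        "part-0001.parquet": {"min_lat": -10.0, "max_lat": 45.2},  # max <= 66.3: skipped
        "part-0002.parquet": {"min_lat": 50.1, "max_lat": 71.8},   # may contain matches: read
        "part-0003.parquet": {"min_lat": 67.0, "max_lat": 82.3},   # every row matches: read
    }
    threshold = 66.3
    files_to_read = [f for f, s in file_stats.items() if s["max_lat"] > threshold]
    print(files_to_read)  # ['part-0002.parquet', 'part-0003.parquet']
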
akashdesarda
7 months ago
Selected Answer: D
The above-mentioned points are correct. If the table were a plain Parquet table, the Parquet file footers would be used. But since this is a Delta table, the Delta log is used to scan and skip files, based on the stats written to the transaction log.
upvoted 3 times
...
AndreFR
8 months, 2 weeks ago
Answer D : Delta data skipping automatically collects the stats (min, max, etc.) for the first 32 columns for each underlying Parquet file when you write data into a Delta table. Databricks takes advantage of this information (minimum and maximum values) at query time to skip unnecessary files in order to speed up the queries. https://www.databricks.com/discover/pages/optimize-data-workloads-guide#delta-data
upvoted 2 times
...
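
The 32-column default mentioned above is controlled by a table property. A sketch of adjusting it; the property name delta.dataSkippingNumIndexedCols is real, while the table and value are just examples:

    # Tune how many leading columns get min/max stats collected for data skipping.
    # delta.dataSkippingNumIndexedCols defaults to 32; worth tuning on wide tables.
    spark.sql("""
        ALTER TABLE weather_records
        SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '8')
    """)
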
saravanan289
8 months, 2 weeks ago
Selected Answer: D
A Delta table stores file statistics in the transaction log.
upvoted 2 times
...
03355a2
10 months, 1 week ago
Selected Answer: D
No explanation needed, this is where the information is stored.
upvoted 2 times
...
imatheushenrique
11 months ago
D. The Delta log is scanned for min and max statistics for the latitude column
upvoted 1 times
...
coercion
11 months, 2 weeks ago
Selected Answer: D
The Delta log collects statistics such as min value, max value, number of records, and number of files for each transaction that happens on the table, for the first 32 columns by default.
upvoted 1 times
...
Tayari
1 year ago
Selected Answer: D
D is the answer
upvoted 1 times
...
arik90
1 year, 1 month ago
Selected Answer: D
Based on the documentation it's D; I don't know why B is shown here.
upvoted 1 times
...
alexvno
1 year, 1 month ago
Selected Answer: D
Delta log first
upvoted 1 times
...
DavidRou
1 year, 1 month ago
Selected Answer: D
Statistics on first 32 columns of a table are computed and written in the Delta Log by default.
upvoted 1 times
...
vikram12apr
1 year, 2 months ago
Selected Answer: D
D is the right answer
upvoted 1 times
...
Curious76
1 year, 2 months ago
Selected Answer: D
D is the answer
upvoted 1 times
...
kkravets
1 year, 2 months ago
Selected Answer: D
D is the correct one
upvoted 1 times
...
RiktRikt007
1 year, 2 months ago
I checked the Delta log, and it does store stats, e.g.: "stats":"{\"numRecords\":1,\"minValues\":{\"id\":1,\"name\":\"one\",\"age\":11},\"maxValues\":{\"id\":1,\"name\":\"one\",\"age\":11},\"nullCount\":{\"id\":0,\"name\":0,\"age\":0}}"
upvoted 2 times
...
Community vote distribution: A (35%) · C (25%) · B (20%) · Other