Exam Certified Data Engineer Professional topic 1 question 63 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 63
Topic #: 1

A Databricks SQL dashboard has been configured to monitor the total number of records present in a collection of Delta Lake tables using the following query pattern:


SELECT COUNT(*) FROM table

Which of the following describes how results are generated each time the dashboard is updated?

  • A. The total count of rows is calculated by scanning all data files
  • B. The total count of rows will be returned from cached results unless REFRESH is run
  • C. The total count of records is calculated from the Delta transaction logs
  • D. The total count of records is calculated from the parquet file metadata
  • E. The total count of records is calculated from the Hive metastore
Suggested Answer: C

Comments

aragorn_brego
Highly Voted 1 year, 6 months ago
Selected Answer: C
Delta Lake maintains a transaction log that records details about every change made to a table. When you execute a count operation on a Delta table, Delta Lake can use the information in the transaction log to calculate the total number of records without having to scan all the data files. This is because the transaction log includes information about the number of records in each file, allowing for an efficient aggregation of these counts to get the total number of records in the table.
upvoted 6 times
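
The log-based count described in the comment above can be sketched as follows. This is a simplified illustration, not how Databricks actually implements it: it assumes each `add` action carries a `stats` JSON string with `numRecords`, it ignores checkpoints and deletion vectors, and the file names and the helper `count_rows_from_log` are hypothetical.

```python
import json


def count_rows_from_log(log_lines):
    """Replay add/remove actions and sum numRecords over the files still live.

    Returns None when any live file lacks stats, since the count could then
    not be answered from the log alone.
    """
    live = {}  # file path -> numRecords (None if the add action has no stats)
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            # In the log, "stats" is itself a JSON-encoded string.
            stats = json.loads(add["stats"]) if add.get("stats") else {}
            live[add["path"]] = stats.get("numRecords")
        elif "remove" in action:
            live.pop(action["remove"]["path"], None)
    if any(n is None for n in live.values()):
        return None
    return sum(live.values())


# Hypothetical log entries: two files added, one removed, one more added.
sample_log = [
    json.dumps({"add": {"path": "part-0.parquet", "stats": json.dumps({"numRecords": 3})}}),
    json.dumps({"add": {"path": "part-1.parquet", "stats": json.dumps({"numRecords": 5})}}),
    json.dumps({"remove": {"path": "part-0.parquet"}}),
    json.dumps({"add": {"path": "part-2.parquet", "stats": json.dumps({"numRecords": 2})}}),
]

total = count_rows_from_log(sample_log)
print(total)
```

Only the files that remain live after the replay contribute to the total, which is why no data file needs to be opened when every live file has statistics.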
...
Syd
Highly Voted 1 year, 7 months ago
Answer C https://delta.io/blog/2023-04-19-faster-aggregations-metadata/#:~:text=You%20can%20get%20the%20number,a%20given%20Delta%20table%20version.
upvoted 5 times
...
c315d10
Most Recent 3 weeks, 2 days ago
Selected Answer: A
Metadata could be outdated
upvoted 1 times
...
KadELbied
1 month, 1 week ago
Selected Answer: C
Surely C
upvoted 1 times
...
AlejandroU
6 months ago
Selected Answer: D
Answer D.

Parquet metadata usage: Delta Lake does utilize Parquet file metadata for COUNT(*) operations. Parquet files store metadata, including row counts. Delta efficiently reads this metadata to get the total count without scanning the actual data within the files. This is a key optimization for performance.

Why not always scan: Scanning all data files for every COUNT(*) would be extremely inefficient, especially for large tables. This defeats the purpose of using a columnar storage format like Parquet and the optimizations built into Delta Lake and Spark.

The transaction log tracks changes to the table (adds, deletes, updates) but doesn't store pre-computed row counts. It's used for time travel, ACID properties, and other Delta features.
upvoted 2 times
arekm
5 months, 2 weeks ago
Definitely C. See the link posted by Syd.
upvoted 1 times
...
...
Sriramiyer92
6 months, 1 week ago
Selected Answer: C
"stats": "{\"numRecords\": 3, \"minValues\": {\"x\": 1}, \"maxValues\": {\"x\": 3}, \"nullCount\": {\"x\": 0}}"
The numRecords field in the Delta transaction log gives you the value.
upvoted 1 times
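
As a small illustration of the stats entry quoted in the comment above, the value is a JSON-encoded string, so it has to be decoded before `numRecords` can be read. The variable names here are made up for the example; the string itself is copied from the comment.

```python
import json

# The "stats" value exactly as quoted in the comment (a JSON string, escaped).
stats_field = "{\"numRecords\": 3, \"minValues\": {\"x\": 1}, \"maxValues\": {\"x\": 3}, \"nullCount\": {\"x\": 0}}"

stats = json.loads(stats_field)     # decode the embedded JSON
num_records = stats["numRecords"]   # per-file row count recorded in the log
print(num_records)
```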
...
Ati1362
11 months, 3 weeks ago
Selected Answer: C
Delta transaction log
upvoted 2 times
...
sodere
1 year, 6 months ago
Selected Answer: C
The transaction log provides statistics about the Delta table.
upvoted 4 times
...
alexvno
1 year, 6 months ago
Selected Answer: C
C. The transaction logs contain the row count of each file.
upvoted 3 times
...
Dileepvikram
1 year, 7 months ago
The answer is C
upvoted 2 times
...
PearApple
1 year, 7 months ago
Selected Answer: C
The answer should be C
upvoted 2 times
...
sturcu
1 year, 7 months ago
Selected Answer: C
The total row count will be calculated from the Delta logs.
upvoted 3 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other