hiexam
databricks · Certified-Data-Engineer-Professional · Q425 · multiple_choice · topic_1

A Delta table of weather records is partitioned by date and has the following schema: date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT. To find all the records from within the Arctic Circle, you execute a query with the following filter: latitude > 66.3. Which statement describes how the Delta engine identifies which files to load?
  • A. All records are cached to an operational database and then the filter is applied.
  • B. The Parquet file footers are scanned for min and max statistics for the latitude column.
  • C. All records are cached to attached storage and then the filter is applied.
  • D. The Delta log is scanned for min and max statistics for the latitude column.
  • E. The Hive metastore is scanned for min and max statistics for the latitude column.
Explanation
Answer D: Delta Lake records statistics for each data file of the table in the transaction log. These statistics indicate, per file:
  • Total number of records
  • Minimum value of each of the first 32 columns of the table (the default limit)
  • Maximum value of each of the first 32 columns of the table
  • Null count of each of the first 32 columns of the table
When a query with a selective filter is executed against the table, the query optimizer uses these statistics to identify the data files that may contain records matching the filter, and skips the rest. The table is partitioned by date, but the filter is on latitude, so partition pruning does not help here. For the query in the question, the transaction log is scanned for min and max statistics for the latitude column.
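The file-skipping idea can be sketched in plain Python. This is only an illustration of the principle, not how the Delta engine is implemented. The file paths and statistic values are made up, and the helper files_to_scan is a hypothetical name. The only thing taken from the Delta log format is the shape of the per-file stats JSON (numRecords, minValues, maxValues, nullCount).

```python
import json

# Hypothetical `add` actions, shaped like entries in a _delta_log commit file.
# Paths and numbers are invented for illustration only.
log_entries = [
    {"add": {"path": "date=2024-01-01/part-0000.parquet",
             "stats": json.dumps({"numRecords": 1000,
                                  "minValues": {"latitude": -12.5},
                                  "maxValues": {"latitude": 48.9},
                                  "nullCount": {"latitude": 0}})}},
    {"add": {"path": "date=2024-01-01/part-0001.parquet",
             "stats": json.dumps({"numRecords": 800,
                                  "minValues": {"latitude": 60.1},
                                  "maxValues": {"latitude": 71.2},
                                  "nullCount": {"latitude": 3}})}},
    # A file written without statistics cannot be skipped safely.
    {"add": {"path": "date=2024-01-02/part-0000.parquet"}},
]


def files_to_scan(entries, column, lower_bound):
    """Return paths of files that may hold rows where column > lower_bound."""
    selected = []
    for entry in entries:
        add = entry.get("add")
        if add is None:
            continue  # not a file-add action
        stats = json.loads(add["stats"]) if add.get("stats") else {}
        max_value = stats.get("maxValues", {}).get(column)
        # Keep the file if its max could exceed the bound,
        # or if there is no statistic to rule it out.
        if max_value is None or max_value > lower_bound:
            selected.append(add["path"])
    return selected


candidates = files_to_scan(log_entries, "latitude", 66.3)
```

In this sketch the first file is skipped because its maximum latitude (48.9) is below 66.3, while the second file and the file without statistics remain candidates. The statistics only narrow down which files to read. The filter is still applied to the rows of every file that is loaded.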

Reference: examtopics_top_comment
