Exam SnowPro Core topic 1 question 695 discussion

Actual exam question from Snowflake's SnowPro Core

Question #: 695
Topic #: 1

A JSON file that contains lots of dates and arrays needs to be processed in Snowflake. The user wants to ensure optimal performance while querying the data.

How can this be achieved?

A. Flatten the data and store it in structured data types in a flattened table. Query the table.
B. Store the data in a table with a VARIANT data type. Query the table.
C. Store the data in a table with a VARIANT data type and include STRIP_NULL_VALUES while loading the table. Query the table.
D. Store the data in an external stage and create views on top of it. Query the views.

Suggested Answer: A

by MultiCloudIronMan at July 15, 2023, 7:41 p.m.

Comments

user_1011

3 months, 1 week ago

Selected Answer: A

From the Snowflake documentation:

If you are not sure yet what types of operations you want to perform on your semi-structured data, Snowflake recommends storing the data in a VARIANT column for now. For data that is mostly regular and uses only data types that are native to the semi-structured format you are using (e.g. strings and integers for JSON format), the storage requirements and query performance for operations on relational data and data in a VARIANT column is very similar.

For better pruning and less storage consumption, we recommend flattening your OBJECT and key data into separate relational columns if your semi-structured data includes:

- Dates and timestamps, especially non-ISO 8601 dates and timestamps, as string values
- Numbers within strings
- Arrays

Non-native values (such as dates and timestamps in JSON) are stored as strings when loaded into a VARIANT column, so operations on these values could be slower and also consume more space than when stored in a relational column with the corresponding data type.
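To make the quoted recommendation concrete, here is a minimal sketch in plain Python (not Snowflake code) of what "flattening into typed columns" means for a document containing a non-ISO date string, a number inside a string, and an array. The field names `order_id`, `order_date`, and `items`, and the `%m/%d/%Y` date format, are hypothetical and chosen only for illustration.

```python
import json
from datetime import date, datetime

# Hypothetical input: one JSON document with a number stored as a string,
# a non-ISO 8601 date string, and an array.
RAW = '{"order_id": "17", "order_date": "07/15/2023", "items": ["disk", "cable"]}'


def flatten_order(raw: str) -> list[tuple[int, date, int, str]]:
    """Return one typed row per array element:
    (order_id, order_date, item_index, item)."""
    doc = json.loads(raw)
    order_id = int(doc["order_id"])  # number within a string -> int
    # Non-ISO date string -> real date value (format is an assumption).
    order_date = datetime.strptime(doc["order_date"], "%m/%d/%Y").date()
    # The array becomes one row per element, as a flattened table would hold it.
    return [(order_id, order_date, i, item) for i, item in enumerate(doc["items"])]


rows = flatten_order(RAW)
for row in rows:
    print(row)
```

Each resulting row carries a real integer and a real date rather than strings, which is the property the documentation ties to better pruning and lower storage.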

upvoted 1 time

icegrandpa

11 months, 2 weeks ago

Why not C?

upvoted 2 times

0e504b5

1 year ago

Selected Answer: A

I'm undecided between A and B. In a real-world task, I would do B and run some ELT if needed to prepare the data for analysis. Based on the docs below, though, Snowflake appears to recommend A as more performant.

https://docs.snowflake.com/en/user-guide/semistructured-considerations

Storing Semi-structured Data in a VARIANT Column vs. Flattening the Nested Structure

For better pruning and less storage consumption, we recommend flattening your OBJECT and key data into separate relational columns if your semi-structured data includes:

- Dates and timestamps, especially non-ISO 8601 dates and timestamps, as string values
- Numbers within strings
- Arrays

upvoted 4 times

pvskbrod

1 year, 5 months ago

Selected Answer: B

I hesitate between A and B, and will be happy to provoke a discussion. From the docs:

If you know your use cases for the data, perform tests on a typical data set. Load the data set into a VARIANT column in a table. Use the FLATTEN function to extract the OBJECTs and keys you plan to query into a separate table. Run a typical set of queries against both tables to see which structure provides the best performance.

https://docs.snowflake.com/en/user-guide/semistructured-considerations

upvoted 2 times

pvskbrod

1 year, 5 months ago

I have changed my opinion to B. From the docs:

For better pruning and less storage consumption, we recommend flattening your OBJECT and key data into separate relational columns if your semi-structured data includes:

- Dates and timestamps, especially non-ISO 8601 dates and timestamps, as string values
- Numbers within strings
- Arrays

upvoted 2 times

Rajivnb

1 year, 4 months ago

You mean A? By your explanation, storing the data in a relational table lets Snowflake use the date/time ranges for micro-partition clustering, which helps pruning.
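A toy illustration of the pruning argument, in plain Python, not Snowflake code: if each partition records the minimum and maximum of a typed date column, a range filter can skip partitions whose range cannot match. The partition contents and the helper `partitions_to_scan` are hypothetical, and this only sketches the idea; it does not claim to mirror Snowflake's internals.

```python
from datetime import date

# Hypothetical partitions, each holding typed date values.
partitions = [
    [date(2023, 1, 5), date(2023, 1, 20)],
    [date(2023, 4, 2), date(2023, 4, 28)],
    [date(2023, 7, 1), date(2023, 7, 15)],
]

# Per-partition metadata: (min, max) of the date column.
metadata = [(min(p), max(p)) for p in partitions]


def partitions_to_scan(lo: date, hi: date) -> list[int]:
    """Indexes of partitions whose [min, max] range overlaps [lo, hi]."""
    return [
        i for i, (pmin, pmax) in enumerate(metadata)
        if pmin <= hi and pmax >= lo
    ]


# A filter on July 2023 only needs the last partition.
print(partitions_to_scan(date(2023, 7, 1), date(2023, 7, 31)))

# Non-ISO date strings compare character by character, not chronologically:
# this is True even though December 2022 comes before July 2023.
print("12/01/2022" > "07/15/2023")
```

The last line shows why dates kept as non-ISO strings are a poor basis for range metadata: their string ordering does not follow calendar order.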

upvoted 2 times

MultiCloudIronMan

1 year, 7 months ago

Selected Answer: A

Correct.

upvoted 3 times
