Basic Questions
- What is the COPY command in Snowflake? How is it used?
- Explain its purpose in loading data from external files or stages into Snowflake tables.
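For reference, a minimal COPY INTO sketch; the table raw_orders and named stage my_stage are assumptions, and the stage is assumed to already hold the files:

```sql
-- Load every staged CSV file into the target table
COPY INTO raw_orders
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```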
- What are the different types of data formats supported by Snowflake for loading?
- Discuss formats like CSV, JSON, Parquet, Avro, ORC, XML.
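Format options are usually bundled into a named file format object so they can be reused across COPY statements; a sketch with an assumed name csv_standard:

```sql
-- Reusable definition for pipe-delimited CSV files with a header row
CREATE OR REPLACE FILE FORMAT csv_standard
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
  SKIP_HEADER = 1
  NULL_IF = ('NULL', '');
```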
- What is a Snowflake stage, and what are the types of stages available?
- Describe user stages, table stages, and internal/external named stages.
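The three kinds of stage differ mainly in how they are referenced; a sketch using hypothetical names:

```sql
-- User stage (one per user)
LIST @~;

-- Table stage (one per table; raw_orders is an assumed table)
LIST @%raw_orders;

-- Named internal stage
CREATE OR REPLACE STAGE my_internal_stage;
LIST @my_internal_stage;
```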
- How do you handle file compression when loading data into Snowflake?
- Explain supported compression types like GZIP, BZIP2, and how they’re automatically detected.
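Compression is normally detected from the file extension (COMPRESSION = 'AUTO'), but it can also be declared explicitly; a sketch assuming gzip-compressed files in my_stage:

```sql
-- Explicitly declare GZIP instead of relying on auto-detection
COPY INTO raw_orders
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP' SKIP_HEADER = 1);
```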
- How do you load data from Amazon S3, Azure Blob Storage, or Google Cloud Storage into Snowflake?
- Discuss creating external stages and configuring access credentials.
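A sketch of an S3-backed external stage; the bucket, path, and storage integration name are placeholders, and the integration itself must already exist:

```sql
-- External stage authenticated through a storage integration
CREATE OR REPLACE STAGE s3_orders_stage
  URL = 's3://my-example-bucket/orders/'
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```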
Intermediate Questions
- What are the common errors during data loading in Snowflake, and how do you debug them?
- Examples include file parsing errors and data type mismatches, and how to handle them using VALIDATION_MODE and error tables.
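A sketch of a validation-only dry run against the assumed table and stage from the basic questions:

```sql
-- Parse the staged files and report errors without loading any rows
COPY INTO raw_orders
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  VALIDATION_MODE = 'RETURN_ERRORS';

-- Review the rows rejected by the most recent COPY into this table
SELECT * FROM TABLE(VALIDATE(raw_orders, JOB_ID => '_last'));
```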
- What is the difference between the COPY INTO and PUT commands?
- Highlight the purpose of PUT for uploading files to stages and COPY INTO for loading them into tables.
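The two commands typically run back to back; a sketch with hypothetical paths (PUT is issued from a client such as SnowSQL, not from the worksheet UI):

```sql
-- 1. Upload a local file to the table stage (client-side)
PUT file:///tmp/orders.csv @%raw_orders AUTO_COMPRESS = TRUE;

-- 2. Load the staged file into the table (server-side)
COPY INTO raw_orders
  FROM @%raw_orders
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```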
- Explain how Snowflake handles semi-structured data during loading.
- Talk about JSON, Avro, and how the VARIANT data type simplifies storage.
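A sketch of landing JSON into a single VARIANT column, with assumed object names:

```sql
-- Each JSON document is stored whole in one VARIANT column
CREATE OR REPLACE TABLE raw_events (payload VARIANT);

COPY INTO raw_events
  FROM @my_stage/events/
  FILE_FORMAT = (TYPE = 'JSON');

-- Nested attributes are reachable with path notation and casts
SELECT payload:user.id::STRING    AS user_id,
       payload:event_type::STRING AS event_type
FROM raw_events;
```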
- What is the role of ON_ERROR in the COPY command? What are the possible values?
- Options include CONTINUE, ABORT_STATEMENT, and SKIP_FILE.
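A sketch showing the option in place; SKIP_FILE also accepts thresholds such as SKIP_FILE_10:

```sql
-- Load the rows that parse cleanly and skip only the bad ones
COPY INTO raw_orders
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'CONTINUE';
```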
- What is file metadata, and how does Snowflake use it during data loading?
- Discuss how Snowflake uses file names and modification times to prevent duplicate loads.
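File metadata can also be captured during the load itself; a sketch assuming the target table has source_file and source_row columns to receive it:

```sql
-- METADATA$FILENAME and METADATA$FILE_ROW_NUMBER are pseudocolumns on staged files
COPY INTO raw_orders (source_file, source_row, order_id, amount)
  FROM (
    SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, $1, $2
    FROM @my_stage
  )
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```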
Advanced Questions
- How can you optimize data loading performance in Snowflake?
- Discuss splitting large files into smaller ones, using parallel loading, and setting appropriate file sizes.
- How does Snowflake ensure data integrity and consistency during loading?
- Mention atomicity in loading and the use of staging for error isolation.
- What are the key considerations for loading large datasets into Snowflake?
- Talk about clustering, partitioning, and table design for optimized performance.
- Explain how to perform incremental data loading in Snowflake.
- Describe methods using timestamps, change data capture (CDC), or MERGE statements.
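A sketch of the MERGE-based approach, assuming a staging table stg_orders that a COPY step has already populated:

```sql
-- Upsert new and changed rows from the staging table into the target
MERGE INTO orders AS t
USING stg_orders AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, amount, updated_at)
  VALUES (s.order_id, s.amount, s.updated_at);
```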
- What is the significance of the FORCE parameter in the COPY command? When would you use it?
- Discuss scenarios like reloading the same file.
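A sketch of forcing a reload of a file that load history would otherwise silently skip (file name assumed):

```sql
-- Reload the file even though load metadata marks it as already loaded
COPY INTO raw_orders
  FROM @my_stage
  FILES = ('orders_2024_01_01.csv')
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  FORCE = TRUE;
```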
Scenario-Based Questions
- How would you load a 5TB dataset from an external stage into Snowflake without impacting production performance?
- Discuss staged loading, parallel processing, and resource scaling.
- What approach would you take to load a corrupted JSON file into Snowflake?
- Mention using VALIDATION_MODE and isolating problematic rows.
- How do you load real-time streaming data into Snowflake?
- Discuss Snowpipe, REST API, and Kafka integrations.
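A Snowpipe sketch built on the hypothetical external stage from the basic questions; AUTO_INGEST assumes cloud event notifications are configured for the bucket:

```sql
-- New files arriving in the stage are ingested continuously
CREATE OR REPLACE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_orders
    FROM @s3_orders_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```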
- You need to load multiple file types (e.g., CSV, JSON) into the same table. How would you handle this?
- Explain using file format objects and transformations during loading.
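One way to sketch this: a separate COPY per file type, each with its own file format, both targeting the same table; the JSON branch uses a transform to map attributes onto columns (all names assumed):

```sql
-- CSV files: positional columns
COPY INTO orders (order_id, amount)
  FROM (SELECT $1, $2 FROM @my_stage/csv/)
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- JSON files: extract the same fields from each document
COPY INTO orders (order_id, amount)
  FROM (SELECT $1:order_id::NUMBER, $1:amount::NUMBER FROM @my_stage/json/)
  FILE_FORMAT = (TYPE = 'JSON');
```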
- What is Snowpipe, and how does it differ from the COPY command?
- Highlight the continuous data ingestion features of Snowpipe.
Practical Hands-On Questions
- Write a Snowflake query to load a CSV file into a table using the COPY command.
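A possible answer, reusing the assumed stage and named file format from the earlier sketches:

```sql
COPY INTO raw_orders
  FROM @my_internal_stage/orders.csv.gz
  FILE_FORMAT = (FORMAT_NAME = 'csv_standard')
  ON_ERROR = 'ABORT_STATEMENT';
```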
- Demonstrate how to validate the content of a file in a Snowflake stage before loading.
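Two common approaches, both against the assumed stage: query the staged file directly, or run COPY in validation mode:

```sql
-- Preview the first few columns of the staged file
SELECT $1, $2, $3
FROM @my_internal_stage/orders.csv.gz (FILE_FORMAT => 'csv_standard')
LIMIT 10;

-- Parse a sample through COPY without committing any rows
COPY INTO raw_orders
  FROM @my_internal_stage/orders.csv.gz
  FILE_FORMAT = (FORMAT_NAME = 'csv_standard')
  VALIDATION_MODE = 'RETURN_10_ROWS';
```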
- How would you use Snowflake streams and tasks to manage incremental data loads?
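A sketch of the stream-plus-task pattern with assumed names; the task wakes on a schedule and merges whatever the stream has captured since its last run:

```sql
-- Track changes on the landing table
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

-- Scheduled task that fires only when the stream holds new rows
CREATE OR REPLACE TASK load_orders_task
  WAREHOUSE = load_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('RAW_ORDERS_STREAM')
AS
  MERGE INTO orders AS t
  USING raw_orders_stream AS s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET t.amount = s.amount
  WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);

ALTER TASK load_orders_task RESUME;
```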
- Show how to load data into a Snowflake table with a defined schema from a Parquet file.
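A Parquet sketch that assumes the table's column names match the Parquet field names, so values can be mapped by name rather than by position:

```sql
COPY INTO orders
  FROM @my_stage/parquet/
  FILE_FORMAT = (TYPE = 'PARQUET')
  MATCH_BY_COLUMN_NAME = 'CASE_INSENSITIVE';
```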
- Describe the process of unloading data from a Snowflake table to an external stage.
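Unloading uses the same COPY keyword in the other direction; a sketch writing compressed CSV to the hypothetical external stage:

```sql
-- COPY INTO <stage> FROM <table or query> writes files out to the stage
COPY INTO @s3_orders_stage/export/
  FROM (SELECT * FROM orders)
  FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
  HEADER = TRUE
  OVERWRITE = TRUE;
```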