We are looking at implementing schema-on-read to load data into Snowflake tables. We receive .csv files in an AWS S3 path, which is the source for our tables, but the structure of these feed files changes often and we don't want to manually alter an already-created table every time a file's schema changes. We want to automate the entire process of loading a file into its Snowflake table based on how the file's schema changes, without creating a new table every time an attribute is added to or removed from the file. Suggestions on how best to implement this would be really helpful.
3 Answers
There is a feature that allows you to infer the schema of staged files and create a table from it, CREATE TABLE … USING TEMPLATE. From the documentation:
CREATE TABLE … USING TEMPLATE
Creates a new table with the column definitions derived from a set of staged files containing semi-structured data. This feature is currently limited to Apache Parquet, Apache Avro, and ORC files.
...
This example builds on an example in the INFER_SCHEMA topic:
CREATE TABLE mytable
USING TEMPLATE (
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
FROM TABLE(
INFER_SCHEMA(
LOCATION=>'@mystage',
FILE_FORMAT=>'my_parquet_format'
)
));
Automatically detects the file metadata schema in a set of staged data files that contain semi-structured data and retrieves the column definitions. Use the column definitions to simplify the creation of a landing table or external table to query the data.
This feature is currently limited to Apache Parquet, Apache Avro, and ORC files. Support for JSON and CSV files is currently in preview.
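Since your feeds are CSV, a sketch of the same pattern with a CSV file format might look like the following (stage and format names here are placeholders; for CSV, INFER_SCHEMA needs a file format with PARSE_HEADER = TRUE so column names are read from the header row):

-- Hypothetical names: adjust the stage and format to your environment.
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  PARSE_HEADER = TRUE;  -- take column names from the CSV header row

CREATE TABLE mytable
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@mystage',
        FILE_FORMAT => 'my_csv_format'
      )
    ));

Re-running INFER_SCHEMA after a feed change shows the new column definitions, which you can compare against the existing table to generate ALTER TABLE statements instead of recreating the table.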
2 Comments
One approach could be the following:
- Load the data into a staging table. This staging table could be created with blank headers (col1, col2 ... colX) or on the fly using CREATE TABLE AS SELECT $1, $2 ... $X FROM @stage.
- Create a dynamic query which gets the headers from the staging table and then generates an insert/merge query to load the data into your final table. This can be done using a stored procedure.
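A minimal sketch of that second step, assuming a staging table STG_FEED and a final table FINAL_FEED that already contains (at least) the staging columns — all names here are hypothetical:

-- Sketch only: builds the column list from whatever columns the
-- staging table has today, then runs a dynamic INSERT.
CREATE OR REPLACE PROCEDURE load_from_staging()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  col_list STRING;
  stmt     STRING;
BEGIN
  -- Collect the staging table's current column names in order.
  SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY ordinal_position)
    INTO :col_list
    FROM information_schema.columns
   WHERE table_name = 'STG_FEED';

  -- Generate and execute the load statement.
  stmt := 'INSERT INTO final_feed (' || col_list || ') ' ||
          'SELECT ' || col_list || ' FROM stg_feed';
  EXECUTE IMMEDIATE :stmt;
  RETURN 'Loaded columns: ' || col_list;
END;
$$;

The same pattern extends to a MERGE, or to comparing the staging columns against the final table's columns and issuing ALTER TABLE ... ADD COLUMN for any new attributes before loading.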
Comments
Snowflake supports using standard SQL to query data files located in an internal (i.e. Snowflake) stage or named external (Amazon S3, Google Cloud Storage, or Microsoft Azure) stage. This can be useful for inspecting/viewing the contents of the staged files, particularly before loading or after unloading data.
In addition, by referencing [metadata columns][1] in a staged file, a staged data query can return additional information, such as filename and row numbers, about the file.
Snowflake utilizes support for staged data queries to enable [transforming data during loading][2].
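For example, a staged-data query like the one below can inspect the files and reshape columns during COPY INTO; the stage, format, and column names are illustrative:

-- Inspect staged CSV files, including metadata columns.
SELECT metadata$filename,
       metadata$file_row_number,
       t.$1, t.$2
  FROM @mystage (FILE_FORMAT => 'my_csv_format') t;

-- Transform/reorder columns while loading.
COPY INTO mytable (filename, col_a, col_b)
  FROM (SELECT metadata$filename, t.$2, t.$1
          FROM @mystage t);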
More details: https://docs.snowflake.com/en/user-guide/querying-stage.html
[1]: https://docs.snowflake.com/en/user-guide/querying-metadata.html
[2]: https://docs.snowflake.com/en/user-guide/data-load-transform.html