
I have a CSV file; a sample of it looks like this:

[image of CSV sample]

Snowpipe is failing to load this CSV file with the following error:

Number of columns in file (5) does not match that of the corresponding table (3), use file format option error_on_column_count_mismatch=false to ignore this error

Can someone advise on a CSV file format definition that will allow this file to load without failing?


3 Answers


The issue is that the data you are trying to load contains commas (,) inside the data itself. Snowflake thinks that those commas represent new columns which is why it thinks there are 5 columns in your file. It is then trying to load these 5 columns into a table with only 3 columns resulting in an error.

You need to tell Snowflake that anything inside double-quotes (") should be loaded as-is, and not to interpret commas inside quotes as column delimiters.
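As a hypothetical illustration (not the asker's actual data), a row like the one below has three logical fields, but five comma-separated tokens if the quotes are ignored:

```
id,name,city
1,"Smith, John","Austin, TX"
```

With quote handling enabled, the second line parses as three fields; without it, the commas inside the quoted values split it into five, which matches the error message above.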

When you create your file format via the web interface, there is an option that tells Snowflake to do this: set the "Field optionally enclosed by" dropdown to "Double Quote", as in this picture:

[screenshot of the file format options dialog]

Alternatively, if you're creating your file format with SQL, there is an option called FIELD_OPTIONALLY_ENCLOSED_BY that you can set to '\042' (the octal escape for the double-quote character), which does the same thing:

CREATE FILE FORMAT "SIMON_DB"."PUBLIC".sample_file_format 
    TYPE = 'CSV' 
    COMPRESSION = 'AUTO' 
    FIELD_DELIMITER = ',' 
    RECORD_DELIMITER = '\n' 
    SKIP_HEADER = 0 
    FIELD_OPTIONALLY_ENCLOSED_BY = '\042' -- <---- set to double quote
    TRIM_SPACE = FALSE 
    ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE 
    ESCAPE = 'NONE' 
    ESCAPE_UNENCLOSED_FIELD = '\134' 
    DATE_FORMAT = 'AUTO' 
    TIMESTAMP_FORMAT = 'AUTO';
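To use the format from Snowpipe or a manual load, reference it in the COPY statement. This is only a sketch: the table name, stage name, and file path below are placeholders, not from the question.

```sql
-- Hypothetical stage, table, and path; only the FILE_FORMAT reference matters here.
COPY INTO "SIMON_DB"."PUBLIC".my_table
  FROM @my_stage/sample.csv
  FILE_FORMAT = (FORMAT_NAME = 'SIMON_DB.PUBLIC.sample_file_format');
```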

2 Comments

What do they mean by COMPRESSION = 'AUTO'?
It means that Snowflake will automatically detect the type of compression and try to decompress the file for you.

If possible, share the file format and one sample record so we can figure out the issue. It seems to be a problem with the number of columns. Can you include the FIELD_OPTIONALLY_ENCLOSED_BY option in your COPY statement and try it once?
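A file format can also be specified inline in the COPY statement rather than as a named object. A minimal sketch, with hypothetical table and stage names:

```sql
-- Hypothetical names; the key part is the inline FIELD_OPTIONALLY_ENCLOSED_BY option.
COPY INTO my_table
  FROM @my_stage/sample.csv
  FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```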



When the TAB character is unlikely to occur in the data, I tend to use TAB-delimited files. Together with a header row, this also makes the source files more human-readable if they need to be opened to troubleshoot loading failures:

FIELD_DELIMITER = '\t'

Also (although a bit off-topic), note that Snowflake recommends compressing data files: https://docs.snowflake.com/en/user-guide/data-load-prepare.html#data-file-compression

I mostly use GZip compression type:

COMPRESSION = GZIP

A (working) example:

CREATE FILE FORMAT Public.CSV_GZIP_TABDELIMITED_WITHHEADER_QUOTES_TRIM
    FIELD_DELIMITER = '\t'
    SKIP_HEADER = 1
    TRIM_SPACE = TRUE
    NULL_IF = ('NULL')
    COMPRESSION = GZIP
;
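Loading with this format could then look like the following. The stage, table, and file name are placeholders for illustration:

```sql
-- Hypothetical stage/table; loads a gzipped, tab-delimited file using the format above.
COPY INTO my_table
  FROM @my_stage/data.tsv.gz
  FILE_FORMAT = (FORMAT_NAME = 'Public.CSV_GZIP_TABDELIMITED_WITHHEADER_QUOTES_TRIM');
```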

