8

I'm getting a 1202 "Extra column(s) found" error in Redshift when trying to load a simple CSV. I've made sure there are no additional columns and no unescaped characters in the file that would cause the COPY command to fail with this error.

Here's the target table I created:

create table test_table(
  name varchar(500),
  email varchar(500),
  developer_id integer,
  developer_name varchar(500),
  country varchar(20),
  devdatabase varchar(50));

I'm using a simple CSV with no header and only 3 rows of data:

john smith,[email protected],123,johndev,US,comet
jane smith,[email protected],124,janedev,GB,titan
jack smith,[email protected],125,jackdev,US,comet

Unfortunately, my COPY command fails with err_1202 "Extra column(s) found":

COPY test_table 
FROM 's3://mybucket/test/test_contacts.csv'    
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
CSV;

There are no additional columns in the file.

4 Comments
  • I followed your steps and successfully imported the data into a Redshift table. I've cleaned your question (removed schema name, closed credentials quote, mentioned bucket name), so you might want to confirm that it still matches your situation. I saved the data as a text file in an S3 bucket (not zipped). Commented Mar 15, 2016 at 6:24
  • Sometimes names contain a comma (,); you may need to go through your data and quote those fields (see the example row after this comment list). Commented Mar 17, 2016 at 1:06
  • Did you check the stl_load_errors table, or are you looking at the error message from your SQL client? SELECT err_reason, raw_line, err_code, query, session, tbl FROM stl_load_errors WHERE filename LIKE 's3://mybucket/test/test_contacts%' ORDER BY query DESC, starttime DESC Commented Sep 27, 2016 at 9:23
  • Try changing your delimiter to ~; if that doesn't help, check whether your table schema is correct when importing into your environment. Commented Mar 31, 2017 at 22:45
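
To illustrate the quoting suggestion above: with the CSV option, any field that contains the delimiter must be wrapped in double quotation marks (Redshift's default quote character for CSV). A hypothetical input row along those lines (not taken from the original file) would look like:

"smith, john",[email protected],123,johndev,US,comet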

6 Answers

13

I was also facing the same issue while loading the data. I fixed it with the following COPY options:

copy yourtablename
from 'your S3 Locations'
credentials 'your AWS credentials' 
delimiter ',' IGNOREHEADER 1 
removequotes
emptyasnull
blanksasnull
maxerror 5;
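
Note that MAXERROR 5 lets the load succeed even if up to five rows are rejected, so it's worth checking afterwards whether anything was skipped. One way to do that (column names as documented for the stl_load_errors system table) might look like:

select query, line_number, colname, err_code, err_reason
from stl_load_errors
order by starttime desc
limit 5;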

1 Comment

This did not work for me.
4

Try this:

COPY test_table 
FROM 's3://mybucket/test/test_contacts.csv'    
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
delimiter ',' 
ignoreheader as 1 
emptyasnull
blanksasnull
removequotes
escape;

Source: https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#r_COPY_command_examples-copy-data-with-the-escape-option
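
For the ESCAPE option to make a difference here, the input data itself must escape any embedded delimiter with a backslash. A hypothetical row prepared that way (not taken from the original file) would look like:

smith\, john,[email protected],123,johndev,US,comet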


1

Make sure the correct delimiter is specified in the COPY statement (and used in the source files). I ran into the same issue. After a couple of attempts with different delimiters (while unloading a table to S3 files, then copying into another table from those files), I was able to solve the issue by using the delimiter '\t'. Here is the full example in my case:

copy <TABLE-NAME>
from 's3://<FILES/LOCATION>'
access_key_id '<INSERT>'
secret_access_key '<INSERT>'
delimiter '\t'
ignoreheader 1
maxerror 10;
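
For completeness, here is a minimal sketch of the matching UNLOAD (same placeholders as above) that writes tab-delimited files the COPY above can read; HEADER is included because the COPY skips one line with ignoreheader 1:

unload ('select * from <TABLE-NAME>')
to 's3://<FILES/LOCATION>'
access_key_id '<INSERT>'
secret_access_key '<INSERT>'
delimiter '\t'
header;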


1

This mostly happens because you are using CSV format, which by default uses ',' as the delimiter, and your data contains fields whose values include ','. Those rows appear to have extra columns when you try to load them into Redshift. There are quite a few ways to fix this, and it becomes straightforward once you have identified which column has commas in its values. You can identify the columns by looking at stl_load_errors:

SELECT starttime, err_reason, raw_line, err_code, query, session, tbl FROM stl_load_errors WHERE filename LIKE 's3://mybucket/test/%' ORDER BY query DESC, starttime DESC;

Then fix the column that introduces the extra columns. Let's say that in this example the 'name' column contains extra commas; we can clean that data:

from pyspark.sql import functions as F  # needed for regexp_replace / col

# replace embedded commas in the 'name' column with spaces
df = df.withColumn('name', F.regexp_replace(F.col('name'), ',', ' '))

Store the new dataframe in S3 and then use the COPY command below to load it into Redshift:

COPY table_name
FROM 's3 path'
IAM_ROLE 'iam role'
DELIMITER ','
ESCAPE
IGNOREHEADER 1
MAXERROR AS 5
COMPUPDATE FALSE
ACCEPTINVCHARS
ACCEPTANYDATE
FILLRECORD
EMPTYASNULL
BLANKSASNULL
NULL AS 'null';


0

Notice that Glue is not as robust as one might think: column order plays a major role. Check your target table's column order as well as the table input, and make sure the order and data types are identical; see the query sketch below, and the AWS Glue Developer Guide for more info.
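
As one way to check the Redshift side, the target table's column order and types can be read from information_schema.columns (using the question's test_table as an example name):

select ordinal_position, column_name, data_type
from information_schema.columns
where table_name = 'test_table'
order by ordinal_position;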

In addition, make sure you have disabled 'Job bookmark' in the 'Job details' tab; for any development or generic job this is a major source of headaches and trouble.


-2

For me, it turned out that I had executed the scripts against the wrong database within the cluster.
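
A quick sanity check, if you want to confirm which database the session is connected to, is Redshift's CURRENT_DATABASE function:

select current_database();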

1 Comment

Not directly related to the problem in the question.
