I have a set of images exported from MSSQL to a CSV file. The file size is 1 GB. The datatype in MSSQL is image. When I try to import the file into Postgres, into a column of datatype bytea, an error occurs:

ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY photo, line 1

When I look into the CSV file, the image data looks like this:

0xFFD8FFE000104A46494600010101006000600000FFE1...

My questions:

  1. What datatype in PostgreSQL can be used to import this type of file?
  2. How to retrieve image from this type of file using Postgres and PHP?

Solutions that I tried:

  1. I copied just three lines into a new CSV file and imported it into the photo table, and it succeeded. Strangely, importing the whole CSV file fails with the error above.
  2. I tried https://stackoverflow.com/a/22211207/3602791 in my PHP code with a sample image and it worked, but when I tried to retrieve the images from the three lines I imported, it failed, saying that my image has an error.

http://pastebin.com/WrfjFqY6 is a sample line from the CSV. It has 2 columns, id and photo.

Does anyone know how to solve this? Thanks in advance.

  • Can you show a complete line of the CSV? Is it an image represented as hex in the CSV? If so the error doesn't make much sense. Please show a complete input line and the command you ran to get that error. Commented May 14, 2014 at 5:22
  • Hi @CraigRinger, I have updated my question with the first line of the CSV, where the error occurred. I just imported the data using pgAdmin, with csv as the format and a comma (,) as the delimiter. Thanks. Commented May 14, 2014 at 6:08
  • OK, so it is represented as hexadecimal text in the CSV. The error makes no sense in that case, because there doesn't appear to be any 0xff byte in there - but then it's a non-printable character, so it's hard to be sure. At a guess, the file isn't really utf-8, it's some other encoding like latin-1, and it contains a nonprintable 0xff . Use a text editor that can show nonprintable characters to open the file and see. Commented May 14, 2014 at 6:13
  • @CraigRinger But when I copy just three lines from the 1 GB file into a new CSV file, they import successfully. Does it have something to do with the file size? I don't think file size should be a concern, though, right? I also tried importing with Latin1 encoding and got the same error, but for 0x00. Commented May 14, 2014 at 6:59
  • 0x00 means there's a null byte in there. Your CSV is not valid - somewhere it contains raw binary data. Commented May 14, 2014 at 7:07

1 Answer

As yenyen notes in the comments, the issue was that the input was UCS-2 (probably really UTF-16) encoded.

UCS-2 is a two-byte-per-character encoding, so every ASCII character in the file, including each hex digit, is stored alongside a null byte. If you tell PostgreSQL the file is utf-8 then it'll see the input as garbage full of invalid utf-8 sequences; the 0xff in the error is most likely the first byte of the byte order mark (BOM). If you tell PostgreSQL it's a simple 1-byte encoding like latin1, PostgreSQL will see the zero (null) byte and realise it's not latin-1 after all.
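To make the two failure modes concrete, here is a minimal Python sketch. It assumes the file looks like what Python's "utf-16" codec produces (a BOM followed by two bytes per character); the `sample` string is just the first few characters of the hex text from the question, and all variable names are illustrative.

```python
# The start of the hex text from the question, as it would appear in the CSV.
sample = "0xFFD8FFE0"

# Python's "utf-16" codec writes a byte order mark (BOM) first,
# then two bytes for each of these characters.
encoded = sample.encode("utf-16")

# The BOM is ff fe (little-endian) or fe ff (big-endian).
starts_with_bom = encoded[:2] in (b"\xff\xfe", b"\xfe\xff")

# Every ASCII character is paired with a zero byte.
contains_null = b"\x00" in encoded

# Reading the bytes as UTF-8 fails on the very first byte,
# because neither 0xff nor 0xfe can occur in valid UTF-8.
try:
    encoded.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Latin-1 maps every byte to some character, so Python decodes it,
# but the result is littered with NUL characters.
as_latin1 = encoded.decode("latin-1")
```

Printing `encoded.hex(" ")` shows the BOM and the interleaved zero bytes. On a little-endian machine the first byte is 0xff, which is consistent with the error message in the question.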

The trick here is to examine the input file with an editor that can show the raw bytes, not just use a text editor that automagically reads the BOM and loads it as encoded text. If in doubt use a hex editor.
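If no hex editor is at hand, the first few bytes can also be checked with a short script. The sketch below is one possible approach, not the only one: `sniff_encoding` and `transcode_to_utf8` are hypothetical helper names, and the code assumes the file starts with a BOM. As far as I know PostgreSQL cannot read UTF-16 input directly, so the second helper rewrites the file as UTF-8 in chunks before it is passed to COPY.

```python
import codecs

# Known byte order marks and the Python codec that decodes the text after them.
# UTF-32 comes first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(path):
    """Return (codec_name, first_bytes); codec_name is None if no BOM is found."""
    with open(path, "rb") as f:
        head = f.read(16)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name, head
    return None, head


def transcode_to_utf8(src, dst, src_encoding, chunk_chars=1 << 20):
    """Copy src to dst as UTF-8 in chunks, so a 1 GB file is never fully in memory."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)
            if not chunk:
                break
            fout.write(chunk)
```

A typical use would be `name, head = sniff_encoding("photo.csv")` (the file name is a placeholder), then `print(head.hex(" "))` to see the raw bytes, and `transcode_to_utf8("photo.csv", "photo_utf8.csv", name)` if a UTF-16 BOM was found. A file without a BOM returns `None` and still needs a manual look at the bytes.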
