I am trying to insert the special character "†" into a varchar field with psql, without success. From my PHP application it works (the PHP side uses iso8859-1).

The database settings are:

encoding = LATIN1
collation = fi_FI
character type = fi_FI
client encoding = tried both UTF8 and LATIN1 (on the command line: PGCLIENTENCODING=LATIN1 or PGCLIENTENCODING=UTF8)

Selecting from the table with the client encoding set to UTF8 shows:

locationx \u0086

How can I add this value to the database from psql? Neither of the statements below works:

update tablex set field1 = 'locationY' || '†'
update tablex set field1 = 'locationY' || U&'\86'

They give these error messages, respectively:

ERROR:  character with byte sequence 0xe2 0x80 0xa0 in encoding "UTF8" has no equivalent in encoding "LATIN1"
ERROR:  invalid Unicode escape value at or near "\86' "

If I view the data entered by my PHP application, the bytes are \x6c6f636174696f6e5986, but when I enter the data with psql, the bytes are \x6c6f636174696f6e59e280a0.

1 Answer

It doesn't work from either PHP or psql, because the character does not exist in LATIN-1 encoding. You just cannot store it in the database.

Let me explain what is going on.

  • If your client encoding is LATIN1 and you enter in psql:

    INSERT INTO ... VALUES ('locationY†');
    

    the statement succeeds, but not as intended. Because your terminal is set to UTF-8, the † you type is actually three bytes (\xE280A0), which are interpreted and stored as three single-byte LATIN1 characters.

  • If your client encoding is UTF8 and you enter the same statement in psql:

    The insert will cause an error, because the three bytes entered when you type † are correctly interpreted as the dagger character, and PostgreSQL then fails to convert that character to LATIN1:

    ERROR:  character with byte sequence 0xe2 0x80 0xa0 in encoding "UTF8" has no equivalent in encoding "LATIN1"
    
  • With PHP, your client encoding is probably set to LATIN1, while the PHP program actually uses the WINDOWS-1252 encoding.

    In that encoding, † is represented by the single byte \x86. PostgreSQL interprets that byte as LATIN1, where it means something entirely different, namely the “start of selected area” control character U+0086.

    Now when your PHP program reads that character back, everything seems to work fine, but the database actually stores a different character than you intend.

    You will notice this as soon as you select the value by any other means, e.g. in your psql console with a UTF8 client encoding. There the value is rendered as

    locationY\u0086
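
The three cases above can be reproduced outside the database. The following is only a sketch using Python's built-in codecs, assuming that Python's "latin-1" and "cp1252" codecs match PostgreSQL's LATIN1 and WIN1252 for the bytes involved. The names DAGGER and fits_latin1 are made up for this illustration.

```python
# Illustration only: Python's codecs stand in for PostgreSQL's conversions.
DAGGER = "\u2020"  # the dagger character

utf8_bytes = DAGGER.encode("utf-8")      # what a UTF-8 terminal sends
win1252_bytes = DAGGER.encode("cp1252")  # what the PHP application sends


def fits_latin1(text):
    """Return True if every character of text exists in LATIN1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Case 1: the three UTF-8 bytes read as LATIN1 become three characters.
three_chars = utf8_bytes.decode("latin-1")

# Case 2: the real dagger has no LATIN1 equivalent, hence the error.
dagger_fits = fits_latin1(DAGGER)

# Case 3: the WINDOWS-1252 byte read as LATIN1 is the control character U+0086.
misread = win1252_bytes.decode("latin-1")

print(utf8_bytes.hex(), win1252_bytes.hex(), len(three_chars), dagger_fits, repr(misread))
```

The two hex values correspond to the trailing bytes of the two stored values shown in the question (e280a0 from psql, 86 from PHP).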
    

Here is how to get things working:

  • Create a new database with UTF8 encoding.

  • Dump the old database with

    pg_dump -F p -E LATIN1 dbname
    
  • Manually edit the dump and change the line

    SET client_encoding = 'LATIN1';
    

    to

    SET client_encoding = 'WIN1252';
    
  • Load the dump into the new database with psql.

  • Change the client_encoding of your PHP application to WIN1252 and start using the new database.
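
The manual edit of the dump could also be scripted. This is a minimal sketch, not part of any PostgreSQL tooling: retag_dump is a hypothetical helper, and the sample dump content is invented for the illustration. It works on bytes so that the stored data is not re-encoded. Only the encoding label changes.

```python
import re


def retag_dump(dump: bytes) -> bytes:
    """Relabel the first client_encoding line from LATIN1 to WIN1252.

    The data bytes are left untouched; only the declaration changes.
    """
    pattern = rb"^SET client_encoding = 'LATIN1';"
    replacement = b"SET client_encoding = 'WIN1252';"
    return re.sub(pattern, replacement, dump, count=1, flags=re.MULTILINE)


# Invented two-line sample standing in for a real plain-format dump.
sample = (
    b"SET client_encoding = 'LATIN1';\n"
    b"INSERT INTO tablex VALUES ('locationY\x86');\n"
)

fixed = retag_dump(sample)

# Read with the corrected label, the \x86 byte is the dagger again.
text = fixed.decode("cp1252")
print(text)
```

With the corrected label, PostgreSQL would convert the \x86 byte to a real dagger when loading into the UTF8 database, which is the point of the edit.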
