The encoding of my postgres database is UTF-8. In a certain table I have a text column into which I would like to insert some data. The data is mostly valid UTF-8, but it contains a number of invalid byte sequences which I do not want to remove or substitute. Is there any way of inserting the data into the text column while preserving those invalid byte sequences?
Here's a simple example, executed from the shell (bash) command-line courtesy of psql:
psql main postgres <<<"create table t1 (a text); insert into t1 (a) values (E'a\xC0b');";
## CREATE TABLE
## ERROR: invalid byte sequence for encoding "UTF8": 0xc0 0x62
I know this is probably a long shot, but is there any way of disabling postgres's validation of inserted text, perhaps on an ad hoc basis? I don't see how it would trouble postgres to have some byte sequences in text column data that happen to not be valid for the database's configured character encoding.
If this is not possible, I guess the only recourse is to store the data as straight binary data using the bytea data type, but please let me know if there's a better solution out there.
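For the bytea fallback, the raw bytes can be written as a hex bytea literal. A minimal sketch of deriving that literal from the same example bytes (the table name t2 and the use of od here are my own illustration, not from the question; \300 is octal for 0xC0, which portable printf accepts where \xC0 may not be):

```shell
# Hex form of the raw bytes a 0xC0 b (0x61 0xc0 0x62):
hex=$(printf 'a\300b' | od -An -tx1 | tr -d ' \n')
echo "$hex"   # 61c062

# The insert would then look like this (sketch, not run here):
# psql main postgres <<<"create table t2 (a bytea); insert into t2 (a) values ('\x${hex}'::bytea);"
```

Since bytea skips encoding validation entirely, the 0xc0 byte is stored as-is instead of being rejected.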
text always has an encoding (otherwise the database wouldn't know how to convert the bytes to characters, especially with variable-length encodings such as UTF-8). If you just have a stream of bytes then you have bytea data, not text. Of course, things like length will work differently (compare length('µ') and length('µ'::bytea) for an example), so you're left with a choice of which pain you want to suffer.
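The length difference can be illustrated outside the database as well: counting 'µ' in bytes roughly mirrors length on bytea, while counting it in characters mirrors length on text. A small sketch from the shell (locale-dependent for the character count):

```shell
# 'µ' occupies two bytes in UTF-8 but is a single character:
printf 'µ' | wc -c   # byte count: 2 (analogous to length('µ'::bytea))
printf 'µ' | wc -m   # character count: 1 in a UTF-8 locale (analogous to length('µ'))
```

So with bytea you keep the raw bytes intact, but lose character-aware behavior from functions like length.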