I have some data that includes a \0 byte, and it seems to be valid UTF-8:

using System;
using System.Text;
                    
public class Program
{
    public static void Main()
    {
        // "AB" followed by a single NUL (0x00) byte
        byte[] b = { 65, 66, 0 };
        
        Console.WriteLine(Encoding.UTF8.GetString(b));
    }
}

That code works fine. But when I try to update a record in Postgres with the same data, it complains:

22021: invalid byte sequence for encoding "UTF8": 0x00

The \0 byte shouldn't be in the data, but how can one system accept it while another rejects it? I reckon they both implement the same standard.

1 Answer

From the documentation, 8.3. Character Types:

+-----------------------------------+----------------------------+
|               Name                |        Description         |
+-----------------------------------+----------------------------+
| character varying(n), varchar(n)  | variable-length with limit |
| character(n), char(n)             | fixed-length, blank padded |
| text                              | variable unlimited length  |
+-----------------------------------+----------------------------+

The characters that can be stored in any of these data types are determined by the database character set, which is selected when the database is created. Regardless of the specific character set, the character with code zero (sometimes called NUL) cannot be stored. For more information refer to Section 23.3.
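Since the NUL character cannot be stored in these types, one option is to remove it from the string before it reaches the database. The following is only a minimal sketch under that assumption: `NulSanitizer` and `StripNul` are hypothetical names, not part of .NET or of any Postgres driver, and whether silently dropping the character is acceptable depends on your data. If the zero bytes are meaningful, a `bytea` column may be a better fit, because it can hold them.

```csharp
using System;
using System.Text;

public static class NulSanitizer
{
    // Hypothetical helper: removes every U+0000 character from a string
    // so the result can be stored in a text, varchar, or char column.
    public static string StripNul(string s)
    {
        // String.Replace(string, string) performs an ordinal search.
        return s.Replace("\0", string.Empty);
    }
}

public class Program
{
    public static void Main()
    {
        // Same bytes as in the question: "AB" followed by a NUL byte.
        byte[] b = { 65, 66, 0 };

        string decoded = Encoding.UTF8.GetString(b);     // valid UTF-8, 3 chars
        string cleaned = NulSanitizer.StripNul(decoded); // NUL removed, 2 chars

        Console.WriteLine(decoded.Length); // 3
        Console.WriteLine(cleaned.Length); // 2
    }
}
```

Pass the cleaned string, not the original, as the parameter value of the UPDATE statement.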

2 Comments

Thanks, the resulting message is a bit confusing though. It looks like the code point is invalid, not that it is not allowed.
Also for some background read this: commandprompt.com/blog/…
