Postgres UTF8 ordering

Question

I have a query in Postgres that orders a small number of rows by a varchar field. There seems to be an error in how Postgres orders UTF8 strings:

For example:

'W' in UTF-8 is 87 and 'g' is 103, yet SELECT 'W' < 'g'; returns false, while SELECT convert_to('W', 'SQL_ASCII') < convert_to('g', 'SQL_ASCII'); returns true.

The collation is en_US.UTF-8.

Is there a good explanation for this behavior, and how can I avoid it?

Daniel · Accepted Answer · 2012-03-01 12:24:45Z

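The ordering is not based on Unicode code points; it is defined by the collation. With en_US, letters are compared alphabetically first, ignoring case, so 'a' and 'A' both sort before 'b' and 'B', and 'W' sorts after 'g'. Case only decides between strings that are otherwise equal.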

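Most people (programmers excepted) expect this ordering. Where you do need plain byte (code point) ordering, ask for the "C" collation explicitly, which PostgreSQL supports from 9.1 on: SELECT 'W' < 'g' COLLATE "C"; returns true, and the same clause works in a sort, e.g. ORDER BY your_column COLLATE "C".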
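A rough Python sketch of the difference. Python compares str values by code point, which for UTF-8 text gives the same result as comparing the encoded bytes, so it stands in here for the "C" collation (and for the bytea comparison that convert_to produces). The toy_collation_key function is a hypothetical stand-in for a locale collation, not the real en_US rules, which have several comparison levels and their own tie-breaking; in this toy, ties are simply broken by code point.

```python
# Hypothetical toy collation (NOT the real en_US rules): compare the
# case-folded text first, and use the original string only to break ties.
def toy_collation_key(s: str) -> tuple:
    return (s.casefold(), s)

words = ["b", "W", "a", "B", "g", "A"]

# Plain code-point order, i.e. what the "C" collation gives for UTF-8 text.
code_point_order = sorted(words)
# Dictionary-like order from the toy key.
toy_order = sorted(words, key=toy_collation_key)

print(code_point_order)  # ['A', 'B', 'W', 'a', 'b', 'g']
print(toy_order)         # ['A', 'a', 'B', 'b', 'g', 'W']

# Byte-wise comparison of the UTF-8 encodings, like comparing bytea values.
print("W".encode("utf-8") < "g".encode("utf-8"))        # True
# The question's comparison under the dictionary-like toy ordering.
print(toy_collation_key("W") < toy_collation_key("g"))  # False
```

The last two lines reproduce the question's results: true when comparing bytes, false under a dictionary-like order.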

Clodoaldo Neto · Accepted Answer · 2012-03-06 00:44:34Z

This shows how the "C" collation, which compares raw bytes (ASCII order for ASCII characters), orders the code points 32 to 255, assuming a UTF8-encoded database:

select s, chr(s) from generate_series(32, 255) s order by chr(s) collate "C";
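A small Python check of why the "C" result is simply 32 to 255 in ascending order: "C" compares raw bytes, and UTF-8 byte order matches code point order, even where a code point needs two bytes. Python's bytes sorting is used here as an assumed stand-in for the server's byte comparison; this does not query Postgres.

```python
# Sort the code points 32..255 by the UTF-8 bytes of their characters,
# mimicking a byte-wise ("C") comparison of chr(s).
codes = range(32, 256)

by_utf8_bytes = sorted(codes, key=lambda s: chr(s).encode("utf-8"))
print(by_utf8_bytes == list(codes))  # True: byte order == code point order

# Code points from 128 up take two bytes in UTF-8, yet still sort after 127.
print(chr(127).encode("utf-8"), chr(128).encode("utf-8"))  # b'\x7f' b'\xc2\x80'
```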

Now the same for the pt_BR (Brazilian Portuguese) collation:

select s, chr(s) from generate_series(32, 255) s order by chr(s) collate "pt_BR";

What you call the collation, en_US.UTF-8, is actually a locale name: the part before the dot (en_US) is the collation, and the part after the dot (UTF-8) is the encoding.
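The general shape of such a locale name is language[_territory][.codeset][@modifier]. The Python helper below, split_locale_name, is hypothetical and only illustrates that split; it does no validation and is not how Postgres itself parses locale names.

```python
# Hypothetical helper: split a POSIX-style locale name of the form
# language[_territory][.codeset][@modifier] into its parts.
def split_locale_name(name: str) -> dict:
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return {
        "language": language,
        "territory": territory or None,
        "codeset": codeset or None,  # the encoding part, after the dot
        "modifier": modifier or None,
    }

print(split_locale_name("en_US.UTF-8"))
# {'language': 'en', 'territory': 'US', 'codeset': 'UTF-8', 'modifier': None}
```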