This question is essentially the same as this question, but for Python.
I wish to query rows from a PostgreSQL database ordered by the e-mail address column and then perform operations in Python that rely on that ordering.
The database I'm querying uses the en_US.UTF8 collation, and a few tests show that it behaves peculiarly with respect to the @ symbol in e-mail addresses:
mydb=> SELECT '0' < '@';
?column?
----------
f
(1 row)
mydb=> SELECT '0' < '@0';
?column?
----------
t
(1 row)
This answer suggests that an @ symbol may be ignored by some collations, but if that were the case here, '0' and '@0' would compare as equal and I'd have expected an f from the second query.
Python supplies a locale module, but that module reportedly behaves inconsistently on some platforms, so I don't think I can rely on it for this purpose.
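For reference, the locale-based approach would look roughly like the sketch below. The helper name sort_with_locale is hypothetical, and the default locale name en_US.UTF-8 is an assumption: it may be spelled differently or not be installed on a given system, in which case setlocale raises locale.Error.

```python
import locale


def sort_with_locale(values, locale_name="en_US.UTF-8"):
    """Sort strings using the platform's collation rules for locale_name.

    Hypothetical helper: the locale name is an assumption and must be
    installed on the host, otherwise locale.Error is raised.
    """
    # Remember the current LC_COLLATE setting so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares the way the
        # active locale collates.
        return sorted(values, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

Because the result depends on the C library's locale data, there is no guarantee that it matches what the database server produces on a different machine.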
Based on that report, I tried the recommendation to use the PyICU package, which seemed promising:
>>> import icu
>>> collator = icu.Collator.createInstance()
>>> collator.getLocale()
<Locale: en_US>
>>> collator.getSortKey('0') < collator.getSortKey('@')
False
>>> collator.getSortKey('0') < collator.getSortKey('@0')
False
But as you can see, the last comparison yields a different order than PostgreSQL does.
I've tried specifying a different collation for the query, something like:
SELECT email COLLATE posix FROM mytable ORDER BY email;
But that results in an error: collation "posix" for encoding "UTF8" does not exist. I also tried the collation "en-us-x-icu", but that does not exist either.
Is there any way to reliably query a column of e-mail addresses from PostgreSQL in an order upon which a Python program could rely, either by adapting the collation of the query or by honoring the default collation in Python?
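To make the goal concrete, here is a minimal sketch of the kind of pairing I have in mind, not something verified against this database. It assumes the built-in "C" collation (quoted, since collation names are case-sensitive identifiers), which compares strings byte by byte, and it assumes the database encoding is UTF8 so that encoding to UTF-8 in Python gives the same bytes. The names QUERY, c_collation_key and is_sorted_like_c are hypothetical.

```python
# Hypothetical query: "mytable" and "email" are the names used above.
# Assumption: the "C" collation orders strings by their raw byte values.
QUERY = 'SELECT email FROM mytable ORDER BY email COLLATE "C"'


def c_collation_key(value: str) -> bytes:
    """Sort key meant to mirror byte-wise ordering of UTF-8 encoded text."""
    return value.encode("utf-8")


def is_sorted_like_c(rows) -> bool:
    """Check that rows are already in byte-wise ascending order."""
    keys = [c_collation_key(row) for row in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Usage: sort in Python the same way the query is assumed to sort.
emails = ["a@x.com", "@0", "B@x.com", "0"]
print(sorted(emails, key=c_collation_key))
```

This gives up linguistic ordering (for example, uppercase letters sort before lowercase ones), which is acceptable for my purpose as long as both sides agree.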