3

I run sort file, where file contains:

ABC
AB-C
ABCDEFG-HI

I get

ABC
AB-C
ABCDEFG-HI

Why does sort order the strings this way? How do I make it sort '-' alphabetically?

1
  • Please post the exact sort call you're using, you're likely doing it wrong. Commented Jun 30, 2011 at 8:09

2 Answers

5

The solution provided by @cnicutar is correct, but the reason needs explaining, which is why I'm posting a separate answer.

After the discussion with @cnicutar, at the end of which I suspected a bug in coreutils' sort, I found that this sorting behavior is expected:

At that point sort appears broken because case is folded and punctuation is ignored because ‘en_US.UTF-8’ specifies this behavior.

So for the purpose of comparison, your input appears to be mapped as follows:

ABC -> ABC
AB-C -> ABC
ABCDEFG-HI -> ABCDEFGHI
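
As a rough illustration of that mapping (an approximation only, not how the locale's collation is actually implemented), deleting punctuation with tr reproduces the keys listed above. The variable name keys is made up for this sketch:

```shell
# Approximate the comparison keys by deleting punctuation from each line.
# LC_ALL=C keeps tr's notion of [:punct:] limited to ASCII punctuation.
keys=$(printf 'ABC\nAB-C\nABCDEFG-HI\n' | LC_ALL=C tr -d '[:punct:]')
printf '%s\n' "$keys"
```

With the hyphens gone, ABC and AB-C compare as equal on this first pass, so the hyphen no longer pushes AB-C ahead of ABC.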

If you want pure ASCII sorting, call LC_ALL=C sort. This sets the locale to C for that one invocation of sort, which means "standard" behavior without localization; you can also use POSIX instead of C.

On other Unixes this behavior seems to differ (tested on Mac OS X, whose userland tools are derived from FreeBSD), but LC_ALL=C sort should yield the same behavior across all POSIX systems.
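
A minimal sketch of that call, feeding the three lines from the question through printf instead of a file (the variable name sorted is just for illustration):

```shell
# Sort by raw byte value by forcing the C locale for this one command.
# '-' is 0x2D and 'C' is 0x43, so AB-C sorts before ABC in byte order.
sorted=$(printf 'ABC\nAB-C\nABCDEFG-HI\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In byte order this prints AB-C first, then ABC, then ABCDEFG-HI. Because the assignment is a prefix to the command, the rest of the shell session keeps its original locale.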


3

I remember this :)) try

[cnicutar@aiur ~]$ LANG=POSIX sort
ABC
AB-C
ABCDEFG-HI
AB-C
ABC
ABCDEFG-HI

Alternatively LANG=C should work.
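
One caveat, based on the usual POSIX precedence rules rather than anything stated in this thread: LC_ALL overrides LC_COLLATE, which in turn overrides LANG, so setting LANG only changes the sort order when neither of the other two is set. A hedged sketch of that precedence, using env to set both variables for a single sort call (the variable name result is made up here):

```shell
# LC_ALL wins over LANG, so this still sorts by byte value
# even though LANG names a different locale.
result=$(printf 'ABC\nAB-C\nABCDEFG-HI\n' | env LANG=en_US.UTF-8 LC_ALL=C sort)
printf '%s\n' "$result"
```

If LANG=C appears to have no effect on your system, running locale with no arguments shows which of these variables are currently set.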

8 Comments

In no language setting should the AB-C appear in the middle (not even with EBCDIC), so I bet that this isn't the problem but instead the way user678070 is calling sort.
@DarkDust I just tried it with different LANGs. For en_US.utf8 it does what the op says (I had this problem before). Do a locale -a and pick some.
@cnicutar: I just tried it myself and I still get the correct sorting. And given that only ASCII characters are used, I very much doubt en_US.utf8 would result in this strange sorting, as UTF-8 has ASCII as a subset. Do you get the "unsorted" output when using LANG=foobar? If so, then your sort doesn't sort when it can't find the locale.
@DarkDust What coreutils version are you using? What does echo $LANG say? What does locale -a say? If you try to use a locale that's unavailable, it is ignored. Btw, "I very much doubt" in response to something I just said I had tested sounds a bit offensive, don't you think?
@DarkDust Even if I reorder the words, with LANG=en_US.utf8 I still get the order of the OP. And locale -a shows I have it available.
