added 81 characters in body

edited Jun 18, 2015 at 16:08

The POSIX specification for uniq describes it clearly:

-u
    Suppress the writing of lines that are repeated in the input.

The -u option makes uniq print only the lines that are not repeated in the input.
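
To make the difference concrete, here is a small sketch with plain ASCII input. The sample lines (a, a, b, c) and the variable names are made up for illustration, and LC_ALL=C is set so that collation plays no part:

```shell
# Hypothetical sample input: "a" is repeated on adjacent lines, "b" and "c" are not.
sample() { printf '%s\n' a a b c; }

# Default behaviour: each run of repeated lines is collapsed to one copy.
default_out=$(sample | LC_ALL=C uniq)

# -u: only the lines that are not repeated are written.
unique_out=$(sample | LC_ALL=C uniq -u)

printf 'uniq:\n%s\n' "$default_out"
printf 'uniq -u:\n%s\n' "$unique_out"
```

With this input, plain uniq should write a, b and c, while uniq -u should write only b and c.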

Most uniq implementations compare lines byte by byte, while GNU uniq uses the locale's collation order to decide whether lines are duplicates. It can therefore produce wrong results in some locales, for example in en_US.UTF-8:

$ printf '%b\n' '\U2460' '\U2461' | uniq
①

and -u gives you no lines at all:

$ printf '%b\n' '\U2460' '\U2461' | uniq -u
<blank>

So you should set the locale to C to get byte comparison:

$ printf '%b\n' '\U2460' '\U2461' | LC_ALL=C uniq
①
②
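
The same workaround applies to -u. The following is a minimal sketch, assuming a POSIX shell: the octal escapes are the UTF-8 bytes of U+2460 and U+2461, written out so the example does not rely on the shell's printf understanding \U, and the variable name is only illustrative. LC_ALL=C is set for the uniq command alone, so the rest of the session keeps its locale:

```shell
# Write the two circled digits as raw UTF-8 bytes:
#   U+2460 = \342\221\240, U+2461 = \342\221\241
circled() { printf '\342\221\240\n\342\221\241\n'; }

# Byte comparison sees two different lines, so -u keeps both of them.
unique_count=$(( $(circled | LC_ALL=C uniq -u | wc -l) ))

printf 'lines kept by uniq -u in the C locale: %d\n' "$unique_count"
```

In the C locale both lines differ byte for byte, so uniq -u should keep the two of them.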

answered Jun 18, 2015 at 10:33

cuonglm

The POSIX specification for uniq describes it clearly:

-u
    Suppress the writing of lines that are repeated in the input.

The -u option makes uniq print only the lines that are not repeated in the input.

Most uniq implementations compare lines byte by byte, while GNU uniq uses the locale's collation order to decide whether lines are duplicates:

$ printf '%b\n' '\U2460' '\U2461' | uniq
①

and -u gives you no lines at all:

$ printf '%b\n' '\U2460' '\U2461' | uniq -u
<blank>

So you should set the locale to C to get byte comparison:

$ printf '%b\n' '\U2460' '\U2461' | LC_ALL=C uniq
①
②