cuonglm

The POSIX specification for uniq describes it clearly:

-u
    Suppress the writing of lines that are repeated in the input.

The -u option makes uniq skip every line that is repeated in the input, so only lines that are not adjacent to an identical line are written.
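A minimal sketch of that behaviour, using made-up ASCII input (the letters are only for illustration). LC_ALL=C is set here so the comparison is bytewise regardless of the environment:

```shell
# Input lines: a a b a
# uniq compares adjacent lines only, so the groups are [a a], [b], [a].
# -u writes only the lines from groups of size one: "b" and the final "a".
out=$(printf '%s\n' a a b a | LC_ALL=C uniq -u)
printf '%s\n' "$out"
```

Note that the last "a" is written even though "a" appeared earlier, because the two occurrences are not adjacent; sort the input first if that is not what you want.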

Most uniq implementations compare lines byte by byte, while GNU uniq uses the locale's collation order to decide whether two lines are duplicates. In some locales it can therefore treat different lines as equal and produce wrong results, for example in the en_US.UTF-8 locale:

$ printf '%b\n' '\U2460' '\U2461' | uniq
①

and -u gives you no lines at all:

$ printf '%b\n' '\U2460' '\U2461' | uniq -u
<blank>

So you should set the locale to C to get byte comparison:

$ printf '%b\n' '\U2460' '\U2461' | LC_ALL=C uniq
①
②
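The same fix should apply to -u. The sketch below writes the two characters as octal byte sequences, which is an assumption made here so that the input does not depend on the shell's printf supporting \U escapes:

```shell
# U+2460 is the bytes E2 91 A0 and U+2461 is E2 91 A1 in UTF-8,
# written here as octal escapes in the printf format string.
# In the C locale the two lines differ bytewise, so neither counts as repeated
# and -u writes both of them.
out=$(printf '\342\221\240\n\342\221\241\n' | LC_ALL=C uniq -u)
printf '%s\n' "$out"
```

Setting LC_ALL only on the uniq command, as above, keeps the rest of the pipeline in your normal locale.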
