I am trying to sort and join two files that contain IP addresses. The first file has only IPs; the second contains IPs and an associated number. But sort behaves differently on the two files. Here are the commands and their output:

cat file | grep '180.76.15.15' | sort
cat file | grep '180.76.15.15' | sort -k 1
cat file | grep '180.76.15.15' | sort -t ' ' -k 1

outcome:

180.76.15.150 987272
180.76.15.152 52219
180.76.15.154 52971
180.76.15.156 65472
180.76.15.158 35475
180.76.15.15 99709

cat file | grep '180.76.15.15' | cut -d ' ' -f 1 | sort

outcome:

180.76.15.15
180.76.15.150
180.76.15.152
180.76.15.154
180.76.15.156
180.76.15.158

As you can see, the first three commands all produce the same output, but when the lines contain only the IP address, the sort order changes, which causes a problem when I try to join the files.

Specifically, the IP 180.76.15.15 appears in the bottom row in the first case (even when I sort explicitly on the first field), but in the top row in the second case, and I can't understand why.

Can anyone please explain why this is happening?

P.S. I am connecting over SSH from Windows 10 PowerShell to Ubuntu 20.04 running in VMware.

1 Answer

sort uses your locale settings to determine the collation order of characters. From man sort:

*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

Setting LC_ALL=C makes sort compare raw byte values, i.e. ASCII order. For example:

> cat file
#a
b#
152
153
15 4
15 1

Here, with the locale from my environment, everything is sorted alphabetically with special characters (including the space) ignored: first the numbers, then the letters.

thanasis@basis:~/Documents/development/temp> sort file
15 1
152
153
15 4
#a
b#

Here every character counts: # comes first, then the numbers (the space counts too, and sorts before any digit), then the letters.

thanasis@basis:~/Documents/development/temp> LC_ALL=C sort file
#a
15 1
15 4
152
153
b#
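
Applied to the data from the question, the same byte-order rule can be checked with a minimal sketch like the one below. The three sample lines are copied from the question's output; the variable name sorted is only for illustration. In the C locale the space (byte 0x20) compares lower than the digit 0 (byte 0x30), so the 180.76.15.15 line should come first even when the count is still attached.

```shell
# Sample lines taken from the question; "sorted" is an illustrative variable name.
sorted=$(printf '%s\n' \
  '180.76.15.150 987272' \
  '180.76.15.15 99709' \
  '180.76.15.152 52219' | LC_ALL=C sort)

# With byte-value comparison, "15 " sorts before "150" and "152".
printf '%s\n' "$sorted"
```

This matches the order that cut ... | sort produced in the question, which is what join needs.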

2 Comments

Thank you very much, it resolved my problem. But I still can't understand why the outcomes differed in the first place: both commands ran on the same system with the same file, so whatever locale setting was in place should have affected them all. Why wasn't that the case? Anyway, adding LC_ALL=C seems to resolve the issue. Thank you again.
In your example, 15$ (15 at the end of the line) goes first, before any other line starting with 15; that is independent of the locale, because you have cut the string. 15[[:space:]]9 is treated like 159 and goes after all the 15[0-8] lines; that is the influence of the locale on the sort order.
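
For the join the question was aiming at, one possible approach is to sort both inputs and run join under the same locale. The sketch below assumes two hypothetical files, ips.txt and counts.txt, created in a temporary directory with sample values from the question. It also uses -k1,1, which ends the sort key at the first field, whereas -k 1 as used in the question extends the key to the end of the line.

```shell
# Hypothetical file names and sample data, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 180.76.15.150 180.76.15.15 > "$tmp/ips.txt"
printf '%s\n' \
  '180.76.15.150 987272' \
  '180.76.15.15 99709' \
  '180.76.15.152 52219' > "$tmp/counts.txt"

# Sort both files in byte order; -k1,1 restricts the key to the first field.
LC_ALL=C sort "$tmp/ips.txt" > "$tmp/ips.sorted"
LC_ALL=C sort -k1,1 "$tmp/counts.txt" > "$tmp/counts.sorted"

# Run join under the same locale so it agrees with sort on the ordering.
joined=$(LC_ALL=C join "$tmp/ips.sorted" "$tmp/counts.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

Keeping LC_ALL=C on sort and join alike avoids the two tools disagreeing about whether the input is in order.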
