
I need to sort the data in a file by column 7 and then by column 2. The last column (column 8) is empty (null):

    1|1|1|1|1|1|12333|    
    3|3|3|3|3|3|44454|    
    2|2|2|2|2|2|22222|    
    1|1|1|1|1|1|123300000|    

When I use the following command, the rows in the output file come out in an unexpected order:

    sort -o /test1/FILE2 -T /test1/Junk -t\| -k7,7 -k2,2 /test1/Junk/FILE2_1

Where

  • /test1/FILE2 is the output file (the argument to -o)
  • /test1/Junk is the temporary directory (the argument to -T)
  • /test1/Junk/FILE2_1 is the input file

Contents of the output file:

    1|1|1|1|1|1|123300000|    
    1|1|1|1|1|1|12333|    
    2|2|2|2|2|2|22222|    
    3|3|3|3|3|3|44454|    

Any idea why the row containing 123300000 is coming up first?

I need the rows sorted like this:

    1|1|1|1|1|1|12333|    
    1|1|1|1|1|1|123300000|    
    2|2|2|2|2|2|22222|    
    3|3|3|3|3|3|44454|    


Normally, you choose either numeric or lexicographical (dictionary) ordering.

If you wanted those values sorted numerically, you would need a -n in your sort command:

    pax> echo '1|1|1|1|1|1|12333|
    3|3|3|3|3|3|44454|
    2|2|2|2|2|2|22222|
    1|1|1|1|1|1|123300000|' | sort -t \| -k7,7 -k2,2 -n

    1|1|1|1|1|1|12333|
    2|2|2|2|2|2|22222|
    3|3|3|3|3|3|44454|
    1|1|1|1|1|1|123300000|
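Here -n is a global option, so it applies to both keys. As a minimal sketch, the same numeric ordering can be requested per key by attaching the n modifier to each -k specification, which would also let you make only one of the keys numeric. The data variable is just the sample above with the trailing spaces removed:

```shell
# Sample data from the question (trailing spaces removed)
data='1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123300000|'

# The n modifier is attached to each key instead of being given globally with -n
sorted=$(printf '%s\n' "$data" | sort -t '|' -k7,7n -k2,2n)
printf '%s\n' "$sorted"
```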

If, on the other hand, you don't want it sorted numerically, then the output you have is already correct as far as I can see:

                    v
    1|1|1|1|1|1|123300000|
    1|1|1|1|1|1|12333|
                    ^

Note the highlighted characters. Since 0 comes before 3, this is the right lexicographical order.
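To see that comparison in isolation, you can sort just the two column 7 values. This sketch sets LC_ALL=C, an assumption made so that characters are compared by byte value regardless of your locale:

```shell
# Sort only the two column-7 values as plain strings, byte by byte
keys=$(printf '%s\n' 12333 123300000 | LC_ALL=C sort)

# Prints 123300000 first: both strings start with "1233", then '0' sorts before '3'
printf '%s\n' "$keys"
```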

Changing that large value to 123330000 results in the order you seem to be after:

    pax> echo '1|1|1|1|1|1|12333|
    3|3|3|3|3|3|44454|
    2|2|2|2|2|2|22222|
    1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2

    1|1|1|1|1|1|12333|
    1|1|1|1|1|1|123330000|
    2|2|2|2|2|2|22222|
    3|3|3|3|3|3|44454|

Hence I suspect you're just misreading the data in this case.


If, as you state in a comment, the test data was incorrect, the presence or absence of the final | character should make no difference to the sort order. First, lexicographical sorting with and without |:

    pax> echo ; echo '1|1|1|1|1|1|12333|
    3|3|3|3|3|3|44454|
    2|2|2|2|2|2|22222|
    1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2

    1|1|1|1|1|1|12333|
    1|1|1|1|1|1|123330000|
    2|2|2|2|2|2|22222|
    3|3|3|3|3|3|44454|

    pax> echo ; echo '1|1|1|1|1|1|12333
    3|3|3|3|3|3|44454
    2|2|2|2|2|2|22222
    1|1|1|1|1|1|123330000' | sort -t \| -k7,7 -k2,2

    1|1|1|1|1|1|12333
    1|1|1|1|1|1|123330000
    2|2|2|2|2|2|22222
    3|3|3|3|3|3|44454

You can see there that 123330000 is second in both cases.
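If you want to check that on your own machine, a rough sketch is to sort the corrected data with the trailing |, sort it again with the | stripped first, and compare the two results once the | has been removed from both. It assumes a POSIX shell plus sed; the variable names are arbitrary:

```shell
# Corrected test data from the comments, with a trailing | on every line
data='1|1|1|1|1|1|12333|
3|3|3|3|3|3|44454|
2|2|2|2|2|2|22222|
1|1|1|1|1|1|123330000|'

# Sort with the trailing | present, then strip it from the output
with=$(printf '%s\n' "$data" | sort -t '|' -k7,7 -k2,2 | sed 's/|$//')

# Strip the trailing | first, then sort
without=$(printf '%s\n' "$data" | sed 's/|$//' | sort -t '|' -k7,7 -k2,2)

if [ "$with" = "$without" ]; then
  echo "same order with and without the trailing |"
else
  echo "order differs: your sort may be broken"
fi
```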

Similarly, for numerical sorting with and without |, the larger number appears at the end:

    pax> echo ; echo '1|1|1|1|1|1|12333|
    3|3|3|3|3|3|44454|
    2|2|2|2|2|2|22222|
    1|1|1|1|1|1|123330000|' | sort -t \| -k7,7 -k2,2 -n

    1|1|1|1|1|1|12333|
    2|2|2|2|2|2|22222|
    3|3|3|3|3|3|44454|
    1|1|1|1|1|1|123330000|

    pax> echo ; echo '1|1|1|1|1|1|12333
    3|3|3|3|3|3|44454
    2|2|2|2|2|2|22222
    1|1|1|1|1|1|123330000' | sort -t \| -k7,7 -k2,2 -n

    1|1|1|1|1|1|12333
    2|2|2|2|2|2|22222
    3|3|3|3|3|3|44454
    1|1|1|1|1|1|123330000

If you're seeing something else, then either your sort is broken or it's configured strangely. In that case, you might want to check whether a shell function or alias is overriding the real sort (in bash, type sort reports these, whereas which sort typically only searches your PATH), or whether you have an unusual locale setting such as LC_ALL or LC_COLLATE, which changes the comparison function used for sorting.
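A quick way to run those checks is sketched below. It assumes a POSIX shell: command -V is the portable way to ask what a command name resolves to, and env runs the sort found on your PATH, so no alias or shell function can intercept it:

```shell
# Report what "sort" resolves to in this shell: an alias, a function or a file on PATH
command -V sort

# Show the locale variables that can change how sort compares characters
echo "LC_ALL=${LC_ALL-} LC_COLLATE=${LC_COLLATE-} LANG=${LANG-}"

# Run the sort found on PATH (via env, bypassing aliases and functions) in the C locale
c_sorted=$(printf '%s\n' '1|12|' '1|120|' | env LC_ALL=C sort -t '|' -k2,2)
printf '%s\n' "$c_sorted"
```

If that last command still puts 1|120| first, that suggests the sort binary itself is misbehaving rather than your shell or locale setup.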

With GNU sort, at least, you can also use --debug to annotate the output, indicating which line portions are used as keys.
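For example, with GNU sort (other implementations may not support --debug), a sketch like this shows the characters used for each key underlined beneath every output line; any notes about the locale or options go to standard error:

```shell
# GNU sort only: --debug underlines the characters used for each key
annotated=$(printf '%s\n' '1|1|1|1|1|1|12333|' '1|1|1|1|1|1|123300000|' |
  sort --debug -t '|' -k7,7 -k2,2)
printf '%s\n' "$annotated"
```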

Finally, your input may contain non-printing characters that affect the sort order. You can detect these by examining a hex dump of the file:

    od -xcb /test1/Junk/FILE2_1
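As a sketch of what to look for, the example below builds a hypothetical two-line sample file whose first line hides a carriage return, then finds it with od -c and with grep. The temporary file and its contents are made up for illustration:

```shell
# Hypothetical sample: the first line ends in a carriage return before the newline
f=$(mktemp)
printf '1|1|1|1|1|1|12333|\r\n2|2|2|2|2|2|22222|\n' > "$f"

# od -c shows each byte; the carriage return appears as \r
od -c "$f"

# Number the lines that contain any non-printable byte (C locale character classes)
LC_ALL=C grep -n '[^[:print:]]' "$f"
bad=$(LC_ALL=C grep -c '[^[:print:]]' "$f")
echo "$bad line(s) contain non-printing characters"

rm -f "$f"
```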

Comments

If there is no 8th column, I get the desired result. That is, if the last (empty) column is not there, the result is 1|1|1|1|1|1|12333 1|1|1|1|1|1|123300000 2|2|2|2|2|2|22222 3|3|3|3|3|3|44454. I want to know whether the empty last column is affecting the sort.
@Aravindh, if you're getting 123300000 after 12333 for anything other than numeric sorting (or some bizarre collation settings, which don't seem to apply here), your sort is broken.
Sorry, I made a mistake. The correct data is 1|1|1|1|1|1|12333| 3|3|3|3|3|3|44454| 2|2|2|2|2|2|22222| 1|1|1|1|1|1|123330000|, and if the 8th column is not there, the result is 1|1|1|1|1|1|12333 1|1|1|1|1|1|123330000 2|2|2|2|2|2|22222 3|3|3|3|3|3|44454.
@Aravindh, then that is the correct result, since you've specified lexicographical sorting. The presence or absence of a | at the end of each line makes no difference to the sort order on my system, with or without -n (-n itself changes the sort order, but the | does not). If it does on yours, your sort is broken and that should be investigated.
Thanks for the explanation. I tried some scenarios on the Unix box and got the results below. The second one is strange.
1. printf "1|12\n1|120\n" | sort -t \| -k2,2 gives 1|12 1|120 (expected 1|12 1|120).
2. printf "1|12|\n1|120|\n" | sort -t \| -k2,2 gives 1|120| 1|12| (expected 1|12| 1|120|).
3. printf "1|12||\n1|120||\n" | sort -t \| -k2,2 gives 1|12|| 1|120|| (expected 1|12|| 1|120||).
When I tried the same scenarios in a Cygwin64 terminal, I got the expected result in all three cases.
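To rerun the three scenarios from the last comment on any machine, a minimal sketch like the following may help. It assumes a POSIX shell; the check function, the failures counter and the messages are hypothetical, made up for illustration. Each scenario feeds two lines in the expected order and reports whether sort keyed on field 2 keeps them that way:

```shell
failures=0

# Feed two lines to sort and check that they come out in the same order
check() {
  expected=$(printf '%s\n' "$1" "$2")
  actual=$(printf '%s\n' "$1" "$2" | sort -t '|' -k2,2)
  if [ "$actual" = "$expected" ]; then
    echo "OK:   $1 sorts before $2"
  else
    echo "FAIL: got $(printf '%s\n' "$actual" | tr '\n' ' ')"
    failures=$((failures + 1))
  fi
}

check '1|12' '1|120'
check '1|12|' '1|120|'
check '1|12||' '1|120||'
echo "$failures scenario(s) differ from the expected order"
```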

The ordering is lexicographical, as you said. Your command is almost correct, but add n to the sort command, like this:

    echo '1|1|1|1|1|1|12333|
    3|3|3|3|3|3|44454|
    2|2|2|2|2|2|22222|
    1|1|1|1|1|1|123330000|' | sort -t \| -nk7,7 -nk2,2

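This sorts both keys numerically. Because -nk7,7 is just -n -k7,7 written together, the n here is a global option; -k7,7n -k2,2n attaches it to each key explicitly.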
