
I have an input file like this:

Objects (id: bounding-box centroid area mean-color): 
0: 800x800+0+0 406.6,390.9 378792 srgb(0,0,0) 
11: 240x151+140+624 259.5,699.0 36240 srgb(255,255,255) 
 6: 240x151+462+176 581.5,251.0 36240 srgb(255,255,255) 
 7: 240x151+87+257 206.5,332.0 36240 srgb(255,255,255) 
 8: 240x151+366+355 485.5,430.0 36240 srgb(255,255,255) 
 9: 240x151+77+448 196.5,523.0 36240 srgb(255,255,255) 
 10: 240x151+468+542 587.5,617.0 36240 srgb(255,255,255) 
 2: 178x59+223+65 311.5,94.0 10502 srgb(255,255,255) 
 3: 178x59+417+65 505.5,94.0 10502 srgb(255,255,255) 
 4: 178x59+611+65 699.5,94.0 10502 srgb(255,255,255) 
1: 178x59+29+65 117.5,94.0 10502 srgb(255,255,255) 
5: 110x16+255+63 309.5,182.5 1760 srgb(255,255,255)   

I'm interested in the second field; for example, in the second data row it is "240x151+140+624". Using "+" as the field separator, this second field splits into 3 subfields. I want an awk array (here, array "a") holding these second fields sorted first by the 3rd subfield and then by the 2nd subfield (with FS="+").

 I'm doing this with the code below and it works, but it needs one awk program, then a pipe to sort, then another pipe to a second awk program.

 awk 'NR>2{print $2}' file | 
 sort -t "+" -k3n -k2n |  
 awk '{a[NR]=$0} END{
  for (i=1;i<=length(a);i++) print a[i] }' 
 110x16+255+63 
 178x59+29+65 
 178x59+223+65 
 178x59+417+65 
 178x59+611+65 
 240x151+462+176  
 240x151+87+257 
 240x151+366+355  
 240x151+77+448  
 240x151+468+542 
 240x151+140+624 

 How can I get the sorted array "a" in a single awk program (without piping twice), so that I can do further processing in the END{} block?

 Thanks in advance

  • Arrays in awk are never sorted, they're hash tables. You can print out their contents in a specific order using various techniques and language constructs but the array itself isn't sorted. Why is the 800x800+0+0 line missing from your output? Commented Oct 28, 2018 at 10:32
  • Hi EdMorton, The input file shows coordinates and sizes of some regions within an image. I skip the first coordinate because it represents the entire image itself and I don't need that coordinate and I don't need the header either. Commented Oct 28, 2018 at 16:11

1 Answer

Here is one for GNU awk, using predefined array scanning orders (see "Using Predefined Array Scanning Orders with gawk" in the gawk manual):

$ awk '
{
    split($2,t,"+")                       # split $2 to tmp on +
    a[t[3]][t[2]][NR]=$2                  # most controlling key is the first...
}                                         # etc, NR to make it unique
END {
    PROCINFO["sorted_in"]="@ind_num_asc"  # scanning order, see the link
    for(i in a)
        for(j in a[i])
            for(k in a[i][j])
                print a[i][j][k]
}' file

Output:

(id:
800x800+0+0
110x16+255+63
178x59+29+65
178x59+223+65
178x59+417+65
178x59+611+65
240x151+462+176
240x151+87+257
240x151+366+355
240x151+77+448
240x151+468+542
240x151+140+624

Edit: That might be overkill. This may work as well, but it's too early in the morning to make that decision or to run tests. If you do test it, let us know; we're a community after all:

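$ awk '
{
    split($2,t,"+")                       # for example: 240x151+140+624
    # zero-pad each part so that a string sort of the key matches numeric order
    # (assumes every value has at most 10 digits)
    a[sprintf("%010d+%010d+%010d",t[3],t[2],NR)]=$2   # key: a["0000000624+0000000140+0000000003"]
}
END {
    PROCINFO["sorted_in"]="@ind_str_asc"  # the keys are strings, so compare them as strings
    for(i in a)
        print a[i]
}' file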

The output looked the same.

3 Comments

Thanks for the help, but I need a sorted array in order to print in the END{} block in a specific order. For example, with the array sorted, I need to print elements 2 to 5, then element 1, then elements 6 to N-1, and finally the last element. Is there a way to know the position of each element within your array, so I can print in the order I need?
a[t[3]][t[2]][NR]=$2 stores only $2 as the array value. Store whatever you need there, in whatever order you need.
@GerCas again, arrays in awk are hash tables, they are not stored in any user-meaningful order. You can, however, access the values in whatever order of indices or values you like for printing, etc. using "sorted_in" as James shows using GNU awk.
