Sorting a file using fields with specific value

Question

Recently, I had to sort several files by record ID. The catch was that there are several record types, and the field I had to sort on is in a different position in each of them. The fields, however, are easy to identify thanks to the key=value structure. Here is a simple sample of the general structure:

fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3

I came up with a pipeline as follows, which did the job:

awk -F'[|=]' '{for(i=1; i<=NF; i++) {if($i ~ "id") {i++; print $i"?"$0} }}' tester.txt | sort -n | awk -F'?' '{print $2}'

In other words the algorithm is as follows:

Split the record by both field and key-value separators (| and =)
Iterate through the elements and search for the id key
Print the next element (value of id key), a separator, and the whole line
Sort numerically
Remove prepended identifier to preserve records' structure
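
The steps above can also be sketched with an exact key comparison. Note that `$i ~ "id"` is a regex match, so it would also fire on any key or value that merely contains "id" (for example a value such as `video`). The variant below is a minimal sketch, assuming keys and values never contain `|` or `=`; the function name `sort_by_id` is made up for illustration, and a tab is used as the temporary separator.

```shell
# Decorate-sort-undecorate: prepend the id, sort numerically, strip the id again.
# sort_by_id is a hypothetical helper name; it reads the files given as
# arguments, or standard input when called without arguments.
sort_by_id() {
    awk -F'[|=]' '{
        # After splitting on | and =, keys sit at odd positions (1, 3, 5, ...)
        # and each value directly follows its key.
        for (i = 1; i < NF; i += 2)
            if ($i == "id") { print $(i + 1) "\t" $0; break }
    }' "$@" |
    sort -n |      # numeric sort on the prepended id
    cut -f2-       # drop the id and the tab, keeping the original record
}

# Usage with the sample records from the question:
printf '%s\n' \
    'fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC' \
    'fieldD=valueD|recordType=B|id=1|fieldE=valueE' \
    'fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3' |
sort_by_id
```

Records without an id key are silently dropped by this sketch, as they are in the original pipeline.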

Processing the sample gives the output:

fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3

Is there a way, though, to do this task using a single awk command?

Could you please post a sample of the expected output in your question to make it clearer? Thank you.
– RavinderSingh13, Commented May 18, 2022 at 13:59
Thanks for the suggestion. I've added the result of processing the sample.
– Radioactive Pickle, Commented May 18, 2022 at 14:07
Thanks for the edit. Could you please explain the logic behind the expected output in more detail? Thank you.
– RavinderSingh13, Commented May 18, 2022 at 14:19
I had to sort the records by the value of the id field (which doesn't have a fixed position), so I extracted that value by searching for the key, prepended it to the record, sorted the output, and removed the prepended identifier to get clean records. I've added my algorithm to the question; please check if it helps.
– Radioactive Pickle, Commented May 18, 2022 at 14:34

anubhava · Accepted Answer · 2022-05-18 16:32:40Z

1

You may try this gnu-awk code to do this in a single command:

awk -F'|' '{
   for(i=1; i<=NF; ++i)
      if ($i ~ /^id=/) {
         a[gensub(/^id=/, "", 1, $i)] = $0
         break
      }
}
END {
   PROCINFO["sorted_in"] = "@ind_num_asc"
   for (i in a)
      print a[i]
}' file

fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3

We use | as the field delimiter. When a field starts with id=, we store the full record in array a, indexed by the text after the =.

Setting PROCINFO["sorted_in"] = "@ind_num_asc" makes the for loop traverse array a in ascending numeric order of its indices, so printing each value gives the sorted output.
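
Two things are worth keeping in mind with this approach: PROCINFO["sorted_in"] is specific to GNU awk, and because the array is indexed by id, a later record with the same id replaces an earlier one. Where either is a concern, the same idea can be sketched in POSIX awk with two parallel arrays and an insertion sort in the END block. This is only an illustration under those assumptions, not part of the answer above; the function name `sort_by_id_awk` and the array names `key` and `rec` are made up.

```shell
# Single awk command, no gawk extensions. sort_by_id_awk is a hypothetical
# helper name; it reads the files given as arguments, or standard input.
sort_by_id_awk() {
    awk -F'|' '
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^id=/) {
                n++
                key[n] = substr($i, 4) + 0   # text after "id=", forced to a number
                rec[n] = $0                  # the full record
                break
            }
    }
    END {
        # Insertion sort on key[], moving rec[] along with it.
        # It is stable, so records sharing an id keep their input order.
        for (i = 2; i <= n; i++) {
            k = key[i]; r = rec[i]
            for (j = i - 1; j >= 1 && key[j] > k; j--) {
                key[j + 1] = key[j]; rec[j + 1] = rec[j]
            }
            key[j + 1] = k; rec[j + 1] = r
        }
        for (i = 1; i <= n; i++) print rec[i]
    }' "$@"
}

# Usage: sort_by_id_awk file
```

The insertion sort is quadratic in the number of records, so for large files the pipeline through sort is likely the better choice.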

edited May 18, 2022 at 16:32

answered May 18, 2022 at 14:37

anubhava


1 Comment

Radioactive Pickle Over a year ago

Does the PROCINFO["sorted_in"] parameter affect all the arrays within current awk command?
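
In GNU awk, PROCINFO["sorted_in"] is a single global setting: once it is set, every subsequent `for (index in array)` loop uses that order, whichever array it traverses, until the value is changed. A common way to limit the effect is to save the old value and restore it afterwards. The sketch below illustrates this; it requires gawk, and the function name `demo_sorted_in` and the arrays `a` and `b` are made up for the example.

```shell
# Requires GNU awk. demo_sorted_in is a hypothetical helper name.
demo_sorted_in() {
    gawk 'BEGIN {
        a[10]; a[9]; a[2]        # referencing an element creates it
        b[30]; b[100]; b[4]

        PROCINFO["sorted_in"] = "@ind_num_asc"
        s = ""; for (i in a) s = s " " i; print "a asc:" s
        s = ""; for (i in b) s = s " " i; print "b asc:" s   # b is affected too

        # Save the current setting, change it for one loop, then restore it.
        save = PROCINFO["sorted_in"]
        PROCINFO["sorted_in"] = "@ind_num_desc"
        s = ""; for (i in a) s = s " " i; print "a desc:" s
        PROCINFO["sorted_in"] = save
    }'
}
```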

Ed Morton · Accepted Answer · 2022-05-18 15:57:37Z

1

Using GNU awk for the 3rd arg to match() and sorted_in:

$ cat tst.awk
match($0,/(^|\|)id=([0-9]+)/,a) {
    ids2vals[a[2]] = $0
}
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for ( id in ids2vals ) {
        print ids2vals[id]
    }
}

$ awk -f tst.awk file
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3

edited May 18, 2022 at 15:57

answered May 18, 2022 at 15:39

Ed Morton


pii_ke · Accepted Answer · 2022-05-18 16:43:22Z

1

Try Perl: perl -e 'print map { s/^.*? //; $_ } sort { $a <=> $b } map { ($id) = /id=(\d+)/; "$id $_" } <>' file

Some explanation of the code I use:

print #print the resulting list of lines
    map {
        s/^.*? //;
        $_
    } #remove numeric id from start of line
    sort { $a <=> $b } #sort numerically
    map {
        ($id) = /id=(\d+)/;
        "$id $_"
    } # capture id and place it in start of line
    <> # read all lines from file

Or try sed and sort: sed 's/^\(.*id=\([0-9][0-9]*\).*\)$/\2 \1/' file | sort -n | sed 's/^[^ ][^ ]* //'

edited May 18, 2022 at 16:43

answered May 18, 2022 at 16:34

pii_ke


RavinderSingh13 · Accepted Answer · 2022-05-18 15:01:26Z

With your shown samples only, please try the following awk + sort + cut solution. It was written and tested in GNU awk and should work in any awk.

awk '
match($0,/id=[0-9]+/){
  print substr($0,RSTART,RLENGTH)";"$0
}
' Input_file | sort -t'=' -k2n | cut -d';' -f2-

Explanation: Adding detailed explanation for above code.

awk '                                   ##Start the awk program.
match($0,/id=[0-9]+/){                  ##Use match() to find id= followed by digits.
  print substr($0,RSTART,RLENGTH)";"$0  ##Print the matched substring, a semicolon, and the current line.
}
' Input_file    |                       ##Read Input_file and pipe the awk output to the next command.
sort -t'=' -k2n |                       ##Sort numerically on the 2nd =-delimited field, then pipe the output onward.
cut -d';' -f2-                          ##Split on ; and print everything from the 2nd field onwards.
