Recently, I had to sort several files by record ID. The catch is that there are several record types, and the field I need to sort on is in a different position in each of them. The fields, however, are easy to identify thanks to the key=value structure. A simple sample of the general structure:
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
I came up with a pipeline as follows, which did the job:
awk -F'[|=]' '{for(i=1; i<=NF; i++) {if($i == "id") {i++; print $i"?"$0} }}' tester.txt | sort -n | awk -F'?' '{print $2}'
In other words the algorithm is as follows:
- Split the record by both field and key-value separators (| and =)
- Iterate through the elements and search for the id key
- Print the next element (value of the id key), a separator, and the whole line
- Sort numerically
- Remove the prepended identifier to preserve the records' structure
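The steps above can be sketched as a small shell function. This is only a minimal sketch under a few assumptions: the name sort_by_id is hypothetical, it reads records from standard input, keys and values are assumed never to contain | or = (so keys sit at odd positions after splitting), and cut replaces the second awk so that a ? inside a record would not truncate it.

```shell
# Decorate-sort-undecorate: prefix each record with its id, sort on that
# prefix, then strip it again. sort_by_id is a hypothetical helper name.
sort_by_id() {
    awk -F'[|=]' '{
        # Keys are at odd positions (assumption: no | or = inside values).
        # Compare with == so that only the exact key "id" matches.
        for (i = 1; i < NF; i += 2)
            if ($i == "id") { print $(i + 1) "?" $0; break }
    }' |
    sort -t'?' -k1,1n |
    cut -d'?' -f2-
}

# Usage with the sample records from above:
printf '%s\n' \
    'fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC' \
    'fieldD=valueD|recordType=B|id=1|fieldE=valueE' \
    'fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3' |
    sort_by_id
```

Restricting the sort key to the first ?-separated field keeps the rest of the record from influencing the numeric comparison.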
Processing the sample gives the output:
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
Is there a way, though, to do this task using a single awk command?
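To make the goal concrete, here is one possible shape of such a command, offered as a sketch rather than a definitive answer. It buffers all records in memory and sorts them with a plain insertion sort in the END block, so it should run in any POSIX awk but is only reasonable for small files. The function name sort_by_id_awk and the array names are hypothetical, and records without an id key are silently dropped. GNU awk users could presumably replace the hand-written sort with its built-in array sorting facilities.

```shell
# Single awk invocation: collect (id, record) pairs, sort them in END.
# sort_by_id_awk is a hypothetical helper name; it reads from stdin.
sort_by_id_awk() {
    awk -F'[|=]' '
    {
        # Keys are at odd positions (assumption: no | or = inside values).
        for (i = 1; i < NF; i += 2)
            if ($i == "id") {
                n++
                key[n] = $(i + 1) + 0   # force numeric comparison later
                rec[n] = $0
                break
            }
    }
    END {
        # Stable insertion sort on the numeric keys; records move alongside.
        for (i = 2; i <= n; i++) {
            k = key[i]; r = rec[i]
            for (j = i - 1; j >= 1 && key[j] > k; j--) {
                key[j + 1] = key[j]
                rec[j + 1] = rec[j]
            }
            key[j + 1] = k
            rec[j + 1] = r
        }
        for (i = 1; i <= n; i++) print rec[i]
    }'
}

# Usage with the sample records from above:
printf '%s\n' \
    'fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC' \
    'fieldD=valueD|recordType=B|id=1|fieldE=valueE' \
    'fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3' |
    sort_by_id_awk
```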
id field (which doesn't have a fixed position), so I extracted that value by searching for its key, prepended it to the record, sorted the output, and removed the prepended identifier to get clean records. I've added my algorithm to the question; please check if it helps.