Following up on my previous post, which didn't fully answer my question: I would like to know how I can sort my array a, which contains multiple lines per tag code, according to the order of the tag codes in array b.
I have an array a that contains the following lines:
rs6605071 chr1:962943 C ENSG00000188976 ENST00000487214 stuff
rs6605071 chr1:962943 C ENSG00000187961 ENST00000622660 stuff
rs6605071 chr1:962943 C 84069 NM_001160184.1 stuff
rs6605071 chr1:962943 C 339451 NC_006462594.2 stuff
rs6605071 chr1:962943 C ENSG00000135234 ENST00000624144 stuff
rs6605071 chr1:962943 C 339451 XR_001737138.1 stuff
rs6605071 chr1:962943 C 334324 NC_006462632.2 stuff
rs6605071 chr1:962943 C 84333 NM_004353462.1 stuff
rs6605071 chr1:962943 C 339451 XM_006710600.3 stuff
I also have another, ordered array b that contains the following lines:
NC
NG
NM
NP
NR
XM
XP
XR
WP
I would like to order the lines in array a to match the order of array b, based on the prefix in column 5, to obtain the desired output:
rs6605071 chr1:962943 C 334324 NC_006462632.2 stuff
rs6605071 chr1:962943 C 339451 NC_006462594.2 stuff
rs6605071 chr1:962943 C 84069 NM_001160184.1 stuff
rs6605071 chr1:962943 C 84333 NM_004353462.1 stuff
rs6605071 chr1:962943 C 339451 XM_006710600.3 stuff
rs6605071 chr1:962943 C 339451 XR_001737138.1 stuff
rs6605071 chr1:962943 C ENSG00000188976 ENST00000487214 stuff
rs6605071 chr1:962943 C ENSG00000187961 ENST00000622660 stuff
rs6605071 chr1:962943 C ENSG00000135234 ENST00000624144 stuff
The following command was proposed in my previous post:
awk -v OFS='\t' '
    FNR==NR{
        split($5,a,"_")
        array[a[1]]=$0
        next
    }
    ($1 in array) {
        print array[$0]
        b[$1]
    }
    END{
        for(i in b){
            delete array[i]
        }
        for(j in array){
            print array[j]
        }
    }' <(printf '%s\n' "${a[@]}") <(printf '%s\n' "${b[@]}")
but it prints:
rs6605071 chr1:962943 C 334324 NC_006462632.2 stuff
rs6605071 chr1:962943 C 84069 NM_001160184.1 stuff
rs6605071 chr1:962943 C 339451 XM_006710600.3 stuff
rs6605071 chr1:962943 C 339451 XR_001737138.1 stuff
rs6605071 chr1:962943 C ENSG00000188976 ENST00000487214 stuff
rs6605071 chr1:962943 C ENSG00000187961 ENST00000622660 stuff
rs6605071 chr1:962943 C ENSG00000135234 ENST00000624144 stuff
As you can see, some of the lines containing NM and NC are missing. Could you please tell me how I can update this command to output the desired result?
Thanks in advance.
b is sorted according to a fixed pattern and not alphabetically. It just happened to be this way, but it could be the other way around; it's called associative arrays in bash. Check this link. So, the order of b must not change, and the lines of array a have to match the order of array b. Thanks!
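
For what it's worth, the missing lines look consistent with the assignment array[a[1]]=$0 in the command above: every new line with the same prefix overwrites the previous one, so at most one line per prefix can survive. Below is a minimal sketch of one way around that, not a definitive fix. It assumes that the prefix is whatever comes before the first "_" in column 5, that b is non-empty and has no duplicate entries, and that lines with a prefix not listed in b should go last. The function name sort_by_prefix and its two file arguments are hypothetical, they do not come from the previous post.

```shell
# Hypothetical helper: sort_by_prefix ORDER_FILE DATA_FILE
#   ORDER_FILE holds one prefix per line (the content of array b),
#   DATA_FILE  holds the lines to reorder (the content of array a).
sort_by_prefix() {
  awk '
    # First file: remember the prefixes in the order they are listed.
    FNR == NR { order[++n] = $1; known[$1]; next }

    # Second file: append each line to the bucket of its column-5 prefix
    # instead of overwriting the previous line with the same prefix.
    {
      split($5, parts, "_")
      key = parts[1]
      if (key in known) {
        bucket[key] = bucket[key] $0 ORS
      } else {
        rest = rest $0 ORS    # prefixes not listed in ORDER_FILE go last
      }
    }

    # Emit the buckets in the order of ORDER_FILE, then the leftovers.
    END {
      for (i = 1; i <= n; i++) printf "%s", bucket[order[i]]
      printf "%s", rest
    }
  ' "$1" "$2"
}
```

With the bash arrays from this post it could be called as sort_by_prefix <(printf '%s\n' "${b[@]}") <(printf '%s\n' "${a[@]}"). Note that lines sharing a prefix come out in their input order, so the two NC lines would be printed as 339451 first and 334324 second. The desired output lists them the other way round, and since the rule behind that is not stated here, a tie-break on another column would have to be added once it is known.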