How do I sort multiple strings in bash, using one string as the sortable parameter?

Question

I have a file that goes:

01:12:12:01
WRGHR
zxerty
00:01:02:03
GRAARGH
qwerty

...

00:59:59:01
URRGH
xqwrts

I want to sort this file by the timestamp lines, keeping the lines that follow each timestamp attached to it.

My current best is:

cat FILE.txt | tr '\n' '\t' | sed -E 's/\t(0[01]:)/\n\1/g' | sort -n | tr '\t' '\n'

This outputs:

00:01:02:03
GRAARGH
qwerty
00:59:59:01
URRGH
xqwrts
...
01:12:12:01
WRGHR
zxerty

which works, but is there an easier way than this?

I'll explore this, thanks! Damn, I was trying to figure out what sort of a regex is :-)
– Dan, Commented Apr 2, 2021 at 17:49

anubhava · Accepted Answer · 2021-04-02 18:21:18Z

Using GNU awk (gawk), you can do this in a single command:

awk -v RS='([0-9]{2}:){3}[0-9]{2}\n' '
prt != "" {
   map[prt] = prt $0
}
{prt = RT}
END {
   PROCINFO["sorted_in"]="@ind_str_asc"
   for (i in map)
      printf "%s", map[i]
}' file

00:01:02:03
GRAARGH
qwerty
00:59:59:01
URRGH
xqwrts
01:12:12:01
WRGHR
zxerty

PROCINFO["sorted_in"]="@ind_str_asc" makes the for (i in map) loop visit the array indices in ascending order, compared as strings. Note that we use the time-stamp as the index of the associative array map.
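To see the traversal order in isolation, here is a minimal sketch. It assumes GNU awk is installed as gawk (PROCINFO["sorted_in"] is a gawk extension); the function name sorted_keys and the sample values are made up for illustration.

```shell
# Assumes GNU awk is available as `gawk`; other awks ignore PROCINFO["sorted_in"].
# sorted_keys is a hypothetical helper name used only for this demo.
sorted_keys() {
  gawk 'BEGIN {
    # Sample entries keyed by time-stamp, deliberately inserted out of order.
    map["01:12:12:01"] = "WRGHR"
    map["00:01:02:03"] = "GRAARGH"
    map["00:59:59:01"] = "URRGH"
    # Ask gawk to walk the indices as strings, in ascending order.
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (k in map)
      print k, map[k]
  }'
}

sorted_keys
```

Without the PROCINFO line, the order of a for-in loop over an awk array is unspecified, so the output could come back in any order.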

The OP's approach can be refactored as follows to avoid depending on the tab character, which may be present in the input:

tr '\n' '\1' < file |
sed -E 's/\x1([0-9]{2}:)/\n\1/g; s/\x1$//g' |
sort -t $'\1' -n |
tr '\1' '\n'
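As a quick check that the pipeline above tolerates tabs in the data, here is a small sketch that wraps it in a function and feeds it two records, one of which contains a tab. It assumes GNU sed (for \x1 in the pattern and \n in the replacement); the function name sort_records and the sample input are made up, and the separator is built with printf instead of $'\1' so the sketch also runs in shells without that quoting form.

```shell
# Assumes GNU sed and sort. sort_records is a hypothetical wrapper name.
sort_records() {
  # Byte 0x01 is used as the in-record separator; same value as $'\1' in bash.
  sep=$(printf '\001')
  tr '\n' '\1' |
  sed -E 's/\x1([0-9]{2}:)/\n\1/g; s/\x1$//g' |
  sort -t "$sep" -n |
  tr '\1' '\n'
}

# Two records; the first one has a literal tab inside "WR<TAB>GHR".
printf '01:12:12:01\nWR\tGHR\nzxerty\n00:01:02:03\nGRAARGH\nqwerty\n' | sort_records
```

The record starting with 00:01:02:03 comes out first, and the tab inside the other record is left untouched because it no longer doubles as the separator.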

edited Apr 2, 2021 at 18:21

answered Apr 2, 2021 at 16:57

anubhava


3 Comments

Dan Over a year ago

This is very nice! However, I can cobble my way in a few minutes, going from memory/logic, but I couldn't do yours unless I found it here and copy-pasted it. Is there no way to make it more intuitive? If my file hadn't had multiple newlines, I could cat | sort it in no time…

anubhava Over a year ago

1. This single command is much more efficient. You can measure the timing on a few MBs of input. 2. It won't break under certain conditions, such as the presence of tab characters in the input file. 3. And finally, for this command it doesn't really matter how many newlines you have in the input. It just finds time-stamp strings and treats them as the record separator.
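To see the record-separator behaviour described above, here is a minimal sketch that prints what gawk puts in $0 and RT for each record. It assumes GNU awk installed as gawk (a regex RS and the RT variable are gawk extensions); the function name show_records and the sample input are made up, and newlines are shown as | to keep each record on one line.

```shell
# Assumes GNU awk is available as `gawk`. show_records is a hypothetical name.
show_records() {
  gawk -v RS='([0-9]{2}:){3}[0-9]{2}\n' '{
    # $0 is the text before the separator; RT is the text that matched RS.
    body = $0
    stamp = RT
    # Make embedded newlines visible so each record prints on one line.
    gsub(/\n/, "|", body)
    gsub(/\n/, "|", stamp)
    printf "NR=%d body=[%s] RT=[%s]\n", NR, body, stamp
  }'
}

printf '00:59:59:01\nURRGH\nxqwrts\n00:01:02:03\nGRAARGH\n' | show_records
```

Each body is reported together with the time-stamp that follows it, not the one that precedes it, which is why the command saves RT in prt and only pairs it with $0 on the next record.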

anubhava Over a year ago

btw your approach can be refactored to this to avoid reliance on tab, which might be present in the input: tr '\n' '\1' < file | sed -E 's/\x1([0-9]{2}:)/\n\1/g; s/\x1$//g' | sort -t $'\1' | tr '\1' '\n'
