I have a file that goes:

01:12:12:01
WRGHR
zxerty
00:01:02:03
GRAARGH
qwerty

...

00:59:59:01
URRGH
xqwrts

I want to sort this file by the timestamp line of each record, keeping the lines that follow each timestamp attached to it.

My current best is:

cat FILE.txt | tr '\n' '\t' | sed -E 's/\t(0[01]:)/\n\1/g' | sort -n | tr '\t' '\n'

This outputs:

00:01:02:03
GRAARGH
qwerty
00:59:59:01
URRGH
xqwrts
...
01:12:12:01
WRGHR
zxerty

which works, but is there an easier way to do this?

  • Amended question to include current, and desired, output. Commented Apr 2, 2021 at 16:47
  • The cat can go :-), tr ... < file.txt | ... Commented Apr 2, 2021 at 17:27
  • I'll explore this, thanks! Damn, I was trying to figure out what sort of a regex is :-) Commented Apr 2, 2021 at 17:49
  • 'Twas a smiley, not a regex. Commented Apr 2, 2021 at 18:06
  • You don't say? :-) Commented Apr 2, 2021 at 20:19

1 Answer

Using GNU awk (gawk), you can do this in a single command:

awk -v RS='([0-9]{2}:){3}[0-9]{2}\n' '
prt != "" {
   map[prt] = prt $0
}
{prt = RT}
END {
   PROCINFO["sorted_in"]="@ind_str_asc"
   for (i in map)
      printf "%s", map[i]
}' file

00:01:02:03
GRAARGH
qwerty
00:59:59:01
URRGH
xqwrts
01:12:12:01
WRGHR
zxerty

PROCINFO["sorted_in"]="@ind_str_asc" makes gawk traverse the array in a for (i in map) loop in ascending order of its indices, compared as strings. Note that the timestamp is used as the index of the associative array map, so the records are printed in timestamp order.


The OP's approach can be refactored as follows to avoid depending on the tab character, which may be present in the input:

tr '\n' '\1' < file |
sed -E 's/\x1([0-9]{2}:)/\n\1/g; s/\x1$//g' |
sort -t $'\1' -n |
tr '\1' '\n'
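
To see why the control character matters, here is a small sketch that runs the same pipeline on made-up input containing a literal tab. The function name `sort_records` and the sample data are assumptions for illustration. The `sep` variable stands in for bash's `$'\1'` so the snippet does not depend on that quoting form, and the sed escapes (`\x1`, `\n` in the replacement) assume GNU sed.

```shell
# Hypothetical helper: the pipeline above wrapped in a function that
# reads from stdin. sep holds the \001 byte used as the joiner.
sort_records() {
  sep=$(printf '\001')
  tr '\n' '\1' |
  sed -E 's/\x1([0-9]{2}:)/\n\1/g; s/\x1$//g' |
  sort -t "$sep" -n |
  tr '\1' '\n'
}

# Made-up input: the first record has a tab between "a" and "b".
printf '00:00:00:02\na\tb\n00:00:00:01\nc\n' | sort_records
```

The record starting with 00:00:00:01 should come out first, and the line containing the tab should stay on one line, because only the \001 bytes are turned back into newlines at the end.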

3 Comments

This is very nice! However, I can cobble my way in a few minutes, going from memory/logic, but I couldn't do yours unless I found it here and copy-pasted it. Is there no way to make it more intuitive? If my file hadn't had multiple newlines, I could cat | sort it in no time…
1. This single command is much more efficient. You can compute timing on a few MBs of input. 2. It won't break on certain conditions like presence of tab characters in input file. 3. And finally for this command, it doesn't really matter how many newlines you have in input. It just finds time-stamp strings and treats them as record separator.
Btw, your approach can be refactored to this to avoid relying on the tab character, which might be present in the input: tr '\n' '\1' < file | sed -E 's/\x1([0-9]{2}:)/\n\1/g; s/\x1$//g' | sort -t $'\1' | tr '\1' '\n'
