The input is a log file. The process I'm currently interested in logs one line when it starts and another when it ends. The start line always contains a fixed pattern along with an object ID; the end line contains a different fixed pattern along with the same object ID.
I want the output to contain a single line per object ID: the ID, followed by the timestamp of the start line, followed by the timestamp of the end line. This output will be used for further analysis in other tools. The output should be sorted on the timestamp of the start line; objects without a start line (see Obstacles below) should be placed at the end.
I'd like to solve this using standard Unix shell tools; my guess is that something with awk should do the trick. If the solution involves a shell script, please use sh as the shell.
Obstacles: I cannot guarantee that the process is strictly sequential, so the start of object1 can be followed by the start of object2 before object1 has been processed fully. Also, I cannot guarantee that every start line in the log file has a matching end line, or vice versa. In such cases, the missing timestamp should be left empty.
In essence, the input looks like this:
2014-03-11 09:00:01.123 bla bla bla TAG_START ID:1234 bla bla bla
2014-03-11 09:00:11.123 bla bla bla TAG_END ID:1234 bla bla bla
2014-03-11 09:01:01.123 bla bla bla TAG_START ID:2353 bla bla bla
2014-03-11 09:02:01.123 bla bla bla TAG_END ID:2353 bla bla bla
2014-03-11 09:03:01.123 bla bla bla TAG_START ID:3456 bla bla bla
2014-03-11 09:04:01.123 bla bla bla TAG_END ID:4567 bla bla bla
Output:
1234;09:00:01.123;09:00:11.123
2353;09:01:01.123;09:02:01.123
3456;09:03:01.123;
4567;;09:04:01.123
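A minimal sketch in POSIX sh and awk, assuming the whitespace-separated layout of the sample above: the date in field 1, the time in field 2, and the literal tokens TAG_START / TAG_END and ID:<id> as separate fields (the real patterns will likely need adjusting). Arrays keyed by object ID take care of interleaved objects and of missing start or end lines. A temporary sort key (the full date and time of the start line, or "1" when there is no start line) is prepended to each record, sorted on, and then cut off again. The function name pair_times is hypothetical.

```shell
# Sketch: pair TAG_START / TAG_END lines by object ID and print
#   ID;start-time;end-time
# sorted on the start timestamp, with IDs lacking a start line last.
# Assumes (as in the sample) field 1 = date, field 2 = time, and that the
# tag and "ID:nnnn" appear as separate whitespace-separated fields.
pair_times() {
    awk '
    {
        tag = ""; id = ""
        for (i = 3; i <= NF; i++) {
            if ($i == "TAG_START" || $i == "TAG_END") tag = $i
            else if ($i ~ /^ID:/) id = substr($i, 4)
        }
        if (tag == "" || id == "") next      # not a start or end line
        seen[id] = 1
        if (tag == "TAG_START") {
            start[id] = $2                   # time shown in the output
            skey[id] = $1 " " $2             # date and time used for sorting
        } else {
            stop[id] = $2
        }
    }
    END {
        for (id in seen) {
            # IDs with a start line get key "0 <date time>"; the others
            # get "1", which sorts after every "0 ..." key.
            key = (id in skey) ? "0 " skey[id] : "1"
            print key ";" id ";" start[id] ";" stop[id]
        }
    }' "$@" | LC_ALL=C sort -t ';' -k1,1 | cut -d ';' -f 2-
}
```

Sourcing the file and running something like pair_times app.log > pairs.txt (file names hypothetical) should print the four lines of the desired output for the sample input. Sorting on the full date and time rather than on the time alone keeps the order correct for logs that span midnight; LC_ALL=C makes sort compare plain bytes. If an ID has several start (or end) lines, the last one wins in this sketch.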
Thanks in advance!