4

I have a text file that contains hundreds of thousands of records. One of the fields is a date field. Is there any way to sort the file based on the date field?

09-APR-12 04.08.43.632279000 AM
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
19-MAR-12 03.54.32.595348000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
27-MAR-12 07.28.02.828746000 PM

The output should be:

19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 10.28.14.797580000 AM
27-MAR-12 07.28.02.828746000 PM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM

I have tried the sort command, treating the date field as a plain string, but the output is not in chronological order.

5 Answers

6

Chronicle's solution is close, but misses the AM/PM distinction, sorting 27-MAR-12 07.28.02.828746000 PM before 27-MAR-12 10.28.14.797580000 AM. This can be modified:

sort -t- -k 3.1,3.2 -k 2M -k 1n -k 3.23,3.24

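But that is still very fragile: the hour is still compared as text, so, for example, 12.28 AM sorts after 01.00 AM on the same day. It would be much better to convert the dates to an epoch time and compare numerically.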

4

Try this:

Input.txt

09-APR-12 04.08.43.632279000 AM 
19-MAR-12 03.53.38.189606000 PM 
19-MAR-12 03.56.27.933365000 PM 
19-MAR-12 04.00.13.387316000 PM 
19-MAR-12 04.04.45.168361000 PM 
19-MAR-12 03.54.32.595348000 PM 
27-MAR-12 10.28.14.797580000 AM 
28-MAR-12 12.28.02.652969000 AM 
27-MAR-12 07.28.02.828746000 PM 

Code

 sort -t "-"  -k 3 -k 2M -nk 1 Input.txt

Output

19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 07.28.02.828746000 PM
27-MAR-12 10.28.14.797580000 AM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM

1 Comment

This outputs 27-MAR-12 07 ... PM before 27-MAR-12 10 ... AM.

1

The Decorate-Sort-Undecorate idiom applied using any awk, any sort, and any cut:

$ awk -F',' -v OFS='\t' '{
    split($NF,t,/[- ]/)
    mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",t[2])+2)/3
    printf "%02d%02d%02d%s%s\t%s\n", t[3], mthNr, t[1], t[5], t[4], $0
}' file | sort -k1,1 | cut -f2-
19-MAR-12 03.53.38.189606000 PM
19-MAR-12 03.54.32.595348000 PM
19-MAR-12 03.56.27.933365000 PM
19-MAR-12 04.00.13.387316000 PM
19-MAR-12 04.04.45.168361000 PM
27-MAR-12 10.28.14.797580000 AM
27-MAR-12 07.28.02.828746000 PM
28-MAR-12 12.28.02.652969000 AM
09-APR-12 04.08.43.632279000 AM

If you're not sure how that works, look at the output of the awk step on its own. It prepends a sortable key to each line (decorate) for sort to operate on, and cut then strips that key off again (undecorate):

$ awk -F',' -v OFS='\t' '{
    split($NF,t,/[- ]/)
    mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",t[2])+2)/3
    printf "%02d%02d%02d%s%s\t%s\n", t[3], mthNr, t[1], t[5], t[4], $0
}' file
120409AM04.08.43.632279000      09-APR-12 04.08.43.632279000 AM
120319PM03.53.38.189606000      19-MAR-12 03.53.38.189606000 PM
120319PM03.56.27.933365000      19-MAR-12 03.56.27.933365000 PM
120319PM04.00.13.387316000      19-MAR-12 04.00.13.387316000 PM
120319PM04.04.45.168361000      19-MAR-12 04.04.45.168361000 PM
120319PM03.54.32.595348000      19-MAR-12 03.54.32.595348000 PM
120327AM10.28.14.797580000      27-MAR-12 10.28.14.797580000 AM
120328AM12.28.02.652969000      28-MAR-12 12.28.02.652969000 AM
120327PM07.28.02.828746000      27-MAR-12 07.28.02.828746000 PM

and notice that it will sort alphabetically in the desired order.
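
One caveat with a key that puts AM/PM ahead of a 12-hour time: 12.xx AM (just after midnight) sorts after 01.xx AM on the same day, and likewise for 12.xx PM. Below is a minimal sketch of a variant that converts the hour to 24-hour form before sorting. It assumes each line consists only of the timestamp and contains no tabs, so it splits $0 rather than the last comma-separated field. The function name sort_by_timestamp is hypothetical.

```shell
# Sketch: decorate each line with a 24-hour sortable key, sort, then undecorate.
# Assumes every line is exactly "DD-MON-YY HH.MM.SS.FFFFFFFFF AM|PM".
sort_by_timestamp() {
    awk '{
        split($0, t, /[- ]/)   # t[1]=DD t[2]=MON t[3]=YY t[4]=time t[5]=AM or PM
        mthNr = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", t[2]) + 2) / 3
        hour = substr(t[4], 1, 2) % 12      # hour 12 becomes 0, 01..11 stay as they are
        if (t[5] == "PM") hour += 12        # so 12 AM -> 00 and 12 PM -> 12
        # Key layout: YYMMDDHH followed by ".MM.SS.fraction", then a tab and the line.
        printf "%02d%02d%02d%02d%s\t%s\n", t[3], mthNr, t[1], hour, substr(t[4], 3), $0
    }' "$@" | LC_ALL=C sort -k1,1 | cut -f2-
}
```

Calling `sort_by_timestamp file` (or piping the records into the function) prints the lines in chronological order. LC_ALL=C is there only to force a plain byte-wise comparison of the keys.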

0

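This script sorts by epoch time with nanosecond resolution. It needs GNU awk (for gensub) and GNU date (for -d and %N), and it starts one date process per input line:

awk '{
  # Turn "HH.MM.SS.fraction" into "HH:MM:SS.fraction" so that date can parse it.
  t = gensub(/\.([0-9]{2})\./, ":\\1:", 1, $0);
  command = "date +%s%N -d \"" t "\"";
  command | getline t;
  close(command);
  print t, $0;
}' unsorted.txt | sort -k 1,1n | cut -d ' ' -f 2- > sorted.txt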
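
Starting one date process per line can be slow for hundreds of thousands of records. As a hedged sketch, assuming GNU date (its -f option reads one date string per line), GNU sort, and an input file in which every line is a timestamp that date can parse, a single date process can convert all lines at once. The function name sort_by_epoch is hypothetical, and TZ=UTC is an assumption made here to keep daylight-saving shifts out of the comparison.

```shell
# Sketch: sort by epoch nanoseconds using a single GNU date process.
# $1 must be a regular file (it is read twice) whose lines contain no tabs.
sort_by_epoch() {
    # Rewrite HH.MM.SS.fraction as HH:MM:SS.fraction, convert every line to
    # epoch nanoseconds, glue each key to its original line with a tab,
    # sort numerically on the key, then strip the key again.
    sed 's/\./:/; s/\./:/' "$1" |
        TZ=UTC date -f - +%s%N |
        paste - "$1" |
        sort -k1,1n |
        cut -f2-
}
```

Usage would be `sort_by_epoch unsorted.txt > sorted.txt`. If date rejects any line, the keys and the lines no longer pair up, so this only fits input that is uniformly formatted.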

0

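You could use date to convert each timestamp to epoch seconds, which is a reasonable approach if you don't need the fractional seconds. If you do need them, you could clip the fraction off and use it as a secondary sort key.

# Convert to epoch seconds, sort, convert back, then look each timestamp up in the input.
# Caveat: the lookup pattern drops the fraction and AM/PM, so lines that agree on the rest print more than once.
while read -r a; do
  grep "^${a}" input.txt
done < <(sed 's/\./:/;s/\./:/' input.txt | xargs -I{} date -d"{}" +%s | sort -n | xargs -I{} date -d @'{}' +'%d-%^h-%y %I.%M.%S')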
