
I have a log file with lines that look like this:

Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]^Iat com/avc/abc/magr/service/find.something(abc/1235/locator/abc;Ljava/lang/String;)Labc/abc/abcd/abcd;(bytecode:7) 

Each line contains two timestamps in different formats. I need to sort this log file by the date and time enclosed in [].

This is the regex I am trying to use, but it does not match anything:

t_pat = re.compile(r".*\[\d+/\D+/.*\]")

I want to go over each line in the file, apply this pattern, and sort the lines by the date and time.

Can someone help me with this? Thanks!

  • Might it not be easier to use the date and time at the start of the line? Commented Jul 5, 2013 at 15:41
  • Is there really a space between the [ and the 1? Commented Jul 5, 2013 at 15:41
  • The time inside [] has more precision (milliseconds), and I get quite a few log entries per second that need to be sorted. Commented Jul 5, 2013 at 15:42
  • @MartijnPieters - The day is a two-character field, so a single-digit day is padded with a space. A two-digit day such as '28' fills it completely. Commented Jul 5, 2013 at 15:44

3 Answers


There is a space after the [ that the regular expression needs to allow for:

import re

text = "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]^Iat com/avc/abc/magr/service/find.something(abc/1235/locator/abc;Ljava/lang/String;)Labc/abc/abcd/abcd;(bytecode:7)"
matches = re.findall(r"\[\s*(\d+/\D+/.*?)\]", text)
print(matches)
# ['1/Jul/2013 03:27:12.818']

Next, parse the extracted time with time.strptime, documented here:

http://docs.python.org/2/library/time.html#time.strptime
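One caveat with time.strptime: it returns a time.struct_time, which has no field for fractions of a second, so the milliseconds needed to order entries within the same second would be lost. datetime.datetime.strptime keeps them through the %f directive. A minimal sketch, assuming an English/C locale so that %b matches 'Jul'; the .207 timestamp is a made-up value for illustration:

```python
import datetime

# Assumed format for the bracketed timestamp, e.g. "1/Jul/2013 03:27:12.818".
DATE_FORMAT = "%d/%b/%Y %H:%M:%S.%f"

# Hypothetical second timestamp in the same second as the sample one.
earlier = datetime.datetime.strptime("1/Jul/2013 03:27:12.207", DATE_FORMAT)
later = datetime.datetime.strptime("1/Jul/2013 03:27:12.818", DATE_FORMAT)

print(later.microsecond)  # 818000: %f right-pads "818" to microseconds
print(earlier < later)    # True, even though both fall in the same second
```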

Finally, pair each parsed time with its line and sort on the time. Using the time as a dict key with the line as the value would silently drop any lines that share a timestamp.
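Putting these steps together, a sketch of this approach might look like the following. It swaps in datetime.datetime.strptime (to keep the milliseconds) and sorts a list of (time, line) pairs rather than a dict, so lines with identical timestamps are not lost. The function name sort_by_bracketed_time and the sample lines are hypothetical, and the code assumes every line contains a bracketed timestamp:

```python
import datetime
import re

# This answer's pattern: allow whitespace after "[" and capture the date.
T_PAT = re.compile(r"\[\s*(\d+/\D+/.*?)\]")
# Assumed format for the bracketed timestamp, e.g. "1/Jul/2013 03:27:12.818".
DATE_FORMAT = "%d/%b/%Y %H:%M:%S.%f"

def sort_by_bracketed_time(lines):
    """Return the lines ordered by their first bracketed timestamp."""
    pairs = []
    for line in lines:
        date_string = T_PAT.findall(line)[0]  # assumes every line has one
        when = datetime.datetime.strptime(date_string, DATE_FORMAT)
        pairs.append((when, line))
    # Sort on the timestamp only, never on the line text.
    pairs.sort(key=lambda pair: pair[0])
    return [line for _, line in pairs]

# Hypothetical sample lines; the last character labels each one.
sample = [
    "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <] B",
    "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.207][j:[SessionThread <] A",
    "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <] C",
]
for line in sort_by_bracketed_time(sample):
    print(line[-1])  # prints A, B, C on separate lines
```

Because Python's sort is stable, entries that share a timestamp (B and C above) stay in the order they appeared in the file.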



You are not matching the initial space. You also want to capture the date in a group for easy extraction, and make the \D+ and .* patterns non-greedy:

t_pat = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")

Demo, with line holding the sample log line:

>>> re.compile(r".*\[\s?(\d+/\D+?/.*?)\]").search(line).group(1)
'1/Jul/2013 03:27:12.818'
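To see what each change buys, here is a sketch that runs the question's pattern, a variant that only adds the \s? but keeps a greedy .*, and this answer's pattern against the sample line (with ^I kept as two literal characters, as shown in the question). The variable names are illustrative:

```python
import re

line = ("Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818]"
        "[j:[SessionThread <]^Iat com/avc/abc/magr/service/find.something"
        "(abc/1235/locator/abc;Ljava/lang/String;)Labc/abc/abcd/abcd;(bytecode:7)")

# The question's pattern: no allowance for the space after "[".
question = re.compile(r".*\[\d+/\D+/.*\]")
# Space allowed, but the trailing .* is still greedy.
greedy = re.compile(r".*\[\s?(\d+/\D+/.*)\]")
# This answer's pattern: space allowed, non-greedy \D+? and .*?.
lazy = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")

print(question.search(line))         # None
print(greedy.search(line).group(1))  # 1/Jul/2013 03:27:12.818][j:[SessionThread <
print(lazy.search(line).group(1))    # 1/Jul/2013 03:27:12.818
```

The greedy .* runs on to the last ] on the line, which is why the capture has to be made non-greedy.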

You can narrow the pattern down further; for example, the month only needs to match three letters:

t_pat = re.compile(r".*\[\s?(\d{1,2}/[A-Z][a-z]{2}/\d{4} \d{2}:\d{2}:[\d.]{2,})\]")
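A quick check of the narrowed pattern against a space-padded single-digit day, a two-digit day, and a full month name that it should reject. The sample lines are hypothetical, modeled on the one in the question:

```python
import re

# The narrowed pattern from the answer above.
t_pat = re.compile(r".*\[\s?(\d{1,2}/[A-Z][a-z]{2}/\d{4} \d{2}:\d{2}:[\d.]{2,})\]")

# Hypothetical sample lines.
lines = [
    "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]",
    "Jul 28 09:05:44 syslog: [m_java][28/Jul/2013 09:05:44.031][j:[SessionThread <]",
    "Jul 28 09:05:44 syslog: [m_java][28/July/2013 09:05:44.031]",  # month too long
]

for line in lines:
    match = t_pat.search(line)
    print(match.group(1) if match else None)

# Prints:
# 1/Jul/2013 03:27:12.818
# 28/Jul/2013 09:05:44.031
# None
```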

1 Comment

I also think you need to make the last quantifier lazy: \[\s?\d+/\D+/.*?\]

Read all the lines of the file and call sort(), passing a key function that parses out the date so the lines are ordered by it:

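import re
import datetime

# Compile the pattern once instead of on every call.
t_pat = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")
date_format = '%d/%b/%Y %H:%M:%S.%f'

def parse_date_from_log_line(line):
    # Pull out the bracketed timestamp and turn it into a datetime.
    date_string = t_pat.search(line).group(1)
    return datetime.datetime.strptime(date_string, date_format)

log_path = 'mylog.txt'
with open(log_path) as log_file:
    lines = log_file.readlines()

# Sort the lines in place, keyed on each line's parsed timestamp.
lines.sort(key=parse_date_from_log_line)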

2 Comments

I get the following error on the line date_string = t_pat.search(line).group(1): AttributeError: 'NoneType' object has no attribute 'group'
@SupriyaK this assumes every line contains a bracketed timestamp; for a line that doesn't, t_pat.search(line) returns None, hence the error. There's no error checking in the code. If there were, it would have to handle the None case, and for a line with no datetime it would need to decide whether to skip it or not.
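The AttributeError comes from t_pat.search(line) returning None for a line with no bracketed date, and .group(1) then being called on None. One way to make the decision mentioned above is sketched below: lines without a parseable timestamp are set aside rather than sorted. The helper names are hypothetical, and treating a strptime ValueError the same way as a missing match is an added assumption:

```python
import datetime
import re

# Same pattern and format string as the answer above.
T_PAT = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")
DATE_FORMAT = "%d/%b/%Y %H:%M:%S.%f"

def parse_date_or_none(line):
    """Return the bracketed timestamp as a datetime, or None if there isn't one."""
    match = T_PAT.search(line)
    if match is None:
        return None  # no bracketed date at all
    try:
        return datetime.datetime.strptime(match.group(1), DATE_FORMAT)
    except ValueError:
        return None  # bracketed text matched the regex but isn't a valid date

def sort_log_lines(lines):
    """Split lines into (sorted timestamped lines, lines without a timestamp)."""
    keyed, skipped = [], []
    for line in lines:
        when = parse_date_or_none(line)
        if when is None:
            skipped.append(line)
        else:
            keyed.append((when, line))
    keyed.sort(key=lambda pair: pair[0])  # stable: ties keep file order
    return [line for _, line in keyed], skipped

# Hypothetical input mixing timestamped and untimestamped lines.
lines = [
    "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818] second\n",
    "a line with no bracketed timestamp\n",
    "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.207] first\n",
]
ordered, skipped = sort_log_lines(lines)
```

Whether skipped lines should be dropped, appended at the end, or attached to the preceding entry depends on what those lines are in the real log.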
