I have a text log file that looks like this:

Line 1 - Date/User Information
Line 2 - Type of LogEvent
Line 3-X, variable number of lines with additional information,
          could be 1, could be hundreds

Then the sequence repeats.

There are around 20K lines of log, 50+ types of log events, approx. 15K separate user/date events. I would like to parse this in Python and make this information queryable.

So I thought I'd create a class LogEvent that records user, date (which I extract and convert to datetime), action, description... something like:

    class LogEvent():
        def __init__(self,date,user):
            self.date = date # string converted to datetime object
            self.user = user
            self.content = ""

Such an event is created each time a line of text with user/date information is parsed.

To add the log event information and any descriptive content, there could be something like:

    def classify(self,logevent):
        self.logevent = logevent

    def addContent(self,lineoftext):
        self.content += lineoftext

To process the text file, I would use readline() and proceed one line at a time. If the line is user/date, I instantiate a new object and add it to a list...

    newevent = LogEvent(date,user)
    eventlist.append(newevent)

and keep adding the action/content to that event until I encounter the next user/date line.

    eventlist[-1].classify(logevent)
    eventlist[-1].addContent(line)
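Put together, the loop described above might look roughly like the sketch below. The `parse_header` helper and the `2014-07-30 14:05:00 | alice` header format are assumptions made up for illustration (the real log format is not shown here), and `parse_log` is a hypothetical name.

```python
from datetime import datetime


class LogEvent:
    def __init__(self, date, user):
        self.date = date      # datetime object
        self.user = user
        self.logevent = None  # filled in from the line after the header
        self.content = ""


def parse_header(line):
    """Return (date, user) for a header line, or None for any other line.

    Assumed (hypothetical) header format: '2014-07-30 14:05:00 | alice'.
    """
    stamp, sep, user = line.partition(" | ")
    if not sep:
        return None
    try:
        return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), user.strip()
    except ValueError:
        return None


def parse_log(lines):
    """Group an iterable of lines into a list of LogEvent objects."""
    events = []
    expect_type = False
    for raw in lines:
        line = raw.rstrip("\n")
        header = parse_header(line)
        if header is not None:
            events.append(LogEvent(*header))
            expect_type = True       # the next line names the event type
        elif not events:
            continue                 # ignore anything before the first header
        elif expect_type:
            events[-1].logevent = line
            expect_type = False
        else:
            events[-1].content += line + "\n"
    return events
```

Since a file object is itself an iterable of lines, something like `with open(path) as f: events = parse_log(f)` should work without calling readline() by hand.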

All this makes sense (unless you convince me there is a smarter way to do it or a useful Python module I am not aware of). What I'm trying to decide is how best to classify the log event type, given a fixed list of more than 50 possible types. I don't want to just accept the entire line of text as the log event type; instead I need to compare the start of the line against a list of possible values...

What I don't want to do is have 50 of these:

    if line.startswith("ABC"):
        logevent = "foo"
    if line.startswith("XYZ"):
        logevent = "boo"

I thought about using a dict as a lookup table, but I am not sure how to implement that with "startswith"... Any suggestions would be appreciated, and my apologies if I was way too long-winded.

  • it would help if you told us what a typical "Type of LogEvent" line looks like and what you want to be recorded in the logevent attribute. Also, do you have the various types of log events in a list or better yet, a set? Commented Jul 31, 2014 at 4:14

2 Answers

If you have a dictionary of your logEvent types as keys and whatever you want to go into the logevent attribute as values, you can do this,

logEvents = {"ABC":"foo", "XYZ":"boo", "Data Load":"DLtag"}

and the line from your log file is this,

line = "Data Load: 127 row uploaded"

you can check if any of the keys above are at the beginning of the line,

for k in logEvents:
    if line.startswith(k): 
        logevent = logEvents[k]

This will loop over all the keys in logEvents and check if line starts with one of them. You can do whatever you like after the if conditional. You could put this into a function that is called after a line of text with user/date information is parsed. If you want to do something if no keys are found you can do this,

def lookup_logevent(line):
    # return the value for the first key that the line starts with
    for k in logEvents:
        if line.startswith(k):
            return logEvents[k]
    raise ValueError("logEvent not recognized.\n line = " + line)

Note, the exact type of exception you raise is not super important. I chose one of the built-in exceptions to avoid subclassing. The Python documentation lists all the built-in exceptions.
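One thing to watch with a prefix lookup like this: if one key is itself a prefix of another, the shorter key can win by accident. A minimal sketch of one way around that is to try the longest keys first. The extra `"Data"` key, the `"Dtag"` value, and the name `classify_longest_prefix` are made up here purely to illustrate the overlap.

```python
# "Data" / "Dtag" is a hypothetical extra entry that overlaps with "Data Load".
LOG_EVENTS = {"ABC": "foo", "XYZ": "boo", "Data": "Dtag", "Data Load": "DLtag"}

# Longest prefixes first, so "Data Load" is tried before "Data".
PREFIXES = sorted(LOG_EVENTS, key=len, reverse=True)


def classify_longest_prefix(line, default=None):
    """Return the value of the longest known prefix of line, or default."""
    for prefix in PREFIXES:
        if line.startswith(prefix):
            return LOG_EVENTS[prefix]
    return default
```

Returning a default instead of raising is just an alternative to the ValueError above; which one fits better depends on whether an unknown event type should stop the parse.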


3 Comments

So a typical log event could look like this: "Data Load: 127 row uploaded", followed by 127 lines detailing what was uploaded (which would be mapped to my content attribute). -- So I really should've been clearer. It's more than just deciding how to process the line based on the first few characters (although that works for some events); it's about defining a series of processing methods that handle the different types of lines (log events).
So maybe I should use "event in line" logic rather than "startswith" and a look-up table for functions...
You can use startswith if you like. If the format of the log is always the same, you won't have any problems. I'm not exactly sure what the overall goal is, but this will check the log event lines, after which you can steer the logic.
0

Since I didn't do a good job posing my question, I have given it more thought and come up with this answer, which is similar to this thread.

I would like a clean, easily manageable solution to process each line of text differently, based on whether certain conditions are met. I didn't want to use a bunch of if/else clauses. So I tried instead to move both condition and consequence (processing) into a decisionDict = {}.

### RESPONSES WHEN CERTAIN CONDITIONS ARE MET - simple examples
def shorten(line):
    return line[:25]

def abc_replace(line):
    return line.replace("xyz","abc")

### CONDITIONAL CHECKS FOR CONTENTS OF LINES OF TEXT - simple examples
def check_if_string_in_line(line):
    return "xyz" in line

def check_if_longer_than25(line):
    return len(line) > 25

### DECISION DICTIONARY - could be extended for any number of condition/response
decisionDict = {check_if_string_in_line:abc_replace, check_if_longer_than25:shorten}

### EXAMPLE LINES OF SILLY TEXT
lines = ["Alert level raised to xyz",
    "user 5 just uploaded duplicate file",
    "there is confusion between xyz and abc"]

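for line in lines:
    for check, respond in decisionDict.items():
        if check(line):
            print(respond(line))
            break  # stop at the first condition that matches

This keeps all the conditions and actions neatly separated. Because of the break, it does not allow more than one condition to apply to any one line of text: once the first condition resolves to True, we move on to the next line of text. Which condition counts as "first" follows the dict's iteration order, which is only guaranteed to be insertion order on Python 3.7 and later.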
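If the order of the checks matters, a list of (condition, response) pairs makes it explicit regardless of Python version. The following is only a sketch of that variation: `RULES` and `process` are hypothetical names, and returning the line unchanged when nothing matches is an assumption, not something specified above.

```python
def shorten(line):
    return line[:25]


def abc_replace(line):
    return line.replace("xyz", "abc")


# (condition, response) pairs, checked from top to bottom.
RULES = [
    (lambda line: "xyz" in line, abc_replace),
    (lambda line: len(line) > 25, shorten),
]


def process(line, rules=RULES):
    """Apply the response of the first rule whose condition holds."""
    for condition, response in rules:
        if condition(line):
            return response(line)
    return line  # assumption: no rule matched, so leave the line unchanged


lines = ["Alert level raised to xyz",
         "user 5 just uploaded duplicate file",
         "there is confusion between xyz and abc"]

for line in lines:
    print(process(line))
```

Adding a new condition/response pair is then a one-line change to `RULES`, and moving a pair up or down changes its priority.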
