1

I have a text file that needs to be analysed. Each line in the file is of this form:

7:06:32 (slbfd) IN: "lq_viz_server" aqeela@nabltas1  

7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS   ) Albahraj@nabwmps3  (License server system does not support this feature. (-18,327))

7:08:21 (slbfd) OUT: "OFM32" Albahraj@nabwmps3

I need to skip the timestamp and the (slbfd) label and keep only the lines marked IN or OUT. Further, for each name in quotes, I need to keep a separate counter that is increased when the line is an OUT line and decreased when it is an IN line. How would I go about doing this in Python?

2
  • 1
    what have you tried so far? where are you stuck? Commented Jun 22, 2012 at 14:20
  • You can parse the line using the .split() method. Do you have any code attempting to parse a single line? Once you can parse one line you should be able to parse them all. After that it's just a matter of checking the correct elements in each line with some logic. Commented Jun 22, 2012 at 14:22

5 Answers

5

The other answers with regex and splitting the line will get the job done, but if you want a fully maintainable solution that will grow with you, you should build a grammar. I love pyparsing for this:

S ='''
7:06:32 (slbfd) IN: "lq_viz_server" aqeela@nabltas1  
7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS   ) Albahraj@nabwmps3  (License server system does not support this feature. (-18,327))
7:08:21 (slbfd) OUT: "OFM32" Albahraj@nabwmps3'''

from pyparsing import *
from collections import defaultdict

# Define the grammar
num = Word(nums)
marker = Literal(":").suppress()
timestamp = Group(num + marker + num + marker + num)
label = Literal("(slbfd)")
flag = Word(alphas)("flag") + marker
name = QuotedString(quoteChar='"')("name")

line    = timestamp + label + flag + name + restOfLine
grammar = OneOrMore(Group(line))

# Now parsing is a piece of cake!  
P = grammar.parseString(S)
counts = defaultdict(int)

for x in P:
    if x.flag=="IN": counts[x.name] += 1
    if x.flag=="OUT": counts[x.name] -= 1

for key in counts:
    print(key, counts[key])

This gives as output:

lq_viz_server 1
OFM32 -1

This would look more impressive if your sample log file were longer. The beauty of a pyparsing solution is that it can adapt to a more complex query in the future (e.g. grab and parse the timestamp, pull out the email address, parse error codes...). The idea is that you write the grammar independently of the query: you simply convert the raw text to a computer-friendly format, separating the parsing implementation from its usage.


1

Assuming the file is divided into lines (I don't know whether that is true), apply the split() function to each line. For the first sample line you will get this:

['7:06:32', '(slbfd)', 'IN:', '"lq_viz_server"', 'aqeela@nabltas1']

Note that the name keeps its double quotes. From there you can apply whatever logic you need by comparing these values.
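A minimal sketch of that idea, in case it helps. The function name `count_with_split` and the `sample` string are just for illustration, and I am assuming the convention from the question (OUT increases the counter, IN decreases it). It also assumes the quoted name contains no spaces, since split() would break such a name apart.

```python
from collections import defaultdict

# Sample lines copied from the question
sample = '''7:06:32 (slbfd) IN: "lq_viz_server" aqeela@nabltas1
7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS   ) Albahraj@nabwmps3  (License server system does not support this feature. (-18,327))
7:08:21 (slbfd) OUT: "OFM32" Albahraj@nabwmps3'''


def count_with_split(lines):
    """Count OUT (+1) and IN (-1) events per quoted name (hypothetical helper)."""
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        flag = fields[2]             # e.g. 'IN:' or 'OUT:'
        name = fields[3].strip('"')  # drop the surrounding double quotes
        if flag == "OUT:":
            counts[name] += 1
        elif flag == "IN:":
            counts[name] -= 1
        # any other flag (e.g. 'UNSUPPORTED:') is ignored
    return dict(counts)


split_counts = count_with_split(sample.splitlines())
print(split_counts)
```

To run it on a real file, pass the open file object instead of `sample.splitlines()`, since iterating over a file yields its lines.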


1

I made some wild assumptions about your specification; here is some sample code to help you get started:

objects = {}
with open("data.txt") as data:
    for line in data:
        if "IN:" in line or "OUT:" in line:
            try:
                name = line.split("\"")[1]
            except IndexError:
                print("No double quoted name on line: {}".format(line))
                name = "PARSING_ERRORS"
            if "OUT:" in line:
                diff = 1
            else:
                diff = -1
            try:
                objects[name] += diff
            except KeyError:
                objects[name] = diff
print(objects) # for debug only, not advisable to print huge number of names


1

You have two options:

  1. Use the .split() function of the string (as pointed out in the comments)
  2. Use the re module for regular expressions.

I would suggest using the re module and creating a pattern with named groups.

Recipe:

  • first create a pattern with re.compile() containing named groups
  • do a for loop over the file to get the lines
  • use .match() of the created pattern object on each line
  • use .groupdict() of the returned match object to access your values of interest
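The recipe above could be sketched roughly as follows. The pattern, the group names `flag` and `name`, and the helper `count_with_regex` are my own illustrative choices, and I am assuming the convention from the question (OUT increases the counter, IN decreases it). Lines with any other flag simply do not match and are skipped.

```python
import re
from collections import Counter

# Named groups: 'flag' is IN or OUT, 'name' is the text between the double quotes
LINE_RE = re.compile(
    r'\d+:\d+:\d+\s+\(slbfd\)\s+(?P<flag>IN|OUT):\s+"(?P<name>[^"]+)"'
)


def count_with_regex(lines):
    """Count OUT (+1) and IN (-1) events per quoted name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. UNSUPPORTED lines or blank lines
        fields = match.groupdict()
        counts[fields["name"]] += 1 if fields["flag"] == "OUT" else -1
    return counts


# Sample lines copied from the question
log_lines = [
    '7:06:32 (slbfd) IN: "lq_viz_server" aqeela@nabltas1',
    '7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS   ) '
    'Albahraj@nabwmps3  (License server system does not support this feature. (-18,327))',
    '7:08:21 (slbfd) OUT: "OFM32" Albahraj@nabwmps3',
]
regex_counts = count_with_regex(log_lines)
print(dict(regex_counts))
```

Compiling the pattern once outside the loop keeps the per-line work small, and the named groups make it easy to extend the pattern later (for example, to capture the timestamp) without renumbering anything.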


0

In the mode of just get 'er done with the standard distribution, this works:

import re
from collections import Counter

count = Counter()
# "data.txt" is a placeholder; substitute the path to your log file
with open("data.txt") as inF:
    for line in inF:
        match = re.match(r'\d+:\d+:\d+ \(slbfd\) (\w+): "(\w+)"', line)
        if match:
            if match.group(1) == 'IN':
                count[match.group(2)] += 1
            elif match.group(1) == 'OUT':
                count[match.group(2)] -= 1

print(count)

Prints:

Counter({'lq_viz_server': 1, 'OFM32': -1})
