Python: parsing sections of a log file

Question

I have a section of a log file that looks like this:

"/log?action=End&env=123&id=8000&cat=baseball"
"/log?action=start&get=3210&rsa=456&key=golf"

I want to parse out each section so the results would look like this:

('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')
('/log?action=', 'start', 'get=3210', 'rsa=456', 'key=golf')

I've looked into regex matching, but the parameters in my logs appear in many different sequences, which leads me to believe that a single pattern won't work. Any suggestions?

abarnert · Accepted Answer · 2013-10-22 23:43:30Z

This is clearly a fragment of a URL, so the best way to parse it is to use URL parsing tools. The stdlib comes with urlparse, which does exactly what you want.

For example:

>>> import urlparse
>>> s = "/log?action=End&env=123&id=8000&cat=baseball"
>>> bits = urlparse.urlparse(s)
>>> variables = urlparse.parse_qs(bits.query)
>>> variables
{'action': ['End'], 'cat': ['baseball'], 'env': ['123'], 'id': ['8000']}
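
Because parse_qs keys each value by parameter name, the order and set of parameters can vary from line to line without changing the parsing code. A minimal sketch over the two sample lines from the question, written with the Python 3 module name urllib.parse (on Python 2 the same functions live in urlparse). The `lines` list and the "<missing>" placeholder are illustrative assumptions, not part of the original answer:

```python
from urllib.parse import parse_qs, urlparse

# Sample lines copied from the question; real logs would be read from a file.
lines = [
    "/log?action=End&env=123&id=8000&cat=baseball",
    "/log?action=start&get=3210&rsa=456&key=golf",
]

# One dict per line; the keys present may differ from line to line.
records = [parse_qs(urlparse(line).query) for line in lines]

for record in records:
    # .get() avoids a KeyError for parameters a given line does not have.
    print(record["action"][0], record.get("cat", ["<missing>"])[0])
```

Looking values up by name like this is probably more robust than relying on position when the sequences differ.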

If you really want to get the format you asked for, you can use parse_qsl instead, and then join the key-value pairs back together. I'm not sure why you want the /log to be included in the first query variable, or the first query variable's value to be separate from its variable, but even that is doable if you insist:

>>> variables = urlparse.parse_qsl(s)
>>> result = (variables[0][0] + '=', variables[0][1]) + tuple(
    '='.join(kv) for kv in variables[1:])
>>> result
('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')

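If you're using Python 3.x, import urllib.parse instead of urlparse and call the same functions on it (urllib.parse.urlparse, urllib.parse.parse_qs, urllib.parse.parse_qsl); the rest is exactly the same.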
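
As a rough sketch of how the requested tuple format might look on Python 3, assuming the input is a path followed by a query string as in the question. The helper name `split_log_line` is made up for illustration and is not part of the standard library:

```python
from urllib.parse import parse_qsl, urlparse


def split_log_line(line):
    """Return the tuple layout the question asks for (hypothetical helper)."""
    bits = urlparse(line)
    pairs = parse_qsl(bits.query)
    if not pairs:
        # No query variables: hand the line back unchanged.
        return (line,)
    first_key, first_value = pairs[0]
    # Rebuild '/log?action=' from the path and the first variable's name.
    head = "{}?{}=".format(bits.path, first_key)
    return (head, first_value) + tuple("=".join(kv) for kv in pairs[1:])


print(split_log_line("/log?action=End&env=123&id=8000&cat=baseball"))
```

Note that parse_qsl decodes percent-escapes and '+' signs, so for encoded input the rejoined pairs can differ from the raw text of the log line.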

beroe · Answer · 2013-10-22 23:40:26Z

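You can split a couple of times:

s = '/log?action=End&env=123&id=8000&cat=baseball'
L = s.split("&")            # one item per "&"-separated parameter
L[0:1] = L[0].split("=")    # replace the first item with its name and value
print(L)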

Output:

['/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball']

Brionius · Answer · 2013-10-22 23:41:18Z

It's a bit hard to say without knowing what the domain of possible inputs is, but here's a guess at what will work for you:

log = "/log?action=End&env=123&id=8000&cat=baseball\n/log?action=start&get=3210&rsa=456&key=golf"

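# Split each line on "&" to get its individual parameters.
logLines = [line.split("&") for line in log.split('\n')]
# Split the first parameter on "=" and rebuild each line as a tuple.
logLines = [tuple(line[0].split("=")+line[1:]) for line in logLines]

print(logLines)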

OUTPUT:

[('/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball'), 
 ('/log?action', 'start', 'get=3210', 'rsa=456', 'key=golf')]

This assumes that you don't really need the "=" at the end of the first string.
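
If the trailing "=" does matter, one possible variation is to use str.partition, which returns the separator along with the two halves. This is only a sketch under the same assumptions about the input; the helper name `split_keep_equals` is hypothetical:

```python
def split_keep_equals(line):
    """Split one log line, keeping '=' on the first piece (hypothetical helper)."""
    # Separate the first parameter from everything after the first "&".
    first, _, rest = line.partition("&")
    # partition splits on the first "=" only and hands back the separator.
    head, sep, value = first.partition("=")
    pieces = (head + sep, value)
    if rest:
        pieces += tuple(rest.split("&"))
    return pieces


log = "/log?action=End&env=123&id=8000&cat=baseball\n/log?action=start&get=3210&rsa=456&key=golf"
print([split_keep_equals(line) for line in log.split("\n")])
```

Because partition stops at the first "=", a first value that itself contains "=" stays in one piece, whereas a plain split("=") would break it into several.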
