0

I have a section of a log file that looks like this:

"/log?action=End&env=123&id=8000&cat=baseball"
"/log?action=start&get=3210&rsa=456&key=golf"

I want to parse out each section so the results would look like this:

('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')
('/log?action=', 'start', 'get=3210', 'rsa=456', 'key=golf')

I've looked into regex and matching, but a lot of my logs have different sequences which leads me to believe that it is not possible. Any suggestions?

3 Answers 3

3

This is clearly a fragment of a URL, so the best way to parse it is to use URL parsing tools. The stdlib comes with urlparse, which does exactly what you want.

For example:

>>> import urlparse
>>> s = "/log?action=End&env=123&id=8000&cat=baseball"
>>> bits = urlparse.urlparse(s)
>>> variables = urlparse.parse_qs(bits.query)
>>> variables
{'action': ['End'], 'cat': ['baseball'], 'env': ['123'], 'id': ['8000']}

If you really want to get the format you asked for, you can use parse_qsl instead, and then join the key-value pairs back together. I'm not sure why you want the /log to be included in the first query variable, or the first query variable's value to be separate from its variable, but even that is doable if you insist:

>>> variables = urlparse.parse_qsl(s)
>>> result = (variables[0][0] + '=', variables[0][1]) + tuple(
    '='.join(kv) for kv in variables[1:])
>>> result
('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')

If you're using Python 3.x, just change the urlparse to urllib.parse, and the rest is exactly the same.

Sign up to request clarification or add additional context in comments.

Comments

0

You can split a couple times:

s = '/log?action=End&env=123&id=8000&cat=baseball'
L = s.split("&")
L[0:1]=L[0].split("=")

Output:

['/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball']

Comments

0

It's a bit hard to say without knowing what the domain of possible inputs is, but here's a guess at what will work for you:

log = "/log?action=End&env=123&id=8000&cat=baseball\n/log?action=start&get=3210&rsa=456&key=golf"

logLines = [line.split("&") for line in log.split('\n')]
logLines = [tuple(line[0].split("=")+line[1:]) for line in logLines]

print logLines

OUTPUT:

[('/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball'), 
 ('/log?action', 'start', 'get=3210', 'rsa=456', 'key=golf')]

This assumes that you don't really need the "=" at the end of the first string.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.