
I need to extract the time (02/Jan/2015:08:12), article_id, and user_id from each line.

The lines look like this:

67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?article_id=25&user_id=104 HTTP/1.1" 200 2327
67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?article_id=211&user_id=9408 HTTP/1.1" 200 380

I'm a beginner. I searched Google and Stack Overflow, but I haven't found a way to solve this. Can anyone help? Thanks!

  • You probably want to start reading up on regular expressions in Python; the re module can likely get all the information you're after out of each line. Writing regexes has a steep learning curve, but it pays off massively in the long run. Log analysis tools like Logstash rely heavily on regexes to extract information. Commented Apr 13, 2016 at 21:19
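As a starting point with the re module, here is a minimal sketch that uses named groups so each field can be pulled out by name. The pattern and the `LOG_PATTERN` / `parse_line` names are illustrative assumptions based only on the two sample lines above, not a general log-format parser.

```python
import re

# Hypothetical pattern built from the sample lines; named groups label each field.
LOG_PATTERN = re.compile(
    r"\[(?P<time>[^\]]+)\]"               # text inside the square brackets
    r".*?article_id=(?P<article_id>\d+)"  # digits after article_id=
    r".*?user_id=(?P<user_id>\d+)"        # digits after user_id=
)


def parse_line(line):
    """Return a dict with time, article_id and user_id, or None if no match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


line = '67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?article_id=25&user_id=104 HTTP/1.1" 200 2327'
print(parse_line(line))
```

Named groups make the result self-describing, so the calling code doesn't depend on the position of each capture group. Note that all values come back as strings.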

2 Answers


A simple regex can extract that.

>>> import re
>>> s = '''67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?article_id=25&user_id=104 HTTP/1.1" 200 2327
... 67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?article_id=211&user_id=9408 HTTP/1.1" 200 380'''
>>> re.findall(r'\[(.*?)\].*?article_id=(\d+).*?user_id=(\d+)', s)
[('02/Jan/2015:08:12', '25', '104'), ('02/Jan/2015:08:12', '211', '9408')]

Use re.search instead of re.findall if you want to apply the pattern to individual lines.
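For example, here is a minimal sketch that applies the same pattern to each line with re.search and converts the two IDs to integers. The `log_text` variable is an assumption standing in for however the lines are actually read (for instance, iterating over an open file object).

```python
import re

# Same pattern as above, written as a raw string so the backslashes reach re intact.
PATTERN = re.compile(r'\[(.*?)\].*?article_id=(\d+).*?user_id=(\d+)')

# Placeholder input: the two sample lines from the question.
log_text = '''67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?article_id=25&user_id=104 HTTP/1.1" 200 2327
67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?article_id=211&user_id=9408 HTTP/1.1" 200 380'''

records = []
for line in log_text.splitlines():
    match = PATTERN.search(line)
    if match is None:
        continue  # skip lines that don't contain the expected fields
    timestamp, article_id, user_id = match.groups()
    records.append((timestamp, int(article_id), int(user_id)))

print(records)
```

Working line by line makes it easy to skip malformed lines without losing the rest of the data.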

import re
# your_string holds the log text; findall returns one (time, article_id, user_id) tuple per line
result = re.findall(r'\[(.+?)\].+?article_id=(\d+)&user_id=(\d+)', your_string)
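Both patterns assume article_id comes before user_id in the query string. If the parameter order can vary, one alternative (a different technique from the answers above) is to capture the bracketed time and the query string with a regex, then let `urllib.parse.parse_qs` split the parameters. This is a sketch under that assumption; `parse_click` is a hypothetical helper name, and the reordered line in the usage example is an illustrative variant of the question's sample.

```python
import re
from urllib.parse import parse_qs

# Capture the bracketed time, then everything between the first '?' and the next space.
LINE_RE = re.compile(r'\[([^\]]+)\].*?\?(\S+)')


def parse_click(line):
    """Hypothetical helper: return (time, article_id, user_id) or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    timestamp, query = match.groups()
    params = parse_qs(query)  # maps each parameter name to a list of values
    try:
        return timestamp, params['article_id'][0], params['user_id'][0]
    except KeyError:
        return None  # one of the parameters is missing


# Reordered query string: user_id first, article_id second.
print(parse_click('67.15.143.7 - - [02/Jan/2015:08:12] "GET/click?user_id=104&article_id=25 HTTP/1.1" 200 2327'))
```

The trade-off is a second parsing step, but the parameter lookup no longer depends on their order.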
