LogFormat "%v %a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedvhost
CustomLog "/var/log/apache2/access_log" combinedvhost    

I have an Apache configuration that produces an access_log in the log format above. I'm trying to write a Python (2.7.13) regex that captures each field in a named group (ignoring the HTTP method and HTTP version).

Below is my regex so far:

(?P<host>.*)\s+(?P<ip>\S+)\s+-\s+-\s+\[(?P<date>\S+)\s+(?P<timezone>.*)\]\s+"\S+\s+(?P<path>\S+)(?:\?(?P<querystring>\S+))?\s+\S+"\s+(?P<status>\S+)\s+(?P<length>\S+)\s+"(?P<referrer>.*)"\s+"(?P<user_agent>.*)"\s+

My problem is the first log line, where the expected result is path = / and querystring = simplode_ajax=true&simplode_query%5Border%5D=DESC. It seems my path group is matching too greedily, though: it returns querystring = None and the entire URL (query string included) as path instead...

I was testing the regex above against the log below at http://pythex.org.

default 1.2.3.4 - - [05/Jan/2017:10:56:18 -0800] "GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1" 200 - "http://www.xxx.xx/xxx/xx/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
default 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] "GET /xxx/xx/06/22/xxxxx/ HTTP/1.1" 200 11098 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] "POST /xxxxxx.php HTTP/1.1" 200 370 "-" "-"
default 1.2.3.4 - - [05/Jan/2017:10:56:23 -0800] "GET /blog/xxx/01/22/xxxxx/ HTTP/1.1" 200 14404 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:24 -0800] "GET /blog/xxxxx/ HTTP/1.1" 200 21901 "https://www.codingmerc.com/blog/" "Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:25 -0800] "POST /xxxxx.php HTTP/1.1" 200 370 "-" "-"
www.xxx.xx 1.2.3.4 - - [05/Jan/2017:10:56:29 -0800] "GET /blog/xxxxx/ HTTP/1.1" 200 13831 "https://www.xxx.xx/blog/" "Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )"

1 Answer


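It seems to work if you simply make your path group non-greedy: replace + with +?, so that (?P<path>\S+) becomes (?P<path>\S+?). With the greedy \S+, the path group consumes the ? and everything after it up to the next space, so the optional querystring group is left with nothing to match.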
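As a quick check outside pythex, here is a minimal sketch that applies the lazy path group to two of the log lines from the question. The names LOG_PATTERN and parse_line are made up for illustration. One assumption differs from the original regex: the trailing \s+ is dropped so that a line without a trailing newline still matches.

```python
import re

# The regex from the question with one change: the path group is lazy (\S+?),
# so the optional query-string group gets a chance to match.
# Assumption: the trailing \s+ of the original is omitted so that lines
# without a trailing newline still match.
LOG_PATTERN = re.compile(
    r'(?P<host>.*)\s+(?P<ip>\S+)\s+-\s+-\s+'
    r'\[(?P<date>\S+)\s+(?P<timezone>.*)\]\s+'
    r'"\S+\s+(?P<path>\S+?)(?:\?(?P<querystring>\S+))?\s+\S+"\s+'
    r'(?P<status>\S+)\s+(?P<length>\S+)\s+'
    r'"(?P<referrer>.*)"\s+"(?P<user_agent>.*)"'
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# First log line from the question (has a query string).
with_query = parse_line(
    'default 1.2.3.4 - - [05/Jan/2017:10:56:18 -0800] '
    '"GET /?simplode_ajax=true&simplode_query%5Border%5D=DESC HTTP/1.1" 200 - '
    '"http://www.xxx.xx/xxx/xx/" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Second log line from the question (no query string).
without_query = parse_line(
    'default 1.2.3.4 - - [05/Jan/2017:10:56:20 -0800] '
    '"GET /xxx/xx/06/22/xxxxx/ HTTP/1.1" 200 11098 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

print(with_query["path"])            # /
print(with_query["querystring"])     # simplode_ajax=true&simplode_query%5Border%5D=DESC
print(without_query["path"])         # /xxx/xx/06/22/xxxxx/
print(without_query["querystring"])  # None
```

An alternative that should avoid the backtracking altogether is to keep the quantifier greedy but exclude ? from the path, i.e. (?P<path>[^\s?]+); whether that is preferable depends on whether a literal ? can ever appear in a path you care about.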

1 Comment

Sheesh, it was that simple, huh? :)
