
A little hesitant about posting this - as far as I'm concerned it's a genuine question, but I guess I'll understand if it's criticised or closed as being an invite for discussion...

Anyway, I need to use Python to search some quite large web logs for specific events. RegEx would be good but I'm not tied to any particular approach - I just want lines that contain two strings that could appear anywhere in a GET request.

As a typical file is over 400 MB and contains around a million lines, performance, both in terms of time to complete and load on the server (an Ubuntu/nginx VM - reasonably well spec'd and rarely overworked), is likely to be an issue.

I'm a fairly recent convert to Python (not quite a newbie, but still with plenty to learn) and I'd like a bit of guidance on the best way to achieve this.

Do I open and iterate through? Grep to a new file and then open? Some combination of the two? Something else?


1 Answer


As long as you don't read the whole file at once but iterate through it continuously, you should be fine. I don't think it really matters whether you read the whole file with Python or with grep; either way you still have to scan the whole file :). And if you take advantage of generators you can do this in a really programmer-friendly way:

import re

# Generator: yield matching rows from the log file one at a time,
# so the whole file is never held in memory.
def parse_log(filename):
    reg = re.compile('...')  # fill in your pattern here

    with open(filename, 'r') as f:
        for row in f:
            match = reg.search(row)  # search() matches anywhere in the line
            if match:
                yield match.group(1)

for i in parse_log('web.log'):
    pass  # Do whatever you need with the matched row
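
Since you just want lines containing two fixed strings that can appear anywhere in the GET request, you may not need a regex at all. Here's a minimal sketch of the same generator idea using plain substring checks; the function name and the 'foo'/'bar' values are placeholders, not anything from your logs:

import re  # only needed if you later switch back to a compiled pattern

# A minimal sketch, assuming the two strings are plain literals
# ('foo' and 'bar' are placeholders - substitute your own values).
def parse_log_two_strings(filename, first='foo', second='bar'):
    with open(filename, 'r') as f:
        for row in f:
            # Plain substring checks are usually faster than a regex
            # for fixed strings, and the order of the strings in the
            # line doesn't matter.
            if first in row and second in row:
                yield row

# Usage: iterate lazily, so only one line is held in memory at a time.
for line in parse_log_two_strings('web.log'):
    pass  # Do whatever you need with the matched line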