A little hesitant about posting this - as far as I'm concerned it's a genuine question, but I guess I'll understand if it's criticised or closed as an invitation for discussion...
Anyway, I need to use Python to search some quite large web logs for specific events. A regex would work, but I'm not tied to any particular approach - I just want the lines that contain two strings, which could appear anywhere in a GET request.
As a typical file is over 400 MB and contains around a million lines, performance - both time to complete and load on the server (an Ubuntu/nginx VM, reasonably well specced and rarely overworked) - is likely to be an issue.
I'm a fairly recent convert to Python (not quite a newbie, but with plenty still to learn) and I'd like a bit of guidance on the best way to achieve this.
Do I open the file and iterate through it? Grep to a new file and then open that? Some combination of the two? Something else?
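For what it's worth, here is a minimal sketch of the "open and iterate" option. Python iterates over a file lazily, one line at a time, so memory stays flat regardless of file size; for plain literal strings, two `in` checks are typically faster than a regex and short-circuit on the first miss. The path and search terms below are illustrative placeholders, not taken from your logs.

```python
def find_lines(path, term_a, term_b):
    """Yield lines from the file at `path` that contain both terms.

    Iterating over the file object reads one line at a time, so a
    400 MB log is never held in memory all at once.
    """
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            # Substring checks short-circuit: if term_a is missing,
            # term_b is never tested for that line.
            if term_a in line and term_b in line:
                yield line

# Illustrative usage (hypothetical path and terms):
# for hit in find_lines("/var/log/nginx/access.log",
#                       "GET /promo", "utm_source=mail"):
#     print(hit, end="")
```

A generator like this also composes nicely with a grep prefilter: you can pipe `grep term_a access.log` to a temporary file first and then run the same function over the (much smaller) result.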
Server Log Analysis with Pandas - that should be a good starting point for you.
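To make the pandas suggestion concrete, here is one hedged sketch of how it might look: reading the log in chunks with `read_csv(chunksize=...)` so the whole file never sits in memory, loading each raw line into a single column, and filtering with two literal `str.contains` checks. The separator byte, column name, and search terms are all assumptions for illustration, not anything from the post above.

```python
import csv

import pandas as pd


def filter_log_pandas(path, term_a, term_b, chunksize=100_000):
    """Return a DataFrame of log lines that contain both terms.

    Each raw line is loaded into a single 'line' column; using a
    separator byte that never appears in the log (here '\x01', an
    assumption about the data) stops pandas splitting the line, and
    QUOTE_NONE stops stray quotes merging adjacent lines.
    """
    hits = []
    reader = pd.read_csv(path, sep="\x01", header=None, names=["line"],
                         quoting=csv.QUOTE_NONE, chunksize=chunksize)
    for chunk in reader:
        # regex=False makes both checks plain substring matches.
        mask = (chunk["line"].str.contains(term_a, regex=False)
                & chunk["line"].str.contains(term_b, regex=False))
        hits.append(chunk[mask])
    if hits:
        return pd.concat(hits, ignore_index=True)
    return pd.DataFrame(columns=["line"])
```

For a pure two-substring filter this buys little over plain iteration, but once the matching rows are in a DataFrame you can group, count, and resample them, which is where the pandas approach starts to pay off.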