
I want to extract the executed queries from a log file. An example looks like this:

2019-01-10 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update 
    From public.actor
2019-03-06 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title
    FROM public.film
    WHERE film_id = 1

I want to get the queries using a loop. Desired output:

query1 : SELECT last_update From public.actor
query2 : SELECT film_id, title FROM public.film WHERE film_id = 1

This is what I have tried:

import re
def parseFile(filepath):
    line=[]
    with open(filepath,'r') as log:
        regex = re.compile(r'(\d{4}-\d{2}-\d{2})(.*)',re.MULTILINE|re.DOTALL)
        for line in log:
            date = regex.findall(line)
            if date == []:
                print()
            else:
                print(date)

filepath = 'text.txt'
parseFile(filepath)

Output:
 [('2019-01-10', ' 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update \n')]
 [('2019-03-06', ' 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title\n')]

The output doesn't contain the whole queries: only the first line of each one is captured. What should I do?

2 Answers


You can adapt your code like this. You need to read the whole file before parsing it: if you read it line by line, as your code does, the regex only ever sees one line at a time and can never match an SQL query that is split over several lines:

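import re

def parseFile(filepath):
    # A date, then everything (lazily) up to the next date or the end of the text
    regex = re.compile(r'(\d{4}-\d{2}-\d{2})(.*?)(?=\d{4}-\d{2}-\d{2}|$)', re.MULTILINE | re.DOTALL)
    with open(filepath, 'r') as log:
        # Read the whole file at once and replace newlines and runs of
        # whitespace with a single space, so each query ends up on one line
        lines = re.sub(r'\n|\s{2,}', ' ', log.read())
    date = regex.findall(lines)
    if not date:
        print()
    else:
        print(date)

filepath = 'query.log'
parseFile(filepath)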

Output:

[('2019-01-10', ' 10:33:21 +07 dvdrentalLOG: statement: SELECT last_update From public.actor '), ('2019-03-06', ' 14:07:06 +07 dvdrentalLOG: statement: SELECT film_id, title  FROM public.film  WHERE film_id = 1 ')]

The regex used, which relies on a positive lookahead to limit how many characters .*? matches, is explained in detail here: https://regex101.com/r/nE0omm/1/

(\d{4}-\d{2}-\d{2})(.*?)(?=\d{4}-\d{2}-\d{2}|$)
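To get the exact format asked for in the question (query1 : ..., query2 : ...), the matches can be looped over with enumerate. The following is a minimal sketch, not part of the original answer: it reuses the regex above on a string instead of a file, keeps only the text after statement: (assuming every entry contains it), and collapses leftover double spaces such as the one in 'title  FROM'. The names extract_queries and sample are hypothetical.

```python
import re

# Same pattern as in the answer: a date, then everything lazily up to
# the next date or the end of the (newline-free) string.
ENTRY_RE = re.compile(r'(\d{4}-\d{2}-\d{2})(.*?)(?=\d{4}-\d{2}-\d{2}|$)',
                      re.MULTILINE | re.DOTALL)

def extract_queries(log_text):
    """Return the SQL text of each 'statement:' entry, whitespace-normalised."""
    # Flatten the log the same way the answer does, so multi-line
    # queries become single-line strings.
    flat = re.sub(r'\n|\s{2,}', ' ', log_text)
    queries = []
    for _date, rest in ENTRY_RE.findall(flat):
        # Keep only what follows 'statement:' (assumes every entry has it).
        _, _, sql = rest.partition('statement:')
        # Collapse any remaining runs of whitespace, e.g. 'title  FROM'.
        queries.append(' '.join(sql.split()))
    return queries

sample = """2019-01-10 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update
    From public.actor
2019-03-06 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title
    FROM public.film
    WHERE film_id = 1
"""

for i, query in enumerate(extract_queries(sample), start=1):
    print(f'query{i} : {query}')
```

With a real file, pass the result of log.read() to extract_queries instead of the sample string.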

2 Comments

Thank you for your help
@jesicagu You are welcome! Now that you have more than 15 reputation points, don't hesitate to upvote our answers as well. Thank you!

You're only processing a single line at a time (via the for line in log: loop), so your regex only applies to a single line at a time. It couldn't match across lines because you're not giving it multiple lines at a time to match across.
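If you want to keep the for line in log: loop, another option (not from this answer) is to accumulate lines yourself: start a new query whenever a line begins with a timestamp, and append every other line to the current one. A minimal sketch, assuming each entry starts with a YYYY-MM-DD HH:MM:SS timestamp at the beginning of the line and the SQL follows statement:. The names collect_queries, ENTRY_START and sample_lines are hypothetical.

```python
import re

# A line that starts a new log entry, e.g. '2019-01-10 10:33:21 ...'.
# Assumption: continuation lines never start with such a timestamp.
ENTRY_START = re.compile(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')

def _normalise(parts):
    # Join the collected pieces and collapse all whitespace runs to one space.
    return ' '.join(' '.join(parts).split())

def collect_queries(lines):
    """Group an iterable of log lines into one query string per entry."""
    queries = []
    current = None  # pieces of the entry being built; None before the first entry
    for line in lines:
        if ENTRY_START.match(line):
            if current is not None:
                queries.append(_normalise(current))
            # Start a new entry with whatever follows 'statement:'.
            current = [line.partition('statement:')[2]]
        elif current is not None:
            # Continuation line: belongs to the entry being built.
            current.append(line)
    if current is not None:
        queries.append(_normalise(current))
    return queries

sample_lines = [
    '2019-01-10 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update \n',
    '    From public.actor\n',
    '2019-03-06 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title\n',
    '    FROM public.film\n',
    '    WHERE film_id = 1\n',
]

for i, query in enumerate(collect_queries(sample_lines), start=1):
    print(f'query{i} : {query}')
```

Because iterating over a file object yields one line at a time, the same function can be called directly on the file inside with open('text.txt') as log:, as collect_queries(log), without reading the whole file into memory.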

You could instead read the entire file via log.read() and then call .findall on that.
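A minimal sketch of that approach, assuming the same timestamp prefix as in the question's log. Here the timestamp is anchored to the start of a line (re.MULTILINE with ^), so a date literal inside a query, such as WHERE rental_date > '2019-01-01', does not cut the query short, which the unanchored lookahead in the other answer would do. STATEMENT_RE, queries_from_text and parse_file are hypothetical names.

```python
import re

# One match per entry: a timestamp at the start of a line, then the text after
# 'statement:' up to the next line-start timestamp or the end of the input.
# Assumption: every entry contains 'statement:'; otherwise the lazy '.*?'
# could run on into the following entry.
STATEMENT_RE = re.compile(
    r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*?statement:(.*?)'
    r'(?=^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z)',
    re.MULTILINE | re.DOTALL,
)

def queries_from_text(text):
    # findall returns the single capture group of each match; split/join
    # collapses newlines and indentation into single spaces.
    return [' '.join(sql.split()) for sql in STATEMENT_RE.findall(text)]

def parse_file(filepath):
    with open(filepath) as log:
        return queries_from_text(log.read())

sample = """2019-01-10 10:33:21 +07 dvdrentalLOG: statement:  SELECT last_update
    From public.actor
2019-03-06 14:07:06 +07 dvdrentalLOG:  statement: SELECT film_id, title
    FROM public.film
    WHERE film_id = 1
"""

for i, query in enumerate(queries_from_text(sample), start=1):
    print(f'query{i} : {query}')
```

For the file from the question, parse_file('text.txt') returns the same list of query strings.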
