I am trying to split a huge SQL file into smaller SQL files using Python, but the pattern in my code never matches, even though from what I've found on Google it should.

Here is the code:

    import sys, re
    p = [0]
    f = open('/root/testsql/data.sql', 'r')
    tables =["tabel1", "table2"]
    contor = 0;
    con = 0;

    for line in f:
        for table in tables:
            stri = "root/testsql/" + str(con)
            con = con + 1
            stri2 = ".*" + table + ".*"
            if re.match(stri2,line):
                    print table
                    f2 = open(stri,"w")
                    f2.write(line)
                    f2.close()

If anybody has an idea why re.match doesn't work, it would be much appreciated.

The SQL file is very long (73595 lines) and contains lines like:

    insert into table ...
    insert into table

  • What is the output when you write f2? Edit: or is that not happening at all? Commented Jun 22, 2012 at 15:43
  • I have no output, that is the problem. I put the print table inside that if just to verify that I enter it, but nothing is printed, so the body of the if is never reached. Commented Jun 22, 2012 at 15:51
  • Copying the text you show for your lines and tables and then doing an re.match did lead to matches. Are you sure the text you show is what you're getting from the file? Commented Jun 22, 2012 at 16:07
  • I hope it's not the real account data that you have posted. Commented Jun 22, 2012 at 16:09
  • Serves me right for working overtime, I can't even think straight. Commented Jun 22, 2012 at 16:17

4 Answers

3

You're only looking for verbatim strings. In that case, regex is overkill. Instead, use in:

for line in f:
    for table in tables:
        # snip...
        if table in line:
            # ...
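
As a rough illustration of the `in` test, here is a minimal, self-contained sketch. It assumes Python 3, and the sample lines, the table names, and the helper `lines_by_table` are all made up for the example; they are not taken from the asker's file.

```python
# Hypothetical sample data, invented for illustration only.
tables = ["table1", "table2"]
lines = [
    "insert into table1 values (1, 'a');\n",
    "insert into table2 values (2, 'b');\n",
    "insert into other values (3, 'c');\n",
]


def lines_by_table(lines, tables):
    """Group lines under the first table name found in each line."""
    grouped = {table: [] for table in tables}
    for line in lines:
        for table in tables:
            if table in line:  # plain substring test, no regex involved
                grouped[table].append(line)
                break
    return grouped


grouped = lines_by_table(lines, tables)
for table, found in grouped.items():
    print(table, len(found))
```

Note that a substring test is loose: a name such as `table1` would also be found inside `table10`, so very similar table names may need a stricter check.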

6 Comments

I think the matching was not the problem. I used your suggestion, but still no result. No files were created and nothing was shown to the console output. The thing is there are 73595 lines and some of them are very long and maybe that is why it doesn't work.
I tested from the python console and it worked with your suggestion, but still from the script it doesn't work.
@primero: You're right, my approach wouldn't have changed the results. But the length of the lines is certainly not the problem. I think you should insert some more print statements and check whether the program flow really is as expected. And maybe check your SQL file in an editor and do some searches there to make sure.
@primero: Hmm, grasping at straws here. What character encoding does your SQL file use? Not UTF-16 by any chance? What do you get when you insert a print line after for line in f:?
Well I get the INSERT statements one by one separated by a new line
2

I think

stri2 = ".*" + table + ".*"

should be:

stri2 = ".*?" + table + ".*"

The .* is greedy and will match the whole line.
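
For comparison, the two patterns can be tried side by side. This is only a sketch, assuming Python 3 and a made-up sample line. With this input both variants appear to produce the same match, because the greedy `.*` gives characters back until the table name fits.

```python
import re

# Hypothetical sample line, including the trailing newline a file read would give.
line = "insert into table1 values (1, 'a');\n"
table = "table1"

greedy = re.match(".*" + table + ".*", line)
lazy = re.match(".*?" + table + ".*", line)

# Both calls return a match object for this input.
print(greedy is not None, lazy is not None)
# The matched text is the same in both cases.
print(greedy.group(0) == lazy.group(0))
```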

4 Comments

It should be faster this way, but the greedy .* should backtrack anyway.
Still not entering the if statement. I tried my match statement in the Python console and it seemed to work, but I can't seem to get into the if from my script.
And as you can see from the code, I create some files inside the if statement. None of them get created. I'm new to Python and I don't know how to debug this.
@primero - the info you provided in comments belongs to the body of the question.
1

You should use re.search instead of re.match, rather than wrapping the regex in .*.

The reason why you see no matches is that the inputs end with a newline, and the dot metacharacter does not match newlines.
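
The difference between the two functions can be checked with a short sketch, assuming Python 3 and a made-up sample line. The last part also illustrates the point raised in the comments: `.` does stop at the newline, but `re.match` does not have to consume the whole string, so the wrapped pattern still matches here.

```python
import re

# Hypothetical sample line, ending in a newline as lines read from a file do.
line = "insert into table1 values (1, 'a');\n"

# re.match only tries the pattern at the start of the string.
anchored = re.match("table1", line)
# re.search scans the string for the first position where the pattern fits.
scanned = re.search("table1", line)
print(anchored is None, scanned is not None)

# The wrapped pattern matches everything up to, but not including, the newline.
wrapped = re.match(".*table1.*", line)
print(repr(wrapped.group(0)))
```

Both functions return either `None` or a match object, so either one can be used directly as the condition of an `if`.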

5 Comments

My first draft was with re.search, and I used re.compile to create my pattern, but I don't know how to test the result in the if statement.
f2 is not a file, but a lot of files, 1 in each cycle, so w is ok in this case
But now that I think of it, it might behave the same as in my example.
Ad 1: True, but the regex should still have matched. Ad 2: He's not appending anything. He's writing lots of one-line files. Or trying to.
@primero - re.search is used in exactly the same way as re.match except that it will fix your code. It returns the same kind of a match object.
0

I would use raw strings instead of plain strings in any regular expression, so that you don't end up fooling yourself when a character gets interpreted as an escape sequence.

r'.*' + table + r'.*'
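
For `.*` itself the raw prefix changes nothing, since that pattern contains no backslash. The difference shows up with escapes such as `\b`. A small sketch, assuming Python 3, with a made-up table name and sample line:

```python
import re

table = "table1"  # hypothetical table name
line = "insert into table1 values (1, 'a');\n"  # hypothetical sample line

# In a plain string, "\b" is the backspace character, not a regex word boundary.
plain = "\b" + table + "\b"
# In a raw string, r"\b" reaches the regex engine as a word-boundary assertion.
# re.escape protects any regex metacharacters that a table name might contain.
raw = r"\b" + re.escape(table) + r"\b"

print(re.search(plain, line) is None)
print(re.search(raw, line) is not None)
```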
