
I've got a Python script that searches for files in a directory, and it does so continuously for as long as the computer is running. Here is the code:

import fnmatch
import os
import shutil
import datetime
import time
import gc
# This is a python script that removes the "conflicted" copies of
# files that dropbox creates when one computer has the same copy of
# a file as another computer. 
# Written by Alexander Alvonellos
# 05/10/2012

class cleanUpConflicts:
    rootPath = 'D:\Dropbox'
    destDir = 'D:\Conflicted'
    def __init__(self):
        self.removeConflicted()
        return

    def cerr(message):
        f = open('./LOG.txt', 'a')
        date = str(datetime.datetime.now())
        s = ''
        s += date[0:19] #strip floating point
        s += ' : '
        s += str(message)
        s += '\n'
        f.write(s)
        f.close()
        del f
        del s
        del date
        return

    def removeConflicted(self):
        matches = []
        for root, dirnames, filenames in os.walk(self.rootPath):
            for filename in fnmatch.filter(filenames, '*conflicted*.*'):
                matches.append(os.path.join(root, filename))
                cerr(os.path.join(root, filename))
                shutil.move(os.path.join(root, filename), os.path.join(destDir, filename))
        del matches
        return

def main():
    while True:
        conf = cleanUpConflicts()
        gc.collect()
        del conf
        reload(os)
        reload(fnmatch)
        reload(shutil)
        time.sleep(10)
    return

main()

Anyway, there's a memory leak that adds nearly 1 MB every ten seconds or so. I don't understand why the memory isn't being deallocated. By the end of it, this script will eat gigabytes of memory without even trying. This is frustrating. Anyone have any tips? I've tried everything, I think.
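For reference, here's the kind of snapshot diff I'd use to localize the growth if this were Python 3: the stdlib tracemalloc module (it doesn't exist on Python 2, which is what this script runs on) attributes allocations to the file and line that made them. The `hoard` list below just simulates the suspect growth; in the real script you'd take the snapshots around one pass of removeConflicted():

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# simulate the suspect work; in the real script this would be
# one removeConflicted() pass
hoard = [bytearray(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, 'lineno')
for stat in stats[:3]:
    print(stat)  # biggest allocators first, reported as file:line
```

Whatever line keeps showing up at the top of that diff across iterations is where the memory is actually going.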

Here's the updated version after making some of the changes that were suggested here:

import fnmatch
import os
import shutil
import datetime
import time
import gc
import re
# This is a python script that removes the "conflicted" copies of 
# files that dropbox creates when one computer has the same copy of
# a file as another computer. 
# Written by Alexander Alvonellos
# 05/10/2012

rootPath = 'D:\Dropbox'
destDir = 'D:\Conflicted'

def cerr(message):
    f = open('./LOG.txt', 'a')
    date = str(datetime.datetime.now())
    s = ''
    s += date[0:19] #strip floating point
    s += ' : '
    s += str(message)
    s += '\n'
    f.write(s)
    f.close()
    return


def removeConflicted():
    for root, dirnames, filenames in os.walk(rootPath):
        for filename in fnmatch.filter(filenames, '*conflicted*.*'):
            cerr(os.path.join(root, filename))
            shutil.move(os.path.join(root, filename), os.path.join(destDir, filename))
    return


def main():
    #while True:
    for i in xrange(0,2):
        #time.sleep(1)
        removeConflicted()
        re.purge()
        gc.collect()
    return
main()

I've done some research on this problem, and it seems like there might be a bug in fnmatch, whose regular expression engine doesn't purge its cache after being used. That's why I call re.purge(). I've tinkered with this for a couple of hours now.
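As far as I can tell, fnmatch just translates the glob to a regex with fnmatch.translate() and caches the compiled form (and that cache is bounded in the versions I've looked at). Here's a sketch of what it does under the hood, using a typical Dropbox conflicted-copy filename:

```python
import fnmatch
import re

pattern = '*conflicted*.*'
regex_src = fnmatch.translate(pattern)  # the regex fnmatch builds internally
print(regex_src)

name = 'report (conflicted copy 2012-05-10).txt'
print(fnmatch.fnmatch(name, pattern))          # True
print(re.match(regex_src, name) is not None)   # same result via the regex
```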

I've also found that doing:

print gc.collect()

returns 0 on every iteration.
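My understanding is that 0 means the collector found no unreachable cycles at all, which suggests the growth isn't cyclic garbage. For comparison, a deliberate reference cycle (a toy Node class, just for illustration) does make gc.collect() return nonzero:

```python
import gc

class Node(object):
    def __init__(self):
        self.ref = None

gc.collect()  # clear any unrelated pending garbage first

# build a reference cycle, then drop every external reference to it
a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b

found = gc.collect()
print(found)  # nonzero: the collector found the unreachable cycle
```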

Whoever downvoted me is clearly mistaken. I really need some help here. Here's the link that I was talking about: Why am I leaking memory with this python loop?

  • I would try not reloading those modules every ten seconds - I can't see any reason for it. You also shouldn't need all the del statements (although I can see why you might try them if you have a leak) and you certainly don't need return at the end of every function. Commented May 10, 2012 at 21:43
  • I'm reloading because I am trying to track down the source of this leak. Do you have any other suggestions? Once I get a pointer as to where this is coming from, I'll make some changes to my code. Commented May 10, 2012 at 21:44
  • don't know about the leak, but def cerr(message) should really read def cerr(self, message), and the call further down should be self.cerr(...). how can you have a leak if your code isn't even functional? Commented May 10, 2012 at 21:49
  • Is there a particular reason why you're re-instantiating a new cleanUpConflicts object every time you come into the loop instead of just instantiating it once and re-using it? That seems a bit suspicious to me offhand. Commented May 10, 2012 at 21:50
  • It's kind of a janky program flow to begin with. You basically are using __init__() to make your class pseudo-callable, then not really using the class as a class at all, as there is no stateful or instance information maintained since you blow it away and recreate every loop... Commented May 10, 2012 at 21:51

2 Answers


Your code could be shortened to this:

import fnmatch
import os
import shutil
import datetime
import time

ROOT_PATH = r'D:/Dropbox'
DEST_DIR = r'D:/Conflicted'

def cerr(message, log):
    date = str(datetime.datetime.now())
    msg = "%s : %s\n" % (date[0:19], message)
    log.write(msg)

def removeConflicted(log):
    for root, dirnames, filenames in os.walk(ROOT_PATH):
        for filename in fnmatch.filter(filenames, '*conflicted*.*'):
            # 1: comment out this line and check for leak
            cerr(os.path.join(root, filename), log)
            # 2: then comment out this line instead and check
            shutil.move(
                os.path.join(root, filename), 
                os.path.join(DEST_DIR, filename))


def main():
    with open('./LOG.txt', 'a') as log:
        while True:
            print "loop"
            removeConflicted(log)
            time.sleep(10)

if __name__ == "__main__":
    main()

See if your memory leak occurs when there are NO files to process. That is, point it at empty directories and determine whether the leak is occurring when it's doing the move.

You don't need the re.purge() or to mess with the gc module.
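To expand on that: re keeps a bounded internal cache of compiled patterns (capped at re._MAXCACHE in CPython; an implementation detail, not a public API), and re.purge() just empties it, so the regex cache can't grow without bound in the first place:

```python
import re

# compile a handful of throwaway patterns so the cache is populated
for i in range(5):
    re.compile('pattern%d' % i)

print(re._MAXCACHE)    # the cap on the cache (CPython implementation detail)
re.purge()             # empties the cache; it refills, still bounded, on use
print(len(re._cache))  # 0 immediately after purge
```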


4 Comments

Let me try that and get back to you. Thank you so much for helping me.
@AlexanderAlvonellos: I made comments at two lines you can try commenting out, while pointing at your large directory.
The memory leak still happens regardless of which combination of those lines I comment out, and regardless of whether the directory I point the script at is empty. What do I do now? I appreciate your help, but even running your script verbatim doesn't solve the memory leak problem.
@AlexanderAlvonellos: Keep reducing the code until you discover the leak.

At a guess, something keeps references to the instances created with each main iteration.

Suggestions:

  1. Drop the class and make one or two functions
  2. Drop matches; it isn't used
  3. Look at inotify (Linux) or similar for Windows; it can watch the directory and act only when needed, with no continuous scanning

2 Comments

I redid the code, got rid of the class, and made the changes that were suggested here, all except for the event watching functions. There's still a memory leak. Any more suggestions?
@AlexanderAlvonellos: Please replace your code example with the more comprehensible short one you said you just altered.
