
I have a program that processes several files, and for each file a report is generated. The report-generating part is a separate function that takes a filename, builds the report, and then returns. During report generation, intermediate parts are cached in memory, because they may be used by several parts of the report and I want to avoid recalculating them.

When I run this program on all files in a directory, it runs for a while and then crashes with a MemoryError. If I then rerun it on the same directory, it skips every file it has already successfully created a report for and continues. It processes a couple more files before crashing again.

Now, why aren't all resources cleared, or at least marked for garbage collection, after the call that generates the report returns? No instances are left behind, I am not using any global objects, and all open files are closed after each file is processed.

Is there a way for me to verify that there are no extra references to an object? Is there a way to force garbage collection in Python?
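Both questions can be explored with the standard `gc`, `sys`, and `weakref` modules. The sketch below is illustrative only: `Intermediate`, `leaked`, and `find_holders` are hypothetical names, not part of the program described here. A `weakref.ref` tells you whether an object is still alive without keeping it alive, `gc.get_referrers` lists the containers that still point at it, and `gc.collect()` forces a full collection.

```python
import gc
import sys
import weakref


class Intermediate:
    """Hypothetical stand-in for a cached intermediate result."""


def find_holders(obj, candidates):
    """Return the candidates that directly refer to obj, using gc.get_referrers."""
    referrers = gc.get_referrers(obj)
    return [c for c in candidates if any(c is r for r in referrers)]


leaked = []                  # hypothetical container we forgot to clear
value = Intermediate()
leaked.append(value)         # an "extra" reference that keeps value alive
probe = weakref.ref(value)   # a weak reference does not keep value alive

# Usually one higher than expected: passing value as an argument adds a reference.
print("refcount:", sys.getrefcount(value))
print("held by leaked:", bool(find_holders(value, [leaked])))  # True

del value
gc.collect()                 # force a full collection
print("alive:", probe() is not None)  # True: leaked still refers to it

leaked.clear()
gc.collect()
print("alive:", probe() is not None)  # False
```

`gc.get_referrers` only reports objects tracked by the collector (lists, dicts, instances and so on), so treat it as a debugging aid rather than a complete census.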

Here is a bit more detail about the implementation and the cache. Each report has several elements, each element can rely on different computations, and each computation can depend on other computations. If a computation has already been done, I don't want to do it again (most of them are expensive).

Here is an abbreviated version of the cache:

class MathCache:
    def __init__(self): self.cache = {}
    def get(self, data_provider):
        if data_provider.id not in self.cache:
            self.cache[data_provider.id] = data_provider.get_value(self)
        return self.cache[data_provider.id]
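A runnable sketch of the cache in use may help show how dependent computations are memoized. Here `get` takes `self` as its first parameter, and the providers `Constant` and `Sum` are hypothetical examples with hand-written ids. `Sum` asks the cache for its inputs, so a shared input is computed only once.

```python
class MathCache:
    """Memoizes data_provider results by their id, for one report."""

    def __init__(self):
        self.cache = {}

    def get(self, data_provider):
        # Compute each provider at most once per cache instance.
        if data_provider.id not in self.cache:
            self.cache[data_provider.id] = data_provider.get_value(self)
        return self.cache[data_provider.id]


class Constant:
    """Hypothetical provider: returns a fixed number."""

    def __init__(self, n):
        self.id = ("Constant", n)
        self.n = n
        self.calls = 0  # counts how often the value is actually computed

    def get_value(self, cache):
        self.calls += 1
        return self.n


class Sum:
    """Hypothetical provider that depends on other providers through the cache."""

    def __init__(self, *parts):
        self.id = ("Sum",) + tuple(p.id for p in parts)
        self.parts = parts
        self.calls = 0

    def get_value(self, cache):
        self.calls += 1
        return sum(cache.get(p) for p in self.parts)


two = Constant(2)
total = Sum(two, two, Constant(3))

cache = MathCache()
print(cache.get(total))        # 7
print(cache.get(total))        # 7, served from cache.cache
print(two.calls, total.calls)  # 1 1
```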

An instance of it is created and then passed to each element in the report. The only reference to this instance is a local variable in the report creation method.
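One way to confirm that a cache held only in a local variable really is released when the report function returns is `weakref.finalize`, which runs a callback once the object is reclaimed. In this sketch `generate_report`, `released`, and the `make_cycle` flag are hypothetical. The flag stores the cache inside itself to show that an object caught in a reference cycle is not freed by reference counting alone and has to wait for the cyclic collector.

```python
import gc
import weakref


class MathCache:
    """Minimal stand-in for the question's cache: a dict of id -> value."""

    def __init__(self):
        self.cache = {}


released = []  # filenames whose cache has actually been freed


def generate_report(filename, make_cycle=False):
    """Hypothetical report function; the cache lives only in a local variable."""
    cache = MathCache()
    # finalize() calls released.append(filename) once the cache is reclaimed.
    weakref.finalize(cache, released.append, filename)
    cache.cache["intermediate"] = bytearray(1_000_000)  # stand-in for a big result
    if make_cycle:
        cache.cache["self"] = cache  # a cycle: refcounting alone cannot free this
    return "report for " + filename


generate_report("plain.dat")
print("plain.dat" in released)   # True on CPython: freed as soon as the call returns

generate_report("cyclic.dat", make_cycle=True)
gc.collect()                     # the cycle is only reclaimed by the cyclic collector
print("cyclic.dat" in released)  # True
```

If the callback for a real report never fires, something outside the function is still holding the cache.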

All data_providers inherit from a common class that creates a unique id for each instance based on a hash of the constructor arguments and the class name. I pass the MathCache along because a data_provider may itself rely on other calculations.
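A base class like the one described might look roughly like this. It is a sketch, not the original code: `DataProvider` and `Mean` are hypothetical, and all constructor arguments are assumed to be hashable. Unlike the description, it keys on the tuple of class name and arguments itself rather than on its `hash()`. The dict still hashes the tuple, but it also compares for equality, so two different providers cannot share a cache slot through a hash collision.

```python
class DataProvider:
    """Hypothetical base class: id derived from class name + constructor args.

    Assumes every positional and keyword argument is hashable.
    """

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        # Sorting kwargs makes the id independent of keyword order.
        self.id = (type(self).__name__, args, tuple(sorted(kwargs.items())))

    def get_value(self, cache):
        raise NotImplementedError


class Mean(DataProvider):
    """Hypothetical provider: mean of a sequence of numbers."""

    def __init__(self, values):
        values = tuple(values)  # tuples are hashable; lists are not
        super().__init__(values)
        self.values = values

    def get_value(self, cache):
        return sum(self.values) / len(self.values)


a = Mean([1, 2, 3])
b = Mean((1, 2, 3))
print(a.id == b.id)             # True: same class and equal arguments
print(Mean([1, 2]).id == a.id)  # False
```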

  • A bit more code would be helpful. The "intermediate parts are cached in memory" is vague -- and likely the cause of your problem. Python has excellent garbage collection. Somehow you're preventing this. Commented May 26, 2009 at 11:37
  • Are you sure lines 4 and 5 of your example are right? That should throw a KeyError. Commented May 26, 2009 at 12:38
  • I guess you mean self.cache[data_provider.id] = get_value(self) and not ".get_value(self)", right? Commented May 26, 2009 at 12:47
  • Sorry, yeah. Was a bit quick on the typing, as it's not cut and paste code from my project. Commented May 26, 2009 at 13:04
  • If this is the only code, then the deletion of MathCache should delete all the references. The bug has to be somewhere else. Commented May 26, 2009 at 13:10

1 Answer


You should check out the gc module: http://docs.python.org/library/gc.html#module-gc.
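For example, `gc.get_objects()` can count how many instances of a class are still alive after a batch of files has been processed. If that number keeps growing from file to file, something is holding on to them. In this sketch `process` and its `keep` parameter are hypothetical; `keep` simulates an accidental long-lived reference.

```python
import gc
from collections import Counter


class MathCache:
    """Minimal stand-in for the question's cache."""

    def __init__(self):
        self.cache = {}


def live_instances(cls):
    """Count objects of exactly type cls that the collector still tracks."""
    gc.collect()  # force a full collection first so only reachable objects remain
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def process(filename, keep=None):
    """Hypothetical per-file step; `keep` simulates an accidental reference."""
    cache = MathCache()
    if keep is not None:
        keep.append(cache)


for name in ["a", "b", "c"]:
    process(name)
print(live_instances(MathCache))  # 0

leak = []
for name in ["a", "b", "c"]:
    process(name, keep=leak)
print(live_instances(MathCache))  # 3

# A rough overview of which tracked types are most numerous right now.
print(Counter(type(o).__name__ for o in gc.get_objects()).most_common(5))
```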
