
I have the following class:

    class C1:
        STORE = []
        TYPE = []
        ITEMS = []
        PRICE = []

        def __init__(self, STORE, TYPE, ITEMS, PRICE):
            self.STORE = STORE
            self.TYPE = TYPE
            self.ITEMS = ITEMS
            self.PRICE = PRICE

The purpose of the class is to store all the items in different stores and their prices. Items are alphanumerically ordered, and if the item doesn't exist in a store it shows the price as 0. I am retrieving the data from a table in a database that looks like this:

               S1   S2   S3  ....  S29000
    item1      15    2   30  ....     100
    item2       0    1    0  ....       5
    ...
    item600    30  190   10  ....      25

The lists STORE and ITEMS look like the following accordingly:

STORE: ['S1', 'S2', ..., 'S29000']
ITEMS: ['item1', 'item2', ..., 'item600']

PRICE is a multidimensional list: indexing by store and item gives the price of the specified item at the specified store (e.g. PRICE[0][0] is the price of item1 at S1, which is 15).
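A toy version of that layout (the axis order PRICE[store_index][item_index] is an assumption, matching the one-price-row-per-store retrieval described in EDIT3; the real data is 29000 stores by 600 items):

```python
# Miniature version of the data layout described above (assumed axis
# order: PRICE[store_index][item_index], one row of prices per store).
STORE = ['S1', 'S2', 'S3']
ITEMS = ['item1', 'item2']
PRICE = [
    [15, 0],   # S1: item1 = 15, item2 not sold here -> 0
    [2, 1],    # S2
    [30, 0],   # S3
]

assert PRICE[0][0] == 15   # price of item1 at S1
```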

With all this data in a class, I run 'reports' with some 'complex' computation involved.

I am running into the problem that I create class objects from different tables, and Python's memory usage reaches almost 1.8 GB according to Windows Task Manager.

I know that my objects are the main reason for the memory consumption, which brings me to two questions:

  1. I 'del' every class object after I use it and call gc.collect(), yet Python doesn't seem to release the memory even after the script is done. Is there a brute-force method to free it up?

  2. Are there data structures other than lists that would consume less memory and improve my performance? I noticed NumPy as an option, but I am forced to use Python 2.3, and NumPy appears not to be compatible with it.

I had previously tried reading the database every time I wanted to compute something, which made my program run for almost 3 hours; now, storing the data in classes, it takes 40 minutes. So my 'client' doesn't want to go back to the old way.

Thanks in advance.

EDIT: The original data looks like the table that I described previously, but it is only accessible through the API provided by the vendor.

EDIT2: My intention is to create various objects of type C1 for multiple 'data sources', so I would end up with six C1-type objects, each containing distinct data.

EDIT3: In order to access the list of prices of items per store, the API has one function of the form GetPrices('Store Name'), so it is necessary to call this function for every store. Currently, the code that generates my C1 objects has one huge for loop that calls this function for every store.
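The retrieval loop described in EDIT3 can be sketched like this. GetPrices is the vendor's API call from the question; it is stubbed out here (with made-up data) purely so the shape of the loop -- one round trip per store, 29000 in total -- is visible:

```python
# Hypothetical stand-in for the vendor API described in EDIT3:
# returns the prices of every item (alphanumeric order) at one store.
_FAKE_DB = {'S1': [15, 0], 'S2': [2, 1]}

def GetPrices(store_name):
    return _FAKE_DB[store_name]

STORE = ['S1', 'S2']
PRICE = []
for store in STORE:            # the "huge for loop": one API call per store
    PRICE.append(GetPrices(store))

assert PRICE == [[15, 0], [2, 1]]
```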

  • Is there a reason the memory consumption is a problem? Assuming the hardware is fairly recent, you should be able to use a 64-bit OS and 64-bit Python on it, which will easily handle that. If you're running into physical memory restrictions, buying more memory might be cheaper than spending a day trying to reduce memory consumption. Commented Apr 19, 2011 at 21:48
  • Your repost suggested that perhaps you have a memory leak. I doubt it. On my system an empty list takes up 72 bytes. Every element adds 8 bytes. Assuming you have 800 lists of 30000 items/stores in your price matrix, that's 72 * 800 + 800 * 8 * 30000 = 192057600. If you have six of those, as you indicated in your repost, that makes 1152345600 bytes. That's already a gigabyte. Now add the prices: the size of a float is 24 bytes. 800 * 30000 * 24 = 576000000. Now add the two: 1728345600. Even if there is a memory leak, it's trivial compared to the actual memory being used. Commented Apr 20, 2011 at 3:21
  • Using the class definition you gave above, the STORE, TYPE, ITEMS, and Prices lists are all class attributes, which means these lists are shared amongst all instances of this class. It's unclear by your description if you mean that you have multiple instances of this class alive at the same time, but if you do, is it your intent that when you instantiate one of these classes, the data you are passing in gets shared by ALL of the instances that are alive at that point? Commented Apr 20, 2011 at 3:33
  • If you look at the __init__ definition, those class level attributes aren't actually used. Commented Apr 20, 2011 at 4:08
  • I do the following code in order to create my 'objects': Data_Arr = []; for x in xrange(0, 6): Data_Arr.append(RetrieveData(data[x])). RetrieveData creates an object of type C1 for the given data source (data[x]); within this function it uses the API provided by the developer in order to retrieve the data. Commented Apr 20, 2011 at 4:28
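The size estimate in the second comment above can be reproduced with plain arithmetic (the per-object sizes are that commenter's 64-bit CPython figures, not measurements):

```python
# Back-of-the-envelope memory estimate from the comment: 72-byte empty
# list header, 8 bytes per pointer slot, 24 bytes per float object.
lists = 72 * 800 + 800 * 8 * 30000    # list headers + pointer slots
assert lists == 192057600

six_matrices = 6 * lists              # six C1 objects, per EDIT2
assert six_matrices == 1152345600     # already over a gigabyte

floats = 800 * 30000 * 24             # the float objects themselves
assert floats == 576000000

assert six_matrices + floats == 1728345600   # ~1.7 GB, matching Task Manager
```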

5 Answers


You might try the array module, it was around in Python 2.3. Other than that, you may want to look into using a proper database for this stuff.
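A minimal sketch of the array-module idea: one typed array per store instead of a list of Python float objects. The 'f' typecode stores 4-byte C floats, and array was already available in Python 2.3:

```python
from array import array

# One store's price row as a compact typed array: 4 bytes per value
# instead of a pointer slot plus a ~24-byte float object.
prices_s1 = array('f', [15.0, 0.0, 30.0])   # item1..item3 at store S1

assert prices_s1[0] == 15.0
assert prices_s1.itemsize == 4
```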


1 Comment

I changed my class to use arrays instead of lists and it reduced my memory consumption by almost two thirds. Thank you!

You have an 800 x 30,000 matrix. That's 24,000,000 elements per array. That's already about 100 MB of space if they're 4-byte floats, and more because of Python object overhead. And you have six of these beasts?

If 1.8 GB is too much then you'll have to store less. Sorry, but this is why real number crunching can be hard. Make sure you only have the data you need, and that's it.

If most of that matrix is empty then I'd suggest looking at some sparse matrix libraries. SciPy/NumPy have the most common ones, but I'm sure someone else provides something workable with Python 2.3. Maybe an old NumPy?
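Even without a sparse matrix library, the sparse idea works on Python 2.3 with a plain dict holding only the nonzero prices (a hand-rolled sketch, assuming the data really is mostly zeros):

```python
# Hand-rolled sparse storage: keep only nonzero prices, keyed by
# (store, item); a missing key means the item isn't sold there.
sparse = {
    ('S1', 'item1'): 15,
    ('S2', 'item2'): 1,
}

def get_price(store, item):
    # absent entry -> price 0, matching the question's convention
    return sparse.get((store, item), 0)

assert get_price('S1', 'item1') == 15
assert get_price('S3', 'item1') == 0
```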

9 Comments

  • it is not 800 x 30,000,000 but 800 x 30,000
  • That's still a very large dense matrix.
  • @Adam, see my comment to the question -- 100 MB is a huge underestimate.
  • So you can't have a 4-byte float? I'm off by a factor of two compared to your estimate. It's still a huge array, that's my point.
  • Assuming the data is sparse, i.e. not every store has a price for every item, then a sparse matrix structure can potentially save lots of space. Or a proper DB.

Using Python 2.3 is going to limit your options (including excluding the option of going to 64-bit). That's also the main reason the memory is not being released back to the OS: the internal object allocator in CPython didn't gain the ability to return no longer used memory to the OS until 2.5.

If you can, try running the algorithm on 2.7 and see what gains you're able to achieve purely by using a more recent version of the interpreter (or what compatibility problems would arise in such a migration).

And, as others have suggested, optimise your data structures. Check the algorithmic complexity of the operations you perform regularly, and see if there is a way to convert O(n*n) operations to O(n*logn) and O(n) to O(logn) or O(1).

Even if the underlying data structures can't change, you may be able to use the bisect module to speed up some operations on your lists.
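Since ITEMS is already alphanumerically sorted, the bisect suggestion looks roughly like this -- an O(log n) index lookup instead of list.index's O(n) scan (bisect was present in 2.3):

```python
import bisect

ITEMS = ['item1', 'item2', 'item600']   # sorted, as in the question

def item_index(name):
    # binary search over the sorted item list
    i = bisect.bisect_left(ITEMS, name)
    if i < len(ITEMS) and ITEMS[i] == name:
        return i
    raise KeyError(name)

assert item_index('item2') == 1
```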



Items are alphanumerically ordered [...]

With all this data in a class, I run 'reports' with some 'complex' computation involved.

Are there data structures other than lists which would consume less memory and improve my performance?

I'm just guessing at your algorithms here: linear-time searching? If so, using an OrderedDict may improve performance greatly.

That won't solve the memory issue, though; consider using a proper database package instead, e.g. SQLite + SQLAlchemy or plain old bsddb with B-trees.
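The dict idea from this answer, sketched under the guess that the reports do linear searches: precompute a name-to-index mapping once, turning each lookup into O(1) (a plain dict suffices here; OrderedDict only matters if iteration order does):

```python
# Build a name -> row-index mapping once, instead of scanning ITEMS
# with list.index() inside every report computation.
ITEMS = ['item1', 'item2', 'item600']

index_of = {}
for i, name in enumerate(ITEMS):
    index_of[name] = i

assert index_of['item600'] == 2   # O(1) lookup from here on
```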

3 Comments

  • That would be a good idea, but he's stuck with Python from 7 years ago for some reason.
  • @tkerwin: Python 2.3 has bsddb. The OrderedDict from the ActiveState recipes may be backported to 2.3: code.activestate.com/recipes/576693
  • @tkerwin the reason why I am stuck with Python 2.3 is that the application that stores our data has an API only compatible with 2.3

It's hard to say anything without knowing more about your algorithm, but perhaps the struct module (http://docs.python.org/library/struct.html) would be an option? Or Cython? Pyrex?

