
I've got a very large SQLite table with over 500,000 rows and about 15 columns (mostly floats). I want to transfer data from the SQLite DB to a Django app (which could be backed by many RDBMSes, but Postgres in my case). Everything works, but as the iteration continues, memory usage for the Python process grows by 2-3 MB per second. I've tried using 'del' to delete the EVEMapDenormalize and row objects at the end of each iteration, but the bloat continues. Here's an excerpt; any ideas?

class Importer_mapDenormalize(SQLImporter):
    def run_importer(self, conn):
        # Assumes conn.row_factory is set to sqlite3.Row (or similar)
        # so rows can be addressed by column name.
        c = conn.cursor()

        for row in c.execute('select * from mapDenormalize'):
            mapdenorm, created = EVEMapDenormalize.objects.get_or_create(id=row['itemID'])
            mapdenorm.x = row['x']
            mapdenorm.y = row['y']
            mapdenorm.z = row['z']

            if row['typeID']:
                mapdenorm.type = EVEInventoryType.objects.get(id=row['typeID'])

            if row['groupID']:
                mapdenorm.group = EVEInventoryGroup.objects.get(id=row['groupID'])

            if row['solarSystemID']:
                mapdenorm.solar_system = EVESolarSystem.objects.get(id=row['solarSystemID'])

            if row['constellationID']:
                mapdenorm.constellation = EVEConstellation.objects.get(id=row['constellationID'])

            if row['regionID']:
                mapdenorm.region = EVERegion.objects.get(id=row['regionID'])

            mapdenorm.save()
        c.close()

I'm not at all interested in wrapping this SQLite DB with the Django ORM. I'd just really like to figure out how to get the data transferred without sucking all of my RAM.

  • Why not just use a straight SQL insert statement to get the data in? Otherwise you could improve the performance by caching the results of the objects.get calls so you're only looking up specific types once (see the sketch after these comments). Commented Mar 10, 2010 at 14:31
  • I wanted to use Django's ORM to ensure data integrity across a wide range of database engines. Users may be deploying on any of the RDBMSes that Django supports. Commented Mar 10, 2010 at 15:05
  • Silly me, this was addressed in the Django FAQ: docs.djangoproject.com/en/dev/faq/models/… Needed to clear the DB query cache while in DEBUG mode: from django import db; db.reset_queries() Commented Mar 10, 2010 at 15:38
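A minimal sketch of the lookup-caching idea from the first comment, assuming the models from the question; _fk_cache and cached_get are hypothetical names, not part of the original code:

_fk_cache = {}

def cached_get(model, pk):
    # Memoize objects.get() so each foreign-key row is fetched
    # from the database at most once.
    key = (model, pk)
    if key not in _fk_cache:
        _fk_cache[key] = model.objects.get(id=pk)
    return _fk_cache[key]

# Inside the import loop, for example:
#     if row['typeID']:
#         mapdenorm.type = cached_get(EVEInventoryType, row['typeID'])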

2 Answers


Silly me, this was addressed in the Django FAQ.

Needed to clear the DB query cache while in DEBUG mode.

from django import db 
db.reset_queries()
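In this importer, the reset can also go inside the loop so the log never grows large. A minimal sketch, assuming the run_importer structure from the question; resetting every 1,000 rows is an arbitrary interval:

from django import db

for i, row in enumerate(c.execute('select * from mapDenormalize')):
    # ... create/update the EVEMapDenormalize object as in the question ...
    if i % 1000 == 0:
        # Drop the SQL log Django accumulates when DEBUG=True.
        db.reset_queries()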

1 Comment

Even though the query cache + DEBUG=True bit you this time, you might find it useful in a future round of debugging and/or optimization. The queries are cached in django.db.connection.queries.
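For instance, a small sketch of inspecting that log, assuming DEBUG=True and any model query:

from django.db import connection, reset_queries

EVEMapDenormalize.objects.get(id=1)
print(len(connection.queries))   # number of queries logged so far
print(connection.queries[-1])    # a dict with 'sql' and 'time' keys
reset_queries()                  # empty the log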
1

I think a select * from mapDenormalize and loading the whole result into memory will always be a bad idea. My advice is to split the script into chunks: use LIMIT to fetch the data in portions.

Get the first portion, work with it, close the cursor, and then get the next portion.
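A minimal sketch of that pattern, assuming the same sqlite3 connection as the question; the 10,000-row chunk size is an arbitrary choice:

CHUNK = 10000
offset = 0
while True:
    c = conn.cursor()
    rows = c.execute('select * from mapDenormalize limit ? offset ?',
                     (CHUNK, offset)).fetchall()
    c.close()
    if not rows:
        break
    for row in rows:
        pass  # process the row as in the question's loop
    offset += CHUNK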

1 Comment

I thought this at first as well, but chunking it up still led to memory leakage. Memory usage remained unchanged from the situation outlined above.
