
I have a sizable table (20M+ rows) in Postgres and I'm trying to run a raw Django query on it:

tweets = TweetX.objects.raw("SELECT * from twitter_tweet").using("twittertest")

I get a RawQuerySet back quickly, but when I try to iterate over its results it grinds to a halt:

for tweet in tweets:
   #do stuff

Memory usage is steadily rising, so I suspect the whole dataset is being transferred. Is there a way to get a database cursor from .raw so I can iterate over the result set without transferring it all at once?
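To make the symptom concrete, here is a minimal sketch of the two fetch styles; sqlite3 stands in for the Postgres connection (an assumption, purely so the snippet runs standalone):

```python
import sqlite3

# Stand-in dataset; with Postgres this would be the 20M-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE twitter_tweet (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO twitter_tweet (id, text) VALUES (?, ?)",
    [(i, f"tweet {i}") for i in range(1, 11)],
)

# Fetch-all: the entire result set is materialised client-side at once,
# which is what the steadily rising memory suggests is happening.
cur = conn.execute("SELECT id, text FROM twitter_tweet ORDER BY id")
all_rows = cur.fetchall()

# Batched fetch: only `size` rows are held in memory at a time.
cur = conn.execute("SELECT id, text FROM twitter_tweet ORDER BY id")
first_batch = cur.fetchmany(3)

print(len(all_rows), len(first_batch))  # 10 3
```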

  • If this same query does not run fast from the psql prompt, you should check the PostgreSQL server first of all. Also, I don't know how the Django ORM translates a "select *", but it's better to do a select colA, colB, ... Commented Aug 22, 2013 at 13:50

1 Answer


It seems rather difficult to persuade Django/Postgres to use database cursors. Instead, Django fetches everything and then puts a client-side iterator (called a cursor) over it.

Found a solution over here that explicitly creates a database cursor. The only downside is that it no longer fits into Django models.

from django.db import connections

conn = connections['twittertest']
# The underlying database connection is created lazily; opening a
# throwaway cursor forces it to exist before we grab it directly.
if conn.connection is None:
    cursor = conn.cursor()

# Passing a name makes psycopg2 create a server-side cursor, so rows
# are streamed in batches instead of being fetched all at once.
cursor = conn.connection.cursor(name='gigantic_cursor')
cursor.execute("SELECT * FROM twitter_tweet")

for tweet in cursor:
    pass  # profit
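Since the raw cursor yields bare tuples rather than model instances, one workaround is to zip the column names from cursor.description into dicts. A minimal sketch; sqlite3 stands in for the psycopg2 connection (an assumption, so it runs standalone), but the DB-API pattern is the same:

```python
import sqlite3

# Stand-in for the psycopg2 connection used above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE twitter_tweet (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO twitter_tweet (id, text) VALUES (1, 'hello world')")

cursor = conn.cursor()
cursor.execute("SELECT id, text FROM twitter_tweet")

# DB-API cursors expose one 7-tuple per result column in
# cursor.description; element 0 is the column name.
columns = [col[0] for col in cursor.description]

# Zip names with each row so downstream code gets dicts, not bare tuples.
tweets = [dict(zip(columns, row)) for row in cursor]
print(tweets[0]["text"])  # hello world
```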