
I need to extract a large amount of data (>1GB) from a database to a CSV file. I'm using this script:

rs_cursor = rs_db.cursor()
rs_cursor.execute("""SELECT %(sql_fields)s
                     FROM table1""" % {"sql_fields": sql_fields})
sqlData = rs_cursor.fetchall()
rs_cursor.close()

c = csv.writer(open(filename, "wb"))
c.writerow(headers)
for row in sqlData:
    c.writerow(row)

The problem is that while writing the file, the system runs out of memory. Is there another, more efficient way to create such a large CSV file?

  • The problem most probably is with sqlData, not the fact that you write this data to a file. Where does this data come from? Do you have any control over it? If you do, you should be looking into reading it in chunks or as a generator. Commented Aug 10, 2016 at 15:09
  • How are you getting the SQL data? Can you show us that code? Commented Aug 10, 2016 at 15:11
  • I added the bit of code that builds sqlData. The data is coming from a massive table. Commented Aug 10, 2016 at 15:13
  • What database/library are you using? In pymssql you can use fetchmany with the size argument so it doesn't return the whole table at once, see its docs. You can also consider using WHERE in order to SELECT from the table in chunks (a sketch of that idea follows these comments). Commented Aug 10, 2016 at 15:16
  • Thanks DeepSpace, I'm using psycopg2 (redshift). In that case, how can I write the file without overwriting it if I'm reading by chunks? Commented Aug 10, 2016 at 15:20
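
For reference, here is a minimal sketch of the "SELECT in chunks" idea from the comments, combined with opening the file once so that each chunk is appended rather than overwriting earlier ones. It assumes table1 has a unique, sortable column (called id here purely for illustration) and reuses rs_db, sql_fields, filename and headers from the question:

import csv

chunk_size = 10000  # rows per round trip; tune for your memory budget
last_id = 0

# "wb" matches the Python 2 style used elsewhere in this thread;
# on Python 3 use open(filename, "w", newline="") instead.
with open(filename, "wb") as f:
    c = csv.writer(f)
    c.writerow(headers)
    cur = rs_db.cursor()
    while True:
        # %%(...)s survives the string formatting as %(...)s,
        # which psycopg2 then fills in from the second argument.
        cur.execute(
            """SELECT id, %(sql_fields)s
               FROM table1
               WHERE id > %%(last_id)s
               ORDER BY id
               LIMIT %%(limit)s""" % {"sql_fields": sql_fields},
            {"last_id": last_id, "limit": chunk_size})
        rows = cur.fetchall()
        if not rows:
            break
        last_id = rows[-1][0]                 # highest id in this chunk
        c.writerows(row[1:] for row in rows)  # drop the helper id column
    cur.close()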

2 Answers


psycopg2 (which OP uses) has a fetchmany method that accepts a size argument. Use it to read a fixed number of rows at a time from the database. You can experiment with the value of n to balance run time against memory usage.

fetchmany docs: http://initd.org/psycopg/docs/cursor.html#cursor.fetchmany

rs_cursor = rs_db.cursor()
rs_cursor.execute("""SELECT %(sql_fields)s
                     FROM table1""" % {"sql_fields": sql_fields})
c = csv.writer(open(filename, "wb"))
c.writerow(headers)

n = 100
sqlData = rs_cursor.fetchmany(n)

while sqlData:
    for row in sqlData:
        c.writerow(row)
    sqlData = rs_cursor.fetchmany(n)

rs_cursor.close()


You can also wrap this with a generator to simplify the code a little bit:

def get_n_rows_from_table(n):
    rs_cursor = rs_db.cursor()
    rs_cursor.execute("""SELECT %(sql_fields)s
                             FROM table1""" % {"sql_fields": sql_fields})
    sqlData = rs_cursor.fetchmany(n)

    while sqlData:
        yield sqlData
        sqlData = rs_cursor.fetchmany(n)
    rs_cursor.close()

c = csv.writer(open(filename, "wb"))
c.writerow(headers)

# each item yielded by the generator is a chunk (list of rows), so use writerows
for row_chunk in get_n_rows_from_table(100):
    c.writerows(row_chunk)

Have you tried fetchone()?

rs_cursor = rs_db.cursor()
rs_cursor.execute("""SELECT %(sql_fields)s
                     FROM table1""" % {"sql_fields": sql_fields})

c = csv.writer(open(filename, "wb"))
c.writerow(headers)
row = rs_cursor.fetchone()
while row:
    c.writerow(row)
    row = rs_cursor.fetchone()

rs_cursor.close()

2 Comments

While this approach will work, it can be very slow as database I/O tends to be a slow process.
You can see my answer for another approach using fetchmany.
