
I've got 1.6 GB available to use in a Python process. I'm writing a large CSV file whose data comes from a database. The problem is: after the file is written, the memory (>1.5 GB) is not released immediately, which causes an error in the next bit of code (allocating memory fails because the OS cannot find enough memory to allocate).

Does any function exist that would help me release that memory? Or do you have a better way to do it?

This is the script I'm using to write the file; it writes in chunks to deal with the memory issue:

size_to_read = 20000
sqlData = rs_cursor.fetchmany(size_to_read)

c = csv.writer(open(fname_location, "wb"))
c.writerow(headers)

print("- Generating file %s ..." % out_fname)

while sqlData:
  for row in sqlData:
     c.writerow(row)
  sqlData = rs_cursor.fetchmany(size_to_read)
  • Is the code that errors out after the last line, or is it the last line that it errors on? Commented Aug 18, 2016 at 18:47
  • No, it's hundreds of lines further down, through code that uses little memory, and the error comes on this line: file_obs = int(subprocess.check_output(["cat %s/%s | wc -l" % (locationToUpload, filename)], shell=True)) Commented Aug 18, 2016 at 18:54
  • I might be misunderstanding but... sqlData = rs_cursor.fetchmany(size_to_read) on the last line? Everything you just wrote to a file (which you haven't closed), you load it all again? Commented Aug 18, 2016 at 18:54
  • Your indentation is off. Please fix to match your actual code, because you have several loops and we cannot be sure when some terminate Commented Aug 18, 2016 at 18:55
  • It sounds like the memory may be being used for buffers on the file or objects used by the csv.writer. You could test that by adding del c after the code above, which should prompt Python to recover memory used by c and close the file handle which is only referenced by c. In general, it would be better to keep a handle for the file, and then explicitly close it when you're done. See stackoverflow.com/questions/3347775/csv-writer-not-closing-file Commented Aug 18, 2016 at 19:57
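As a minimal sketch of what that last comment suggests (reusing rs_cursor, headers, fname_location, and size_to_read from the question), you would keep an explicit handle on the file and close it as soon as the write is finished:

import csv

size_to_read = 20000

f = open(fname_location, "wb")   # keep the handle so it can be closed explicitly
c = csv.writer(f)
c.writerow(headers)

sqlData = rs_cursor.fetchmany(size_to_read)
while sqlData:
    for row in sqlData:
        c.writerow(row)
    sqlData = rs_cursor.fetchmany(size_to_read)

f.close()   # flushes the buffers and closes the file handle
del c       # optional: drops the writer so it can be garbage collected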

1 Answer


I am thinking the issue is that you never closed the file. Give this a shot.

size_to_read = 20000
sqlData = rs_cursor.fetchmany(size_to_read)

with open(fname_location, "wb")) as f:
    c = csv.writer(f)
    c.writerow(headers)

print("- Generating file %s ..." % out_fname)

while sqlData:
    with open(fname_location, "a") as f: # "a" means to append
        c = csv.writer(f)
        for row in sqlData:
            c.writerow(row)
    sqlData = rs_cursor.fetchmany(size_to_read)

By using with, the file is closed automatically when the block ends, which releases the memory and avoids having to call f.close() explicitly.
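As a variant (just a sketch, with the same names), you could also open the file once and keep it open across the whole loop, so every chunk goes to the same handle and everything is flushed and closed when the with block ends:

size_to_read = 20000

with open(fname_location, "wb") as f:
    c = csv.writer(f)
    c.writerow(headers)
    sqlData = rs_cursor.fetchmany(size_to_read)
    while sqlData:
        for row in sqlData:
            c.writerow(row)
        sqlData = rs_cursor.fetchmany(size_to_read)
# leaving the with block closes the file and frees its buffers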

Also, I believe you can avoid the inner loop like so...

while sqlData:
    with open(fname_location, "a") as f:  # append, so the headers and earlier chunks are kept
        c = csv.writer(f)
        c.writerows(sqlData)  # writerows writes the whole chunk at once
    sqlData = rs_cursor.fetchmany(size_to_read)

Hard to replicate since I don't have the data :(

EDIT

I know this is not really an answer, but check out the memory_profiler package to do a line-by-line assessment of where you're using a lot of memory: https://pypi.python.org/pypi/memory_profiler
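For example (a sketch only; write_to_csv here is just a hypothetical stand-in for your own function), you decorate the function you want to inspect and run the script under the profiler module:

import csv
from memory_profiler import profile

@profile
def write_to_csv(rs_cursor, fname_location, headers, size_to_read=20000):
    with open(fname_location, "wb") as f:
        c = csv.writer(f)
        c.writerow(headers)
        sqlData = rs_cursor.fetchmany(size_to_read)
        while sqlData:
            c.writerows(sqlData)
            sqlData = rs_cursor.fetchmany(size_to_read)

# run with:  python -m memory_profiler your_script.py
# memory_profiler then prints per-line memory usage for write_to_csv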

EDIT 2

Here is an example of using a generator to keep your memory usage low.

def results_iter(cursor, n=10000):
    while True:
        results = cursor.fetchmany(n)
        if not results:
            break
        for result in results:
            yield result

with open('file.csv', 'wb') as f:
    c = csv.writer(f)
    for result in results_iter(rs_cursor, size_to_read):
        c.writerow(result)

via http://code.activestate.com/recipes/137270-use-generators-for-fetching-large-db-record-sets/

If any of this works let us know!


7 Comments

Yeah, I don't really know; like I said, I don't know what the data looks like, or what he means by size_to_read. But thanks for the down vote!
I'd say repeatedly opening it in case the operations in between (i.e. grabbing more data) take up a ton of memory and crash the app. But yes, the scope of the loops is confusing. And publishing something wrong does have value, because people may pick out issues you didn't think of.
Basically, I'm using the loop because without it the system will run out of memory whilst writing the file. So the idea is to write the data in chunks until it is all written. The size_to_read is the number of lines the process writes on each loop.
@ebertbm please update the question so the relevant info is centralized. Also, sorry rogan, I did not see that.
c.writerows(row) should probably be c.writerows(sqlData). Also, with csv.writer(...) as c: may automatically close the file, but that would only be a side-effect of garbage collection when c gets replaced by a new c (at that point, the old c holds the last reference to the file handle, so the file handle will also get collected, which should cause it to be closed). It may be better to start with with open(...) as f: then c = csv.writer(f). See stackoverflow.com/questions/3347775/csv-writer-not-closing-file
