I'm working on a project that reads data from two NetCDF files, each of which is 521.8 MB. Admittedly, these are fairly large files. I'm on a MacBook Pro with 4 GB of memory; the machine is about four years old. The code is written in Python.
Each file contains a year's worth of weather data across the Earth, stored as a 4D array with dimensions time (length 1460), altitude (length 17), latitude (length 73), and longitude (length 144). I only need certain portions of that data at a time. Specifically, I need all of the time steps, but only one altitude level, and only a particular region of latitude and longitude (20x44).
I had code that read all of this data from both files, extracted only the data I needed, performed calculations, and wrote the results to a text file. Once done with one year, it looped through 63 years of data, which is 126 files of equivalent size. Now the code fails with an out-of-memory error right at the beginning of the process. The relevant code seems to be:
from mpl_toolkits.basemap.pupynere import NetCDFFile

# Create the file names for the input data.
ufile = "Flow/uwnd." + str(time) + ".nc"
vfile = "Flow/vwnd." + str(time) + ".nc"

# Open the two NetCDF files.
uu = NetCDFFile(ufile)
vv = NetCDFFile(vfile)

# Read the full variables into (4-dimensional) arrays.
uwnd_short = uu.variables['uwnd'][:]
vwnd_short = vv.variables['vwnd'][:]
So, the first section creates the names of the NetCDF files, the second opens them, and the third reads the full variables into 4D arrays. (These may not technically be arrays because of how Python handles the data, but I think of them as such due to my C++ background; apologies for any lack of proper vocabulary.) Later on, I pull the specific data I need out of the 4D array and perform the necessary calculations. The trouble is that this used to work, but now my computer runs out of memory on the vv = NetCDFFile(vfile) line.
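My loop over the years, in outline, does something like the following. This is a stubbed sketch so it runs standalone: load_year is a stand-in for the real NetCDF read, the returned shape matches the subset I actually need rather than the full variable, and the three-year range and mean calculation are just for illustration (the real code covers 63 years and does different calculations).

```python
import numpy as np

def load_year(path):
    # Stand-in for the real NetCDF read; the actual code opens the file
    # and should end up with only the needed subset: all 1460 time steps,
    # one altitude level, and the 20x44 lat/lon region.
    return np.zeros((1460, 20, 44), dtype=np.float32)

yearly_means = []
for year in range(1948, 1951):  # shortened from the real 63-year range
    uwnd = load_year("Flow/uwnd." + str(year) + ".nc")
    vwnd = load_year("Flow/vwnd." + str(year) + ".nc")
    yearly_means.append(float((uwnd + vwnd).mean()))
    del uwnd, vwnd  # release both arrays before the next iteration
print(yearly_means)  # [0.0, 0.0, 0.0]
```

One thing I'm unsure about is whether I need the explicit del (and a close on the file objects) to keep memory from accumulating across the 126 files, or whether rebinding the names each iteration is enough.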
Is there a possible memory leak somewhere? Is there a way to read only the specific range of data I need, so I'm not pulling in the entire file? Is there a more efficient path from reading the data, to extracting the section I need, to performing the calculations?
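For scale, here is the memory arithmetic as I understand it, done with numpy. I'm assuming the variables are stored as int16, which would match both the scale/offset I apply later and the 521.8 MB file size; I've also seen that some interfaces (e.g. the netCDF4 package) let you slice a variable before it is read, along the lines of variables['uwnd'][:, lev, la0:la1, lo0:lo1], so that only the subset is ever loaded.

```python
import numpy as np

# Full variable shape in each file: (time, level, lat, lon)
full_shape = (1460, 17, 73, 144)
# Assuming int16 storage (hence the scale/offset), the raw data per file is:
full_bytes = int(np.prod(full_shape)) * np.dtype(np.int16).itemsize
print(round(full_bytes / 1e6, 1))  # 521.8 -- matches the file size

# The subset I actually need: all times, one level, a 20x44 region
subset_shape = (1460, 20, 44)
subset_bytes = int(np.prod(subset_shape)) * np.dtype(np.int16).itemsize
print(round(subset_bytes / 1e6, 1))  # 2.6 -- roughly 200x smaller
```

So if I could slice before reading, each file would cost a few MB instead of half a GB, which would presumably make the 4 GB machine a non-issue.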
A sample of the data looks like this:

[[[[  4.10000610e+00   4.50001526e+00   4.80000305e+00 ...,   2.90000916e+00   3.30000305e+00   3.70001221e+00]
   [  3.00001526e+00   3.50001526e+00   3.90000916e+00 ...,   1.60000610e+00   2.10000610e+00   2.50001526e+00]
   [ -9.99984741e-01  -6.99996948e-01  -3.99993896e-01 ...,  -1.49998474e+00  -1.39999390e+00  -1.19999695e+00]
   ...,

The numbers continue, of course, and I later apply a scale and offset.