I have a large amount of data in NetCDF4 files, and I am trying to write a script that chunks this data dynamically: hold as much in memory as will fit, do calculations on it and save the results, then move on to the next chunk.
Here is an example of what I am trying to do. Say I have an array like this:
import numpy as np
arr = np.random.randint(0, 10, (100, 15, 51)) # Call these x, y, and z coordinates
And I only want to read ten of the x coordinates at a time, like this:
placeholder = 0
for i in range(10, 101, 10):
    tmp_array = arr[placeholder:i, :, :]
    # Do calculations here and save results to file or database
    placeholder += 10
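For an array that is already in memory, the same chunking can be written without the manual placeholder bookkeeping using `np.array_split` (note that it takes a number of chunks rather than a chunk size, and that applying it to a NetCDF4 variable would first load the whole variable into memory, so it only helps once the data is loaded):

```python
import numpy as np

arr = np.random.randint(0, 10, (100, 15, 51))

# np.array_split handles the index bookkeeping, including a final
# chunk that is smaller when the axis length is not an exact
# multiple of the number of chunks.
for tmp_array in np.array_split(arr, 10, axis=0):
    # Do calculations here and save results to file or database
    pass
```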
Is there some sort of built-in method for this? It works well enough in this simple example, but as things get more complicated, managing all of this myself could become a headache. I am aware of Dask, but it is unhelpful to me in this situation because I am not doing array operations with the data. Although Dask could still be useful to me if it had methods to deal with this too.
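One way the loop above can be generalized is with a small helper that yields `slice` objects; `iter_chunks` is an illustrative name of my own, not a library function. Because it yields slices rather than data, the same slices can index a NetCDF4 variable so that only one chunk is read from disk at a time:

```python
import numpy as np

def iter_chunks(length, size):
    """Yield slice objects covering 0..length in steps of `size`.

    The final slice is shorter when `length` is not an exact
    multiple of `size`.
    """
    for start in range(0, length, size):
        yield slice(start, min(start + size, length))

# Demonstrated on an in-memory array here, but the slices work on
# anything sliceable, including a netCDF4 Variable.
arr = np.random.randint(0, 10, (100, 15, 51))
for sl in iter_chunks(arr.shape[0], 10):
    tmp_array = arr[sl, :, :]
    # Do calculations here and save results to file or database
```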