I am downloading some data from the db, storing it in a numpy array, and cleaning up the array based on the contents of a particular column. This is the code I am using to delete some rows:
    import numpy as np

    def clean_data(data, column):
        # Drop every row whose value in `column` equals 1.
        target_data = data[:, column].astype(int)
        pos_to_delete = np.where(target_data == 1)[0]
        data = np.delete(data, pos_to_delete, axis=0)
        return data
I get the following error from numpy:
    Traceback (most recent call last):
      File "data_download.py", line 111, in download_data
        data = clean_data(data)
      File "/home/work/data_clean.py", line 13, in data_clean.py
        data = np.delete(data,pos_to_delete,axis=0)
      File "/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py", line 4262, in delete
        new = arr[tuple(slobj)]
    MemoryError
PS: If I get the data from the db and dump it to a file, then read that file and perform the clean up, this error doesn't show up anymore. The solutions to the question "Is there any way to delete the specific elements of an numpy array "In-place" in python" aren't helping. How do I delete in place (something like inplace=True) and also take care of the memory issue? Can anyone please help? Thanks in advance.
delete creates the array that it will return as the result. It then intends to fill it with the 'keeper' values from the source.
delete always returns a new array.
Looks like other objects, such as the source DataFrame, are taking up a lot of memory, leaving little memory for further manipulation.
target_data and pos_to_delete?
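To illustrate the point made in the comments, here is a minimal sketch on a tiny made-up array. It first checks that np.delete hands back a separate array, then shows one possible way to avoid the second full-size allocation by moving the rows to keep to the front of the existing buffer and returning a view. The helper name compact_rows_in_place and its bad_value parameter are hypothetical, not part of NumPy, and the sketch assumes that overwriting the original array is acceptable.

```python
import numpy as np


def compact_rows_in_place(data, column, bad_value=1):
    """Move the rows to keep to the front of `data` and return a view of them.

    Hypothetical helper, not a NumPy function. Assumes `data` is a 2-D
    array whose contents may be overwritten.
    """
    # Boolean mask of rows to keep; only one column is converted, so the
    # temporary arrays here are small compared with `data` itself.
    keep = data[:, column].astype(int) != bad_value
    write = 0
    for read in np.flatnonzero(keep):
        if read != write:
            data[write] = data[read]  # copy one row within the same buffer
        write += 1
    # A slice is a view, so no second full-size array is allocated.
    return data[:write]


# Made-up sample data: column 1 flags the rows to drop.
data = np.array([[10, 0], [11, 1], [12, 0], [13, 1], [14, 0]])

# np.delete builds its result in a newly allocated array.
deleted = np.delete(data, np.where(data[:, 1] == 1)[0], axis=0)
print(np.shares_memory(deleted, data))  # False

cleaned = compact_rows_in_place(data, column=1)
print(np.shares_memory(cleaned, data))  # True
print(cleaned.tolist())  # [[10, 0], [12, 0], [14, 0]]
```

Two caveats: the Python-level loop will be slow on arrays with many rows, and the returned view still keeps the whole original buffer alive, so this avoids the peak allocation rather than shrinking memory use. If other large objects (for example the source DataFrame mentioned above) are still referenced, releasing them before the clean up is likely to matter more.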