I am loading a ~16 GB .rdx (CSV-like) file into a pandas DataFrame and then cutting it down by removing some rows. Here's the code:
import pandas as pd

t_min, t_max, n_min, n_max, c_min, c_max = raw_input('t_min, t_max, n_min, n_max, c_min, c_max: ').split(' ')
data = pd.read_csv('/Users/me/Desktop/foo.rdx', header=None)
new_data = data.loc[(data[0] >= float(t_min)) & (data[0] <= float(t_max)) &
                    (data[1] >= float(n_min)) & (data[1] <= float(n_max)) &
                    (data[2] >= float(c_min)) & (data[2] <= float(c_max))]
This code works for smaller files (~5 GB), but on the 16 GB file the load fails (it apparently runs out of memory before read_csv finishes). Is there a workaround for this? Or maybe a way to do it with a bash script?
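One idea I have not fully tested is to stream the file with read_csv's chunksize option and filter each chunk as it arrives, so only the surviving rows accumulate in memory. A minimal sketch of the approach, using a small in-memory stand-in for foo.rdx and hypothetical hard-coded bounds instead of the raw_input prompt:

```python
import io
import pandas as pd

# Small in-memory stand-in for the 16 GB foo.rdx file (made-up data).
sample = io.StringIO("1.0,2.0,3.0\n5.0,50.0,3.0\n9.0,4.0,7.0\n")

# Hypothetical filter bounds; the real script would parse these from user input.
t_min, t_max = 0.0, 10.0
n_min, n_max = 0.0, 10.0
c_min, c_max = 0.0, 10.0

# Read the file a chunk at a time so the whole thing never sits in memory;
# keep only the rows of each chunk that pass the filter.
kept = []
for chunk in pd.read_csv(sample, header=None, chunksize=2):
    mask = (
        chunk[0].between(t_min, t_max)
        & chunk[1].between(n_min, n_max)
        & chunk[2].between(c_min, c_max)
    )
    kept.append(chunk[mask])

new_data = pd.concat(kept, ignore_index=True)
print(len(new_data))  # the middle row (n = 50.0 is out of range) is dropped
```

For the real file the chunksize would be something like 10**6 rows rather than 2, and memory use is then bounded by one chunk plus the filtered rows rather than the full file.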
Any help or suggestion is greatly appreciated.