How to read a large csv with pandas?

Question

I am loading an rdx file (a csv-like format) of around 16GB into a pandas dataframe and then cutting it down by removing some rows. Here's the code:

import pandas as pd

t_min, t_max, n_min, n_max, c_min, c_max = raw_input('t_min, t_max, n_min, n_max, c_min, c_max: ').split(' ')

data=pd.read_csv('/Users/me/Desktop/foo.rdx',header=None)

new_data=data.loc[(data[0] >= float(t_min)) & (data[0] <= float(t_max)) & (data[1] >= float(n_min)) & (data[1] <= float(n_max)) & (data[2] >= float(c_min)) & (data[2] <= float(c_max))]

This code works for smaller files (~5GB), but it appears unable to load a file of this size. Is there a workaround for this? Or maybe a way to do it with a bash script?

Any help or suggestion is greatly appreciated.

George · Accepted Answer · 2019-03-31 09:38:00Z

4

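Try the chunksize parameter of pd.read_csv: read the file in chunks, filter each chunk, and then concatenate the filtered pieces. That way only one chunk, plus the rows that passed the filter, is held in memory at any time.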

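import pandas as pd

path = '/Users/me/Desktop/foo.rdx'  # the file from the question

# Parse the six bounds as floats once (raw_input is Python 2; use input() on Python 3)
t_min, t_max, n_min, n_max, c_min, c_max = map(float, raw_input('t_min, t_max, n_min, n_max, c_min, c_max: ').split())

num_of_rows = 1024  # rows per chunk
reader = pd.read_csv(path, header=None, chunksize=num_of_rows)

# Keep only the matching rows of each chunk
dfs = []
for chunk_df in reader:
    dfs.append(chunk_df.loc[(chunk_df[0] >= t_min) & (chunk_df[0] <= t_max) & (chunk_df[1] >= n_min) & (chunk_df[1] <= n_max) & (chunk_df[2] >= c_min) & (chunk_df[2] <= c_max)])

# Combine the filtered pieces into a single dataframe
df = pd.concat(dfs, sort=False)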
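
For reference, here is a minimal Python 3 sketch of the same chunk, filter, concat idea that can be run without the real file. The function name filter_in_chunks, the bounds argument, and the tiny in-memory demo data are all made up for illustration. Unlike the snippet above, it resets the index of the result.

```python
import io

import pandas as pd


def filter_in_chunks(source, bounds, chunksize=100_000):
    """Return rows whose columns 0, 1, 2 lie within the inclusive bounds.

    bounds is a list of (low, high) pairs, one per column (hypothetical helper).
    """
    kept = []
    for chunk in pd.read_csv(source, header=None, chunksize=chunksize):
        # Start with "keep everything", then narrow the mask column by column.
        mask = pd.Series(True, index=chunk.index)
        for col, (low, high) in enumerate(bounds):
            mask &= chunk[col].between(low, high)  # inclusive on both ends
        kept.append(chunk[mask])
    # ignore_index=True renumbers the rows of the combined result from 0.
    return pd.concat(kept, ignore_index=True)


# Small in-memory stand-in for the real file (made-up values).
demo = io.StringIO("1,10,100\n2,20,200\n3,30,300\n4,40,400\n")
result = filter_in_chunks(demo, [(2, 4), (10, 30), (150, 500)], chunksize=2)
print(result)
```

To use it on the file from the question, pass the path instead of the StringIO object and the six parsed bounds as three (low, high) pairs.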

edited Mar 31, 2019 at 9:38

George

answered Mar 31, 2019 at 7:38

Uri Goren

2 Comments

George Over a year ago

Yup that works. Thanks. Any reason as to why num_of_rows = 1024 and not 1K or 1M for example? Will it go faster or slower if I increase the chunksize?

Uri Goren Over a year ago

No particular reason. You can set it according to your machine's memory limits.
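
One possible way to pick a chunk size for a given machine is to read a small sample, measure how much memory each row takes, and divide a memory budget by that figure. This is only a rough sketch: the helper rows_for_budget, the 100 MiB budget, and the demo data are assumptions made for illustration, and the estimate ignores the extra memory pandas needs while parsing.

```python
import io

import pandas as pd


def rows_for_budget(source, budget_bytes, sample_rows=1000):
    """Estimate how many rows fit in budget_bytes, based on a small sample."""
    sample = pd.read_csv(source, header=None, nrows=sample_rows)
    # deep=True also counts the contents of object (string) columns.
    bytes_per_row = sample.memory_usage(deep=True).sum() / max(len(sample), 1)
    return max(1, int(budget_bytes // bytes_per_row))


# Tiny in-memory stand-in for the real file (made-up values).
demo = io.StringIO("\n".join("%d,%d,%d" % (i, i * 2, i * 3) for i in range(5000)))
chunk_rows = rows_for_budget(demo, budget_bytes=100 * 1024**2)
print(chunk_rows)
```

The resulting number can then be passed as chunksize to pd.read_csv. It is worth timing a few values on the real file rather than relying on the estimate alone.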
