
My machine became laggy while trying to read a 4 GB CSV in a Jupyter notebook with the chunksize option:

raw = pd.read_csv(csv_path, chunksize=10**6)
data = pd.concat(raw, ignore_index=True)

This takes forever to run and also freezes my machine (Ubuntu 16.04 with 16 GB of RAM). What is the right way to do this?

1 Answer


The point of using chunks is that you don't need the whole dataset in memory at once: you process each chunk as you read the file. Assuming that is true for your workload, you can do

import pandas as pd

chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    do_something(chunk)  # process each chunk here instead of keeping them all
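
For example, if the goal is to keep only a subset of the rows, you can reduce each chunk as it is read and concatenate only the reduced pieces. This is a minimal sketch; the "value" column and the filter condition are hypothetical placeholders for whatever selection you actually need:

import pandas as pd

chunksize = 10 ** 6
filtered_parts = []
row_count = 0

for chunk in pd.read_csv(csv_path, chunksize=chunksize):
    # keep only the rows you actually need from each chunk (hypothetical filter)
    filtered_parts.append(chunk[chunk["value"] > 0])
    row_count += len(chunk)

# concatenating only the filtered pieces stays much smaller than the full 4 GB
filtered = pd.concat(filtered_parts, ignore_index=True)
print(row_count, len(filtered))

If all you need are summary statistics, you can skip the concatenation entirely and just accumulate the numbers per chunk, so memory use stays bounded by the chunk size.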