
I have a 100 GB CSV file with millions of rows. I need to read it, say, 10,000 rows at a time into a pandas DataFrame and write each chunk to SQL Server.

I used chunksize as well as iterator, as suggested at http://pandas-docs.github.io/pandas-docs-travis/io.html#iterating-through-files-chunk-by-chunk, and have gone through many similar questions, but I am still getting an out-of-memory error.

Can you suggest code to read a very big CSV file into a pandas DataFrame iteratively?
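
Roughly, the pattern I have been trying follows the linked docs, simplified here with a placeholder file name and chunk size:

import pandas as pd

# read in fixed-size chunks instead of loading the whole file at once
reader = pd.read_csv('huge_file.csv', iterator=True, chunksize=10000)
for chunk in reader:
    # each chunk is a regular DataFrame of at most 10,000 rows
    ...  # write the chunk to SQL Server here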

1 Answer


Demo:

# read the big file in 100,000-row chunks; each `chunk` is an ordinary DataFrame
for chunk in pd.read_csv(filename, chunksize=10**5):
    chunk.to_sql('table_name', conn, if_exists='append')

where conn is a SQLAlchemy engine (created with sqlalchemy.create_engine(...)).
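
For completeness, a minimal, self-contained sketch might look like the following; the connection string, ODBC driver, file name, and table name are placeholders you would adapt to your own SQL Server setup:

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string -- adjust server, database, credentials and driver
conn = create_engine('mssql+pyodbc://user:password@SERVER/DB?driver=ODBC+Driver+17+for+SQL+Server')

# stream the 100 GB file in chunks so it is never loaded into memory all at once
for chunk in pd.read_csv('huge_file.csv', chunksize=10**5):
    # index=False skips writing the DataFrame index as an extra column
    chunk.to_sql('table_name', conn, if_exists='append', index=False)

Because only one chunk is held in memory at a time, this avoids the out-of-memory error you are seeing with a full read.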


2 Comments

Wow... this turned out to be a much more elegant solution to the problem I have been grappling with for quite some time now! Thanks!
@Geet, glad I could help... :) Thanks for accepting the answer!
