
I have a 100 GB CSV file with millions of rows. I need to read it, say, 10,000 rows at a time into a pandas DataFrame and write each chunk to SQL Server.

I used chunksize as well as iterator, as suggested at http://pandas-docs.github.io/pandas-docs-travis/io.html#iterating-through-files-chunk-by-chunk, and have gone through many similar questions, but I am still getting an out-of-memory error.

Can you suggest code to read a very big CSV file into a pandas DataFrame iteratively?
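
Roughly, the pattern I have been trying follows the linked docs, simplified here with a placeholder file name and chunk size:

import pandas as pd

# read in fixed-size chunks instead of loading the whole file at once
reader = pd.read_csv('huge_file.csv', iterator=True, chunksize=10000)
for chunk in reader:
    # each chunk is a regular DataFrame of at most 10,000 rows
    ...  # write the chunk to SQL Server here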

1 Answer


Demo:

# read the big file in 100,000-row chunks; each `chunk` is an ordinary DataFrame
for chunk in pd.read_csv(filename, chunksize=10**5):
    chunk.to_sql('table_name', conn, if_exists='append')

where conn is a SQLAlchemy engine (created with sqlalchemy.create_engine(...)).
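
For completeness, a minimal, self-contained sketch might look like the following; the connection string, ODBC driver, file name, and table name are placeholders you would adapt to your own SQL Server setup:

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string -- adjust server, database, credentials and driver
conn = create_engine('mssql+pyodbc://user:password@SERVER/DB?driver=ODBC+Driver+17+for+SQL+Server')

# stream the 100 GB file in chunks so it is never loaded into memory all at once
for chunk in pd.read_csv('huge_file.csv', chunksize=10**5):
    # index=False skips writing the DataFrame index as an extra column
    chunk.to_sql('table_name', conn, if_exists='append', index=False)

Because only one chunk is held in memory at a time, this avoids the out-of-memory error you are seeing with a full read.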


2 Comments

Wow... this turned out to be a much more elegant solution to the problem I have been grappling with for quite some time now! Thanks!
@Geet, glad I could help... :) Thanks for accepting the answer!
