9

How can I write only the first N rows, or rows P to Q, of a pandas DataFrame to a CSV file without subsetting the DataFrame first? I cannot subset the data I want to export because of memory issues.

I am thinking of a function that writes to the CSV file row by row.

Thank you

5
  • just slice the dataframe? Commented Aug 12, 2019 at 9:35
  • Can't because of memory issues. I could go over the details, but I doubt we will find a solution there. I am asking whether this is available in the to_csv method, although I could not find it in the docs. Or just a function that writes directly to the CSV row by row. Commented Aug 12, 2019 at 9:37
  • Look at the chunksize parameter of the .to_csv() method Commented Aug 12, 2019 at 9:40
  • chunksize refers to how many rows are exported at a time... still the whole csv gets exported Commented Aug 12, 2019 at 9:44
  • FYI, each of the answers below seems to work for copying the contents over, but they also add an extra column that corresponds to the index. I highly suggest dropping the index when writing a dataframe to CSV (just add the argument, i.e. .to_csv(index=False)). Commented May 13, 2021 at 5:52

3 Answers

14
Use head(), which returns the first n rows.

Example:

import pandas as pd
import numpy as np
date = pd.date_range('20190101',periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=date, columns=list('ABCD'))

# write only the top two rows to a CSV file
# (to_csv returns None when given a path, so there is nothing to print)
df.head(2).to_csv("test.csv")

2 Comments

Not what I am asking, since you are slicing the df beforehand. I tried that and it gives me a memory error, I guess because pandas makes a copy.
@criticalth I don't know whether the pandas head() function makes a copy or not. According to the pandas documentation, it returns the first N rows of the dataframe.
4

Does this work for you?

df.iloc[:N, :].to_csv()

Or

df.iloc[P:Q, :].to_csv()

I believe df.iloc generally produces references to the original dataframe rather than copying the data.

If this still doesn't work, you might also try setting the chunksize in the to_csv call. It may be that pandas is able to create the subset without using much more memory, but then it makes a complete copy of the rows written to each chunk. If the chunksize is the whole frame, you would end up copying the whole frame at that point and running out of memory.

If all else fails, you can loop through df.iterrows(), df.iloc[P:Q, :].iterrows(), or df.iloc[P:Q, :].itertuples() and write each row using the csv module (possibly with writer.writerows(df.iloc[P:Q, :].itertuples())).
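A minimal sketch of the csv-module approach, assuming the rows are selected by position and the index column is not wanted in the output. The function name write_row_range and its parameters are made up for illustration. It uses itertools.islice over df.itertuples() rather than df.iloc[P:Q, :], so the frame is never sliced; whether that actually avoids the MemoryError depends on the data and the pandas version.

```python
import csv
from itertools import islice

import pandas as pd


def write_row_range(df, path, start, stop):
    """Write rows start..stop-1 (by position) of df to path, one row at a time.

    Hypothetical helper: the name and signature are not part of pandas.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header row with the column names.
        writer.writerow(df.columns)
        # itertuples() yields one plain tuple per row; islice skips the
        # first `start` rows and stops before row `stop`.
        rows = df.itertuples(index=False, name=None)
        writer.writerows(islice(rows, start, stop))
```

For example, write_row_range(df, "file.csv", P, Q) writes rows P to Q-1, and write_row_range(df, "file.csv", 0, N) writes the first N rows. Expect this to be noticeably slower than to_csv on large ranges.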

4 Comments

Still gives me a MemoryError. I upvoted since it is useful information.
Could you try breaking it into two steps and see which one gives the memory error? I.e., df2 = df.iloc[:N, :] then df2.to_csv(..., chunksize=100)
I did. It is the df.iloc[:N, :] step. I tried smaller Ns than I need and it worked, but it is ugly this way. I also built a program that writes row by row and uses almost no RAM, but it is slow, so I will do something in the middle. It is a hassle that the to_csv method does not have an nrows parameter.
According to this answer, df.iloc[:N, :] should create a view, not a copy. Nevertheless, in your case, it sounds like Pandas is trying to make a copy and then running out of memory. So you probably have to iterate over the desired section of the data frame and write the .csv file yourself.
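The "something in the middle" mentioned above could look like the following sketch: slice small blocks of rows, which reportedly worked for smaller N, and append each block to the same file. The function name write_range_in_blocks and the block_rows default are assumptions for illustration; mode, header and index are existing to_csv parameters.

```python
import pandas as pd


def write_range_in_blocks(df, path, start, stop, block_rows=10_000):
    """Write rows start..stop-1 (by position) to path in small blocks.

    Hypothetical helper: only block_rows rows are sliced at a time.
    """
    stop = min(stop, len(df))
    for block_start in range(start, stop, block_rows):
        block_stop = min(block_start + block_rows, stop)
        first = block_start == start
        df.iloc[block_start:block_stop].to_csv(
            path,
            mode="w" if first else "a",  # create the file, then append
            header=first,                # write the header only once
            index=False,                 # drop the index column
        )
```

A smaller block_rows lowers peak memory at the cost of more to_csv calls, so it can be tuned between the row-by-row version and a single large slice.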
2

Maybe you can select the index labels of the rows that you want to write to your CSV file, like this:

df[df.index.isin([1, 2, ...])].to_csv('file.csv')

Or use this one:

df.loc[n:n].to_csv('file.csv')
