WRITE only first N rows from pandas df to csv

Question

How can I write only the first N rows, or rows P to Q, of a pandas DataFrame to a CSV file without subsetting the df first? I cannot subset the data I want to export because of memory issues.

I am thinking of a function that writes to the CSV row by row.

Thank you

I can't, because of memory issues. I could go over the details, but I doubt we will find a solution there. I am asking whether this is available in the to_csv method, although I could not find it in the docs. Otherwise, just a function that writes directly to the CSV row by row.
– criticalth, Commented Aug 12, 2019 at 9:37
chunksize refers to how many rows are exported at a time; the whole CSV still gets exported.
– criticalth, Commented Aug 12, 2019 at 9:44
FYI, each of the answers below seems to work for copying the contents over, but they also add an extra column corresponding to the index. I highly suggest dropping the index when writing the dataframe to CSV: just add the argument, i.e. .to_csv(index=False).
– Sachin Raghavendran, Commented May 13, 2021 at 5:52
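
To illustrate the index comment above, here is a minimal sketch that writes the same two rows with and without the index. The frame and its column names are made up for the example, and io.StringIO stands in for a real file so the output can be inspected directly.

```python
import io

import pandas as pd

# Hypothetical example frame; any DataFrame with an unnamed default index behaves the same way.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

with_index = io.StringIO()
df.head(2).to_csv(with_index)  # default is index=True, so the index is written as the first column

without_index = io.StringIO()
df.head(2).to_csv(without_index, index=False)  # only the data columns are written

print(with_index.getvalue())
print(without_index.getvalue())
```

With the default settings the header line starts with an empty field for the unnamed index; with index=False that field and the index values are absent.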

bharatk · Accepted Answer · 2019-08-12 09:41:11Z

14

Use head(), which returns the first n rows.

Ex.

import pandas as pd
import numpy as np

date = pd.date_range('20190101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=date, columns=list('ABCD'))

# write only the top two rows to a CSV file
# (to_csv returns None when given a path, so there is nothing useful to print)
df.head(2).to_csv("test.csv")

answered Aug 12, 2019 at 9:41


2 Comments

criticalth Over a year ago

Not what I am asking, since you are slicing the df beforehand. I tried that and it gives me a memory error, I guess because pandas makes a copy.

bharatk Over a year ago

@criticalth I don't know whether the pandas head() function makes a copy or not. According to the pandas documentation, it returns the first N rows of the dataframe.

Matthias Fripp · Accepted Answer · 2020-02-07 04:55:51Z

4

Does this work for you?

df.iloc[:N, :].to_csv()

Or

df.iloc[P:Q, :].to_csv()

I believe df.iloc generally produces references to the original dataframe rather than copying the data.
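
One way to probe the view-versus-copy question on your own data is sketched below. It compares the buffer behind one column of the original frame with the same column of the slice, using np.shares_memory. The helper name column_shares_memory is hypothetical, and the result depends on the pandas version and the column dtypes, so treat it as a diagnostic rather than a guarantee. A False result can also come from to_numpy() itself copying (for example with some extension dtypes), not only from iloc.

```python
import numpy as np
import pandas as pd


def column_shares_memory(df, n, column):
    """Return True if `column` of df.iloc[:n] overlaps the original column's buffer.

    Hypothetical helper for illustration only.
    """
    original = df[column].to_numpy()
    sliced = df.iloc[:n][column].to_numpy()
    return bool(np.shares_memory(original, sliced))


# Small single-dtype frame, similar to the one used in the other answer.
df = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))
print(column_shares_memory(df, 2, "A"))
```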

If this still doesn't work, you might also try setting the chunksize in the to_csv call. It may be that pandas is able to create the subset without using much more memory, but then it makes a complete copy of the rows written to each chunk. If the chunksize is the whole frame, you would end up copying the whole frame at that point and running out of memory.

If all else fails, you can loop through df.iterrows(), df.iloc[P:Q, :].iterrows() or df.iloc[P:Q, :].itertuples() and write each row using the csv module (possibly writer.writerows(df.iloc[P:Q, :].itertuples())).
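
A minimal sketch of that last fallback, assuming positional bounds (start inclusive, stop exclusive). The function name write_rows and its parameters are hypothetical. It uses itertools.islice over df.itertuples() instead of slicing with iloc, so no intermediate DataFrame is built; whether that avoids the MemoryError in a given case is not something this sketch can promise. Values are formatted by the csv module via str(), so NaN and datetime output may differ from what to_csv produces.

```python
import csv
import os
import tempfile
from itertools import islice

import pandas as pd


def write_rows(df, path, start, stop, include_index=False):
    """Write rows start..stop-1 (by position) of df to path, one row at a time."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        header = list(df.columns)
        if include_index:
            header = [df.index.name or ""] + header
        writer.writerow(header)
        # name=None yields plain tuples; islice skips the first `start` rows lazily.
        rows = df.itertuples(index=include_index, name=None)
        writer.writerows(islice(rows, start, stop))


# Hypothetical demo data written to a temporary directory.
demo = pd.DataFrame({"a": [0, 1, 2, 3, 4], "b": list("vwxyz")})
out_path = os.path.join(tempfile.mkdtemp(), "rows_1_to_2.csv")
write_rows(demo, out_path, 1, 3)
```

Note that islice still has to iterate past the first `start` rows, so a range deep into a very large frame pays for the rows it skips.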

edited Feb 7, 2020 at 4:55

answered Aug 12, 2019 at 10:12


4 Comments

criticalth Over a year ago

Still gives me a MemoryError. I upvoted since it is useful information.

Matthias Fripp Over a year ago

Could you try breaking it into two steps and see which one gives the memory error? I.e., df2 = df.iloc[:N, :] then df2.to_csv(..., chunksize=100)

criticalth Over a year ago

I did. It is the df.iloc[:N, :] step. I tried smaller Ns than I need and it worked, but it is ugly this way. I also built a program that reads row by row and uses almost no RAM, but it is slow, so I will do something in the middle. It is a hassle that the to_csv method does not have an nrows option.

Matthias Fripp Over a year ago

According to this answer, df.iloc[:N, :] should create a view, not a copy. Nevertheless, in your case, it sounds like Pandas is trying to make a copy and then running out of memory. So you probably have to iterate over the desired section of the data frame and write the .csv file yourself.
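
The "something in the middle" idea from the comments could look roughly like the sketch below: slice a small block of rows at a time and append each block to the same file, writing the header only once. The function name to_csv_range and the rows_per_call parameter are hypothetical, and this is an assumption about what such a compromise might look like, not the code the commenter actually used.

```python
import os
import tempfile

import pandas as pd


def to_csv_range(df, path, start, stop, rows_per_call=10_000, index=False):
    """Write rows start..stop-1 (by position) to path in small blocks."""
    stop = min(stop, len(df))
    first = True
    for lo in range(start, stop, rows_per_call):
        hi = min(lo + rows_per_call, stop)
        # Only rows_per_call rows are sliced at any one time.
        df.iloc[lo:hi].to_csv(
            path,
            mode="w" if first else "a",  # create the file once, then append
            header=first,                # write the header only for the first block
            index=index,
        )
        first = False


# Hypothetical demo data written to a temporary directory.
demo = pd.DataFrame({"a": range(10), "b": range(10, 20)})
out_path = os.path.join(tempfile.mkdtemp(), "rows_2_to_8.csv")
to_csv_range(demo, out_path, 2, 9, rows_per_call=3)
```

A smaller rows_per_call keeps peak memory lower at the cost of more to_csv calls. If start is not less than stop, the loop never runs and no file is created.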

M-M · Accepted Answer · 2019-08-12 09:55:10Z

2

Maybe you can select the index values of the rows you want to write to your CSV file, like this:

df[df.index.isin([1, 2, ...])].to_csv('file.csv')

Or use a label-based slice with .loc (both endpoints are inclusive, so n:n selects only the row labelled n):

df.loc[n:n].to_csv('file.csv')

edited Aug 12, 2019 at 9:55

answered Aug 12, 2019 at 9:49
