
I have a DataFrame:

import pandas as pd

df = pd.DataFrame(columns = ["AA", "BB", "CC"])
df.loc[0] = ["a", "b", "c1"]
df.loc[1] = ["a", "b", "c2"]
df.loc[2] = ["a", "b", "c3"]

I need to add a second row to the header:

df.columns = pd.MultiIndex.from_tuples(list(zip(df.columns, ["DD", "EE", "FF"])))
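As an aside, the same two-level header can be built with pd.MultiIndex.from_arrays, which takes one list per level rather than one tuple per column. A minimal sketch, assuming a recent pandas; the names top and bottom are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
                  columns=["AA", "BB", "CC"])

top = list(df.columns)        # first header row: AA, BB, CC (illustrative name)
bottom = ["DD", "EE", "FF"]   # second header row (illustrative name)

# from_arrays: one list per level; from_tuples: one tuple per column
by_arrays = pd.MultiIndex.from_arrays([top, bottom])
by_tuples = pd.MultiIndex.from_tuples(list(zip(top, bottom)))

print(by_arrays.equals(by_tuples))  # True
df.columns = by_arrays
print(df.columns.tolist())  # [('AA', 'DD'), ('BB', 'EE'), ('CC', 'FF')]
```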

My df is now:

  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3

But when I write this DataFrame to a CSV file

df.to_csv("test.csv", index = False)

I get one more row than expected, an empty ,, row between the header and the data:

AA,BB,CC
DD,EE,FF
,,
a,b,c1
a,b,c2
a,b,c3
  • This definitely looks like a bug; I recommend posting it as a GitHub issue. Commented Jun 23, 2014 at 19:27
  • Is there any workaround to get the expected format without this extra line? Commented Jun 23, 2014 at 19:36
  • Late to the party, I know, but I was searching for a fix to the same issue. Pandas 0.19.0 and above have this issue fixed. Commented Jul 27, 2017 at 9:32
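Whether the ,, row appears depends on your pandas version (the last comment puts the fix at 0.19.0). One way to check on your own install is to render the CSV to a string, since to_csv returns the text when no path is given, and inspect the lines. A minimal sketch; csv_lines is a hypothetical helper name, and on a recent pandas the output should contain no ,, line:

```python
import pandas as pd

def csv_lines(frame, **kwargs):
    """Hypothetical helper: the lines to_csv would write, for inspection."""
    # With no path argument, to_csv returns the CSV text as a string
    return frame.to_csv(**kwargs).splitlines()

df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)

lines = csv_lines(df, index=False)
print(lines)
print("blank ,, row present:", ",," in lines)
```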

4 Answers


It's an ugly hack, but if you needed something to work Right Now(tm), you could write it out in two parts:

>>> pd.DataFrame(df.columns.tolist()).T.to_csv("noblankrows.csv", mode="w", header=False, index=False)
>>> df.to_csv("noblankrows.csv", mode="a", header=False, index=False)
>>> !cat noblankrows.csv
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3

2 Comments

lol, snap! Though this is a neater way of writing out the header!
Be careful... I tried this, and it re-ordered the headers into alphabetic order, which were then out of alignment with the column values.

I think this is a bug in to_csv. If you're looking for workarounds, here are a couple.

To read this CSV back in, specify the header rows*:

In [11]: csv = """AA,BB,CC
DD,EE,FF
,,
a,b,c1
a,b,c2
a,b,c3"""

In [12]: pd.read_csv(StringIO(csv), header=[0, 1])  # from io import StringIO
Out[12]:
  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3

*Strangely, this seems to ignore the blank line.
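The same header=[0, 1] call also reads the layout the question is after, with no blank row between the header and the data. A minimal sketch, assuming a recent pandas; the CSV text is typed in by hand:

```python
from io import StringIO

import pandas as pd

# The desired layout: two header rows, then the data, no ,, row
text = "AA,BB,CC\nDD,EE,FF\na,b,c1\na,b,c2\na,b,c3\n"

# header=[0, 1] turns the first two rows into MultiIndex columns
df = pd.read_csv(StringIO(text), header=[0, 1])
print(df.columns.tolist())  # [('AA', 'DD'), ('BB', 'EE'), ('CC', 'FF')]
print(df.shape)             # (3, 3)
```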

To write it out, you could write the header first and then append the data:

with open('test.csv', 'w') as f:
    f.write('\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n')
df.to_csv('test.csv', mode='a', index=False, header=False)
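One caveat with ','.join is that it does not quote labels containing a comma or a quote. A minimal sketch of the same header-then-append idea that writes the header rows with the standard csv module instead; write_multiheader_csv is a hypothetical name, and the lineterminator keyword assumes pandas 1.5 or later (older versions spell it line_terminator):

```python
import csv
import os
import tempfile

import pandas as pd

def write_multiheader_csv(frame, path):
    """Hypothetical helper: one CSV row per column level, then the data, no blank row."""
    # newline="" lets csv and pandas control the line endings themselves
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        # zip(*frame.columns) regroups the per-column tuples into one tuple per level;
        # csv.writer quotes any label that contains a comma or a quote character
        writer.writerows(zip(*frame.columns))
        # Append the body through the same handle, without pandas' own header
        frame.to_csv(f, index=False, header=False, lineterminator="\n")

# Example with a label containing a comma, which ','.join would not quote
df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("C,C", "FF")]),
)
path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_multiheader_csv(df, path)
with open(path, newline="") as f:
    content = f.read()
print(content)
```

Using one file handle for both steps also avoids reopening the file in a different mode between the header and the body.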

Note the header string this produces for the MultiIndex columns:

In [21]: '\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n'
Out[21]: 'AA,BB,CC\nDD,EE,FF\n'
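The zip(*df.columns) idiom works because iterating a MultiIndex yields one tuple per column, and unpacking those tuples into zip regroups them by position, that is, by level. The same transposition in plain Python, with the column tuples written out by hand:

```python
# One tuple per column, as iterating a two-level MultiIndex yields
columns = [("AA", "DD"), ("BB", "EE"), ("CC", "FF")]

# zip(*...) collects the i-th element of every tuple: one tuple per header row
levels = list(zip(*columns))
print(levels)  # [('AA', 'BB', 'CC'), ('DD', 'EE', 'FF')]

header = "\n".join(",".join(level) for level in levels) + "\n"
print(repr(header))  # 'AA,BB,CC\nDD,EE,FF\n'
```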

3 Comments

Not a bug: this is the defined format. You can specify tupleize_cols=True to make it write a multi-index header as a single row.
@Jeff this isn't about writing it as a single row: try it without tupleize_cols and it adds the ,,,, line to the CSV (a bug?).
The names are None, but it still HAS names. Not a bug. To have an exact reproduction it HAS to have the line. The reader happens to be able to read either format. There is an open issue to NOT print the empty line, which is a stylistic issue; the reader is robust to this. Not specifying the header when reading MultiIndex columns is a USER error, not a bug.
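To see what that extra line can carry, give the index a name and write it out with the index: in a recent pandas the row under the column levels then holds the index name, and read_csv with index_col=0 uses it to restore that name. A minimal sketch; the index name idx is an arbitrary choice, and older versions may lay the file out differently:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
    index=pd.Index([0, 1, 2], name="idx"),  # "idx" is an arbitrary example name
)

text = df.to_csv()  # default index=True: index values and the index name are written
lines = text.splitlines()
print(lines[:3])    # two column-level rows, then the row holding the index name

# Reading back with both header rows and the index column restores the index name
back = pd.read_csv(StringIO(text), header=[0, 1], index_col=0)
print(back.index.name)
```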

Use df.to_csv("test.csv", index = False, tupleize_cols=True) and the resulting CSV will be:

"('AA', 'DD')","('BB', 'EE')","('CC', 'FF')"
a,b,c1
a,b,c2
a,b,c3

To read it back:

import ast
df2 = pd.read_csv("test.csv", tupleize_cols=True)
df2.columns = pd.MultiIndex.from_tuples(ast.literal_eval(','.join(df2.columns)))  # literal_eval: safer than eval
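Each header cell in this tupleized format is just the repr of a column tuple, so it can be parsed without eval. A stdlib-only sketch using ast.literal_eval, which accepts only Python literals; the header line is copied from the CSV output above:

```python
import ast
import csv
from io import StringIO

# First line of the tupleized CSV shown above
header_line = "\"('AA', 'DD')\",\"('BB', 'EE')\",\"('CC', 'FF')\"\n"

# csv.reader undoes the quoting; literal_eval turns each cell back into a tuple
cells = next(csv.reader(StringIO(header_line)))
tuples = [ast.literal_eval(cell) for cell in cells]
print(tuples)  # [('AA', 'DD'), ('BB', 'EE'), ('CC', 'FF')]
```

The resulting list of tuples can be passed straight to pd.MultiIndex.from_tuples.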

To get the exact output you wanted:

import numpy as np

with open('test.csv', 'w', newline='') as f:  # 'w' so any earlier contents are replaced
    pd.DataFrame(np.asanyarray(df.columns.tolist())).T.to_csv(f, index = False, header=False)
    df.to_csv(f, index = False, header=False)
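Since the file is only an input for another application, the exact layout can also be produced with the standard csv module alone, with no pandas involved in the writing. A minimal sketch; header_rows and data_rows are hypothetical names holding the values from the question:

```python
import csv
from io import StringIO

header_rows = [["AA", "BB", "CC"], ["DD", "EE", "FF"]]   # hypothetical: one list per level
data_rows = [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]]

buf = StringIO()  # swap for open("test.csv", "w", newline="") to write a real file
writer = csv.writer(buf, lineterminator="\n")
writer.writerows(header_rows)  # every header level, with nothing in between
writer.writerows(data_rows)

print(buf.getvalue())
```

With a DataFrame in hand, zip(*df.columns) gives the header rows and df.values.tolist() the data rows. Note that csv.writer formats non-string values itself (a NaN, for example, is written as nan, whereas to_csv writes an empty field by default).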

4 Comments

That would not be a good way to write a CSV anyway, because you would also have a hard time reading it back. See edit.
Yep, you will get the same df back, if that's what you are asking. See edit.
Sorry, but I am not satisfied with that. I really need the output as described, because it's the input for another application; there is no pandas reading it back.
See edit. You can do it in two steps, write the header, then the body.

Building on top of @DSM's solution:

If you need (as I did) to apply the same hack to an Excel export, the main change (apart from the expected differences of the to_excel method) is to actually remove the MultiIndex used for your column labels...

That's because, unlike .to_csv, .to_excel doesn't support writing out a DataFrame that has a MultiIndex for its columns but no index (i.e. passing index=False to .to_excel).

Anyway, here's what it would look like:

>>> import sys
>>> writer = pd.ExcelWriter("noblankrows.xlsx")
>>> headers = pd.DataFrame(df.columns.tolist()).T
>>> headers.to_excel(
        writer, header=False, index=False)
>>> df.columns = pd.Index(range(len(df.columns)))  # that's what I was referring to...
>>> df.to_excel(
        writer, header=False, index=False, startrow=len(headers))
>>> writer.close()  # replaces the older writer.save(), which was removed in pandas 2.0
>>> pd.read_excel("noblankrows.xlsx").to_csv(sys.stdout, index=False)
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3
