pandas DataFrame with a two-row header and export to CSV

Question

I have a DataFrame:

import pandas as pd

df = pd.DataFrame(columns=["AA", "BB", "CC"])
df.loc[0] = ["a", "b", "c1"]
df.loc[1] = ["a", "b", "c2"]
df.loc[2] = ["a", "b", "c3"]

I need to add a second row to the header:

df.columns = pd.MultiIndex.from_tuples(zip(df.columns, ["DD", "EE", "FF"]))

My df is now:

  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3

But when I write this DataFrame to a CSV file:

df.to_csv("test.csv", index = False)

I get one more row than expected (the blank ,, line between the header and the data):

AA,BB,CC
DD,EE,FF
,,
a,b,c1
a,b,c2
a,b,c3

This definitely looks like a bug; I'd recommend posting it as a GitHub issue. – Andy Hayden, Commented Jun 23, 2014 at 19:27
Any workaround to get the expected format without this extra line? – Meloun, Commented Jun 23, 2014 at 19:36
Late to the party, I know, but I was searching for a fix to the same issue: pandas 0.19.0 and above have this issue fixed. – BoffWx, Commented Jul 27, 2017 at 9:32
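To check what the pandas version you have installed actually writes, here is a small sketch that rebuilds the question's frame and inspects the text that to_csv returns when no path is given. The has_blank_row name is just illustrative; treat the check as a quick diagnostic, not a pandas feature.

```python
import pandas as pd

# Rebuild the question's frame: three data rows and a two-level column header.
df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
lines = df.to_csv(index=False).splitlines()

# A line made only of commas (such as ",,") is the stray row from the question.
has_blank_row = any(set(line) <= {","} for line in lines)

print(pd.__version__, lines)
print("blank row present:", has_blank_row)
```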

bluu · Accepted Answer · 2017-08-08 06:56:52Z

8

It's an ugly hack, but if you need something that works Right Now(tm), you could write it out in two parts: the header rows first, then the data:

>>> pd.DataFrame(df.columns.tolist()).T.to_csv("noblankrows.csv", mode="w", header=False, index=False)
>>> df.to_csv("noblankrows.csv", mode="a", header=False, index=False)
>>> !cat noblankrows.csv
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3

edited Aug 8, 2017 at 6:56

bluu


answered Jun 23, 2014 at 19:45

DSM


2 Comments

Andy Hayden Over a year ago

lol, snap! Though this is a neater way of writing out the header!

Spike Williams Over a year ago

Be careful... I tried this, and it re-ordered the headers into alphabetic order, which were then out of alignment with the column values.
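If you want the header order tied explicitly to the frame's own column order, one option is to read each level with MultiIndex.get_level_values, which returns labels in column order, and to write header and data through a single open handle. A sketch; write_csv_with_header_rows is a hypothetical helper, not part of pandas:

```python
import io

import pandas as pd


def write_csv_with_header_rows(df: pd.DataFrame, f) -> None:
    """Write one CSV line per column level, then the data rows (hypothetical helper)."""
    # get_level_values(i) returns the labels of level i in the frame's column
    # order, so each header line lines up with the data columns below it.
    levels = [list(df.columns.get_level_values(i)) for i in range(df.columns.nlevels)]
    pd.DataFrame(levels).to_csv(f, header=False, index=False)
    # Append the data to the same open handle, without pandas' own header.
    df.to_csv(f, header=False, index=False)


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)
buf = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_csv_with_header_rows(df, buf)
print(buf.getvalue())
```

For a real file, pass a handle opened with open("noblankrows.csv", "w", newline="") instead of buf; the pandas to_csv documentation recommends newline="" for file objects.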

Community · Accepted Answer · 2020-06-20 09:12:55Z

4

I think this is a bug in to_csv. If you're looking for workarounds, here are a couple.

To read this CSV back in, specify the header rows*:

In [10]: from io import StringIO

In [11]: csv = """AA,BB,CC
DD,EE,FF
,,
a,b,c1
a,b,c2
a,b,c3"""

In [12]: pd.read_csv(StringIO(csv), header=[0, 1])
Out[12]:
  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3

*Strangely, this seems to ignore the blank line.
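Since files written by different pandas versions may or may not contain the ,, line, a reader that tolerates both layouts can be handy. A sketch, assuming the first two lines are always the header levels: read with header=[0, 1], then drop any row that arrived entirely empty. read_two_row_header is a hypothetical name.

```python
import io

import pandas as pd


def read_two_row_header(text: str) -> pd.DataFrame:
    """Read CSV text whose first two lines are column levels (hypothetical helper)."""
    df = pd.read_csv(io.StringIO(text), header=[0, 1])
    # If a stray ",," line was read as data, it arrives as an all-NaN row.
    # Note this also drops any genuinely empty data rows.
    return df.dropna(how="all").reset_index(drop=True)


with_blank = "AA,BB,CC\nDD,EE,FF\n,,\na,b,c1\na,b,c2\na,b,c3\n"
without_blank = "AA,BB,CC\nDD,EE,FF\na,b,c1\na,b,c2\na,b,c3\n"

old = read_two_row_header(with_blank)
new = read_two_row_header(without_blank)
print(old)
print(new)
```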

To write it out, you could write the header first and then append the data:

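with open('test.csv', 'w') as f:
    # one line per column level: zip(*df.columns) yields ('AA', 'BB', 'CC'), ('DD', 'EE', 'FF')
    f.write('\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n')
# append the data rows, without pandas' own header
df.to_csv('test.csv', mode='a', index=False, header=False)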

Here is what the header expression produces for the MultiIndex columns:

In [21]: '\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n'
Out[21]: 'AA,BB,CC\nDD,EE,FF\n'
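The ','.join(h) expression does not quote anything, so a label containing a comma or a double quote would corrupt the header. A sketch that produces the same per-level lines through Python's csv module, which quotes only where needed; multiheader_lines is a hypothetical helper:

```python
import csv
import io

import pandas as pd


def multiheader_lines(columns: pd.MultiIndex) -> str:
    """Return one CSV line per column level (hypothetical helper)."""
    buf = io.StringIO()
    # csv.writer quotes labels containing commas or quotes; "\n" line endings
    # suit a file opened in text mode and written with f.write().
    writer = csv.writer(buf, lineterminator="\n")
    # zip(*columns) regroups ('AA', 'DD'), ('B,B', 'EE'), ... into one tuple per level.
    writer.writerows(zip(*columns))
    return buf.getvalue()


cols = pd.MultiIndex.from_tuples([("AA", "DD"), ("B,B", "EE"), ("CC", 'F"F')])
header_text = multiheader_lines(cols)
print(header_text)
```

The result can be passed to f.write(...) in place of the '\n'.join(...) expression, followed by the same df.to_csv(..., mode='a', ...) call.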

edited Jun 20, 2020 at 9:12

CommunityBot


answered Jun 23, 2014 at 19:45

Andy Hayden


3 Comments

Jeff Over a year ago

Not a bug, this is the defined format. You can specify tupleize_cols=True to make it write a multi-index header as a single row.

Andy Hayden Over a year ago

@Jeff this isn't about making it a single row: try it without tupleize_cols and it adds the ,,,, line to the csv (a bug??).

Jeff Over a year ago

The names are None, but it still HAS names. Not a bug. In order to have an exact reproduction, it HAS to have the line. The reader happens to be able to read either format. There is an open issue to NOT print the empty line, which is a stylistic issue; the reader is robust to this. Not specifying the header for multi-index columns is a USER error, not a bug.

CT Zhu · Accepted Answer · 2014-06-23 19:47:08Z

3

Use df.to_csv("test.csv", index=False, tupleize_cols=True) to get a CSV like this:

"('AA', 'DD')","('BB', 'EE')","('CC', 'FF')"
a,b,c1
a,b,c2
a,b,c3

To read it back:

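import ast
df2 = pd.read_csv("test.csv", tupleize_cols=True)
# each label is a string like "('AA', 'DD')"; ast.literal_eval parses it safely
df2.columns = pd.MultiIndex.from_tuples([ast.literal_eval(c) for c in df2.columns])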
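The tupleize_cols keyword was later deprecated and, as far as I know, removed in pandas 1.0, so the two calls above fail on current versions. A sketch that reproduces the same single-row, tuple-string header without it, by converting each column tuple to its string form before writing and parsing the strings back with ast.literal_eval after reading:

```python
import ast
import io

import pandas as pd

df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)

# Write: replace each column tuple with its string form, e.g. "('AA', 'DD')".
flat = df.copy()
flat.columns = [str(col) for col in df.columns]
text = flat.to_csv(index=False)

# Read back: parse each label string into a tuple with ast.literal_eval (no eval()).
back = pd.read_csv(io.StringIO(text))
back.columns = pd.MultiIndex.from_tuples([ast.literal_eval(c) for c in back.columns])
print(text)
```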

To get the exact output you wanted:

import numpy as np

with open('test.csv', 'w', newline='') as f:  # 'w', not 'a', so an existing test.csv is overwritten
    pd.DataFrame(np.asanyarray(df.columns.tolist())).T.to_csv(f, index=False, header=False)
    df.to_csv(f, index=False, header=False)

edited Jun 23, 2014 at 19:47

answered Jun 23, 2014 at 19:03

CT Zhu


4 Comments

CT Zhu Over a year ago

That would not be a good way to write a CSV anyway, because you will also have a hard time reading it back. See edit.

CT Zhu Over a year ago

Yep, you will get the same df, if that's what you are asking. See edit.

Meloun Over a year ago

Sorry, but I'm not satisfied with that. I really need the output as described, because it's the input for another application; nothing reads it back with pandas.

CT Zhu Over a year ago

See edit. You can do it in two steps, write the header, then the body.

bluu · Accepted Answer · 2017-08-09 05:01:32Z

Building on top of @DSM's solution:

If you need (as I did) to apply the same hack to an Excel export, the main change needed (apart from the expected differences in the to_excel method) is to actually remove the MultiIndex used for your column labels.

That's because, unlike .to_csv, .to_excel doesn't support writing out a DataFrame that has a MultiIndex for its columns but no index (i.e. passing index=False to .to_excel).

Anyway, here's what it would look like:

>>> import sys
>>> writer = pd.ExcelWriter("noblankrows.xlsx")
>>> headers = pd.DataFrame(df.columns.tolist()).T
>>> headers.to_excel(writer, header=False, index=False)
>>> df.columns = pd.Index(range(len(df.columns)))  # that's what I was referring to...
>>> df.to_excel(writer, header=False, index=False, startrow=len(headers))
>>> writer.close()  # older pandas also had writer.save(), which was removed in pandas 2.0
>>> pd.read_excel("noblankrows.xlsx").to_csv(sys.stdout, index=False)
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3
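An alternative that avoids reassigning df.columns in place: stack the header levels on top of the data as ordinary rows, so the frame handed to to_excel has flat columns and index=False is accepted. A sketch; stack_header_rows and write_excel_with_header_rows are hypothetical helpers, and actually writing .xlsx requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def stack_header_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame whose first rows are the column levels (hypothetical helper)."""
    n_cols = len(df.columns)
    # One row per column level, taken in the DataFrame's column order.
    header = pd.DataFrame(
        [list(df.columns.get_level_values(level)) for level in range(df.columns.nlevels)],
        columns=range(n_cols),
    )
    # Copy the data with plain 0..n-1 labels so the original df is left untouched.
    body = df.copy()
    body.columns = range(n_cols)
    return pd.concat([header, body], ignore_index=True)


def write_excel_with_header_rows(df: pd.DataFrame, path: str) -> None:
    # The stacked frame has flat columns, so index=False is allowed here.
    stack_header_rows(df).to_excel(path, header=False, index=False)


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)
stacked = stack_header_rows(df)
print(stacked)
```

The same stacked frame also works for CSV: stack_header_rows(df).to_csv(path, header=False, index=False) writes header and data in one call.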
