How can I write a csv file with multiple header lines with pandas to_csv()?

Question

Consider a data frame with a date column as the index and three columns x, y and z holding some observations. I want to write the contents of this data frame to a .csv file. I know I can use df.to_csv for this; however, I would like to add a second header line with the units. In this example, the desired .csv file would look something like this:

date,x,y,z  
(yyyy-mm-dd),(s),(m),(kg)  
2014-03-12,1,2,3  
2014-03-13,4,5,6  
...

Maybe you could just write the first line using normal Python output (file.write()), and then write the data frame with the units line as the header under that. (Not sure if this works or not, but it may be a way to do it.)
– Yilun Zhang, Commented Mar 12, 2014 at 15:52
How is that different from inserting a new row with your "second" header at the beginning?
– Ben, Commented Mar 12, 2014 at 15:52
@Ben: How can I do that with a string for the index (keep in mind I have a datetime index)? I tried using df.loc(), but apparently I got the syntax wrong (I always get the error ValueError: unsafe appending to index of type DatetimeIndex with a key yyyy-mm-dd).
– Fred S, Commented Mar 12, 2014 at 16:10
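
The idea from the first comment above (write the header lines by hand, then let pandas write only the data) can be sketched as follows. This is a minimal sketch, not a tested recipe from the thread: the helper name write_with_units and the units dictionary are hypothetical, and the frame is assumed to have a named DatetimeIndex as described in the question.

```python
import csv
import io

import pandas as pd

# Frame shaped like the one in the question: a DatetimeIndex named "date"
# and three numeric columns.
df = pd.DataFrame(
    {"x": [1, 4], "y": [2, 5], "z": [3, 6]},
    index=pd.DatetimeIndex(["2014-03-12", "2014-03-13"], name="date"),
)

# Hypothetical mapping from each header name (index included) to its unit label.
units = {"date": "(yyyy-mm-dd)", "x": "(s)", "y": "(m)", "z": "(kg)"}


def write_with_units(frame, units, buf):
    """Write a names line and a units line, then the data without a header."""
    names = [frame.index.name, *frame.columns]
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(names)
    writer.writerow([units[name] for name in names])
    # header=False keeps pandas from writing its own header line again.
    frame.to_csv(buf, header=False)


buf = io.StringIO()
write_with_units(df, units, buf)
csv_text = buf.getvalue()
print(csv_text)

# Reading back: skip the units line (line index 1) so the columns stay numeric.
restored = pd.read_csv(
    io.StringIO(csv_text), skiprows=[1], index_col="date", parse_dates=True
)
```

To write to disk instead of an in-memory buffer, the same helper can be passed a file opened with open("out.csv", "w", newline=""). The units stay outside the DataFrame here, so they have to be tracked separately.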

Community · Accepted Answer · 2017-05-23 12:09:30Z

This doesn't produce the exact output in your example, but it's close. You can use multi-index columns to store the second header (the units) with the column labels:

>>> import pandas as pd
>>> columns = pd.MultiIndex.from_tuples(
...     list(zip(['date', 'x', 'y', 'z'],
...              ['(yyyy-mm-dd)', '(s)', '(m)', '(kg)'])))
>>> data = [['2014-03-12', 1, 2, 3],
...         ['2014-03-13', 4, 5, 6]]
>>> df = pd.DataFrame(data, columns=columns)
>>> df
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6

Storing the second header this way allows your columns to keep the correct type (e.g., column x should be an integer type):

>>> df.dtypes
date  (yyyy-mm-dd)    object
x     (s)              int64
y     (m)              int64
z     (kg)             int64
dtype: object

If you had stored the second header as a row in the DataFrame, your column dtypes would become object, which you probably don't want.
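
A small sketch of that dtype problem, assuming the units are prepended as an ordinary row with pd.concat (the variable names are illustrative only):

```python
import pandas as pd

# Numeric observations only; every column is an integer dtype.
data = pd.DataFrame({"x": [1, 4], "y": [2, 5], "z": [3, 6]})

# The units stored as a one-row frame of strings.
units_row = pd.DataFrame({"x": ["(s)"], "y": ["(m)"], "z": ["(kg)"]})

# Putting the units row on top mixes strings and integers in each column,
# so pandas falls back to the generic object dtype.
mixed = pd.concat([units_row, data], ignore_index=True)

print(data.dtypes)
print(mixed.dtypes)
```

With object columns, numeric operations such as mixed["x"].sum() no longer behave like they do on the integer columns of data.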

Writing the DataFrame in CSV format produces something very similar to your example:

>>> df.to_csv('out.csv', index=False)
>>> !cat out.csv
date,x,y,z
(yyyy-mm-dd),(s),(m),(kg)
,,,
2014-03-12,1,2,3
2014-03-13,4,5,6

The only difference is the extra line of commas, which is how the pandas version used here separates multi-row headers from the actual rows of data; newer pandas releases may omit that line when index=False. The CSV file can then be read back into an equivalent DataFrame:

>>> df2 = pd.read_csv('out.csv', header=[0, 1])
>>> df2
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6
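
Once the file has been read back with a two-level header, the units can be separated from the column names again. The following is a sketch under the assumption that the file has no separator line between the header and the data; it reads from an in-memory string rather than out.csv so that it is self-contained.

```python
import io

import pandas as pd

# Stand-in for the file contents: two header lines followed by the data.
csv_text = (
    "date,x,y,z\n"
    "(yyyy-mm-dd),(s),(m),(kg)\n"
    "2014-03-12,1,2,3\n"
    "2014-03-13,4,5,6\n"
)

# header=[0, 1] tells read_csv to build MultiIndex columns from both lines.
df2 = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Each column label is a (name, unit) tuple, so dict() yields name -> unit.
units = dict(df2.columns.tolist())

# Drop the units level to get a frame with plain, single-level column names.
flat = df2.droplevel(1, axis=1)

print(units)
print(flat.dtypes)
```

Keeping the units dictionary next to the flat frame makes everyday column access simpler (flat["x"] instead of df2[("x", "(s)")]) while the unit labels stay available for writing the file again.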

Note: I found a lot of this information scattered throughout this SO question.
