Consider a data frame with a date column as its index and three columns x, y and z holding some observations. I want to write the contents of this data frame to a .csv file. I know I can use df.to_csv for this; however, I would like to add a second header line with the units. In this example, the desired .csv file would look something like this:

date,x,y,z  
(yyyy-mm-dd),(s),(m),(kg)  
2014-03-12,1,2,3  
2014-03-13,4,5,6  
...
  • Maybe you could just write the first line using normal Python output (file.write()), and then write the data frame with the units line as the header under that. (Not sure whether this works, but it may be one way to do it.) Commented Mar 12, 2014 at 15:52
  • How is that different from inserting a new row with your "second" header at the beginning? Commented Mar 12, 2014 at 15:52
  • @Ben: How can I do that with a string for the index (keep in mind I have a datetime-index)? I tried using df.loc(), but apparently I get the syntax wrong (I always get the error ValueError: unsafe appending to index of type DatetimeIndex with a key yyyy-mm-dd). Commented Mar 12, 2014 at 16:10

1 Answer

This doesn't produce the exact output in your example, but it's close. You can use multi-index columns to store the second header (the units) with the column labels:

>>> import pandas as pd
>>> # list(...) so this also works where zip returns an iterator (Python 3)
>>> columns = pd.MultiIndex.from_tuples(
...     list(zip(['date', 'x', 'y', 'z'],
...              ['(yyyy-mm-dd)', '(s)', '(m)', '(kg)'])))
>>> data = [['2014-03-12', 1, 2, 3],
...         ['2014-03-13', 4, 5, 6]]
>>> df = pd.DataFrame(data, columns=columns)
>>> df
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6

Storing the second header this way allows your columns to keep the correct type (e.g., column x should be an integer type):

>>> df.dtypes
date  (yyyy-mm-dd)    object
x     (s)              int64
y     (m)              int64
z     (kg)             int64
dtype: object

If you had stored the second header as a row in the DataFrame, your column dtypes would become object, which you probably don't want.
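To make that concrete, here is a minimal sketch of the alternative, with the units stored as the first data row. The name df_units_row is mine, purely for illustration:

```python
import pandas as pd

# Units stored as the first data row rather than in the column labels.
rows = [['(yyyy-mm-dd)', '(s)', '(m)', '(kg)'],
        ['2014-03-12', 1, 2, 3],
        ['2014-03-13', 4, 5, 6]]
df_units_row = pd.DataFrame(rows, columns=['date', 'x', 'y', 'z'])

# Each numeric column now mixes a string with integers, so pandas
# falls back to the generic object dtype for it.
numeric_dtypes = {name: str(df_units_row[name].dtype) for name in ['x', 'y', 'z']}
print(numeric_dtypes)
```

Arithmetic or aggregation on x, y or z would then have to skip or strip the units row first.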

Writing the DataFrame in CSV format produces something very similar to your example:

>>> df.to_csv('out.csv', index=False)
>>> print(open('out.csv').read(), end='')
date,x,y,z
(yyyy-mm-dd),(s),(m),(kg)
,,,
2014-03-12,1,2,3
2014-03-13,4,5,6

The only difference is the extra line of commas, which is how pandas separates multi-row headers from the actual rows of data. This allows the CSV file to be read back into an equivalent DataFrame:

>>> df2 = pd.read_csv('out.csv', header=[0, 1])
>>> df2
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6
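
If the file has to match the layout in the question exactly, without the line of commas, one option is the idea from the first comment: write the two header lines yourself and let pandas write only the data. This is a rough sketch, not the only way to do it. It assumes a plain DataFrame with a DatetimeIndex named date, as in the question; the units list and the in-memory buffer are my own additions for illustration, and the simple ','.join assumes that no column name or unit contains a comma or quote.

```python
import io

import pandas as pd

# Data frame as described in the question: dates as index, columns x, y, z.
df = pd.DataFrame(
    {'x': [1, 4], 'y': [2, 5], 'z': [3, 6]},
    index=pd.to_datetime(['2014-03-12', '2014-03-13']),
)
df.index.name = 'date'

# Hypothetical units, one per output column, with the index unit first.
units = ['(yyyy-mm-dd)', '(s)', '(m)', '(kg)']

# In-memory buffer for the example; use open('out.csv', 'w', newline='')
# instead to write a real file.
buf = io.StringIO()
buf.write(','.join([df.index.name] + list(df.columns)) + '\n')  # names line
buf.write(','.join(units) + '\n')                               # units line
df.to_csv(buf, header=False, date_format='%Y-%m-%d')            # data rows only

csv_text = buf.getvalue()
print(csv_text)
```

The trade-off is that the units no longer live in the DataFrame, so they have to be kept in sync with the columns by hand.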

Note: I found a lot of this information scattered throughout this SO question.
