5

I have a pandas dataframe/csv of the form

date        Country   Type     Val
2013-01-01  USA        x        23
2013-01-01  USA        y        13
2013-01-01  MX         x        11
2013-01-01  MX         y        14  
2013-01-02  USA        x        20
2013-01-02  USA        y        19
2013-01-02  MX         x        14
2013-01-02  MX         y        16

I want to convert it to the form

date        Country    x   y
2013-01-01  USA        23  13
2013-01-01  MX         11  14
2013-01-02  USA        20  19
2013-01-02  MX         14  16

In general, I am looking for a way to reshape a table so that the unique values of a single column become columns of their own.

I have looked at pivot and groupby. I suspect pivot can do this, but I haven't been able to get it to produce exactly this form.

3 Answers

10

Probably not the most elegant way possible, but using unstack:

>>> df
         date Country Type  Val
0  2013-01-01     USA    x   23
1  2013-01-01     USA    y   13
2  2013-01-01      MX    x   11
3  2013-01-01      MX    y   14
4  2013-01-02     USA    x   20
5  2013-01-02     USA    y   19
6  2013-01-02      MX    x   14
7  2013-01-02      MX    y   16

>>> df.set_index(['date', 'Country', 'Type']).unstack('Type').reset_index()
            date Country  Val
Type                        x   y
0     2013-01-01      MX   11  14
1     2013-01-01     USA   23  13
2     2013-01-02      MX   14  16
3     2013-01-02     USA   20  19

A little more generally, and removing the strange hierarchical columns in the result:

>>> cols = [c for c in df.columns if c not in {'Type', 'Val'}]
>>> df2 = df.set_index(cols + ['Type']).unstack('Type')
>>> df2
                    Val
Type                  x   y
date       Country
2013-01-01 MX        11  14
           USA       23  13
2013-01-02 MX        14  16
           USA       20  19
>>> df2.columns = df2.columns.droplevel(0)  # drop the 'Val' level; unlike levels[1], this keeps the actual column order
>>> df2.columns.name = None
>>> df2
                     x   y
date       Country
2013-01-01 MX       11  14
           USA      23  13
2013-01-02 MX       14  16
           USA      20  19
>>> df2.reset_index()
         date Country   x   y
0  2013-01-01      MX  11  14
1  2013-01-01     USA  23  13
2  2013-01-02      MX  14  16
3  2013-01-02     USA  20  19
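
For anyone who wants to run this outside an interactive session, here is a self-contained sketch of the same idea. The inline construction of `df` and the name `wide` are assumptions for illustration. Selecting the `Val` column before unstacking is a small variation on the steps above: unstacking a Series should give flat columns directly, so the hierarchical level never appears.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2013-01-01"] * 4 + ["2013-01-02"] * 4,
    "Country": ["USA", "USA", "MX", "MX"] * 2,
    "Type": ["x", "y"] * 4,
    "Val": [23, 13, 11, 14, 20, 19, 14, 16],
})

# Index by every identifying column, keep only the Val series, then move
# the Type level of the index into the columns.
wide = df.set_index(["date", "Country", "Type"])["Val"].unstack("Type")

# The columns Index is still named "Type"; clear it so it is not printed.
wide.columns.name = None

# Turn date and Country back into ordinary columns.
wide = wide.reset_index()
print(wide)
```

Note that this relies on each (date, Country, Type) combination appearing only once; `unstack` raises an error if the index contains duplicates.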

3 Comments

Wow, thanks for the quick response. I'm probably having a mental block, but is there a way I can remove the index name?
If you mean Type there, that's not actually df.index.name; df.columns is hierarchical and has the name Type. I edited in how to get rid of that.
Thanks, figured it out but forgot to edit. It seems efficient enough, so I'm accepting the answer :)
4

I cooked up my own pivot-based solution to the same problem before finding Dougal's answer. I thought I would post it for posterity, since I find it more readable:

>>> pd.__version__
'0.15.0'
>>> df
         date Country Type  Val
0  2013-01-01     USA    x   23
1  2013-01-01     USA    y   13
2  2013-01-01      MX    x   11
3  2013-01-01      MX    y   14
4  2013-01-02     USA    x   20
5  2013-01-02     USA    y   19
6  2013-01-02      MX    x   14
7  2013-01-02      MX    y   16
>>> pt=df.pivot_table(values='Val',
...                   columns='Type',
...                   index=['date','Country'],
...                   )
>>> pt
Type                 x   y
date       Country        
2013-01-01 MX       11  14
           USA      23  13
2013-01-02 MX       14  16
           USA      20  19

And then carry on with Dougal's cleanups:

>>> pt.columns.name=None
>>> pt.reset_index()
         date Country   x   y
0  2013-01-01      MX  11  14
1  2013-01-01     USA  23  13
2  2013-01-02      MX  14  16
3  2013-01-02     USA  20  19

Note that DataFrame.to_csv() produces your requested output:

>>> print(pt.to_csv())
date,Country,x,y
2013-01-01,MX,11,14
2013-01-01,USA,23,13
2013-01-02,MX,14,16
2013-01-02,USA,20,19
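
One caveat worth keeping in mind: `pivot_table` aggregates, and its default `aggfunc` is the mean, so repeated (date, Country, Type) rows are averaged rather than reported as an error. The sketch below uses a made-up three-row frame with a deliberate duplicate (not data from the question) to show the effect, and passes `aggfunc="sum"` as one possible alternative. The names `mean_pt` and `sum_pt` are illustrative only.

```python
import pandas as pd

# Hypothetical data: two "x" rows share the same date and country.
df = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-01", "2013-01-01"],
    "Country": ["USA", "USA", "USA"],
    "Type": ["x", "x", "y"],
    "Val": [20, 30, 13],
})

# Default aggregation: duplicates are averaged.
mean_pt = df.pivot_table(values="Val", index=["date", "Country"], columns="Type")

# Explicit aggregation: duplicates are summed instead.
sum_pt = df.pivot_table(values="Val", index=["date", "Country"],
                        columns="Type", aggfunc="sum")

print(mean_pt)
print(sum_pt)
```

With the data in the question every combination is unique, so the mean of each single value is the value itself and the result matches the unstack approach.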

1 Comment

Thanks for the alternate solution!
4

Let's store your original dataframe in df. Then, at least in version 0.18.1, you can do:

df.pivot_table(values="Val", index=['date', 'Country'], columns='Type')

gives the right answer:

Type                 x   y
date       Country
2013-01-01 MX       11  14
           USA      23  13
2013-01-02 MX       14  16
           USA      20  19
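
If no aggregation is wanted, `DataFrame.pivot` is a possible alternative, and it can be wrapped in a small helper for the general "spread one column's unique values" case from the question. This is only a sketch: `long_to_wide` and its parameter names are hypothetical, and passing a list to `index` assumes pandas 1.1 or later.

```python
import pandas as pd


def long_to_wide(df, key_col="Type", value_col="Val"):
    """Spread the unique values of key_col into columns (hypothetical helper)."""
    # Every column other than the key and the value identifies a row.
    id_cols = [c for c in df.columns if c not in (key_col, value_col)]
    # pivot only reshapes; it raises ValueError on duplicate entries
    # instead of aggregating them.
    wide = df.pivot(index=id_cols, columns=key_col, values=value_col)
    wide.columns.name = None  # drop the leftover "Type" label
    return wide.reset_index()


# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2013-01-01"] * 4 + ["2013-01-02"] * 4,
    "Country": ["USA", "USA", "MX", "MX"] * 2,
    "Type": ["x", "y"] * 4,
    "Val": [23, 13, 11, 14, 20, 19, 14, 16],
})

result = long_to_wide(df)
print(result)
```

The design choice here is strictness: where `pivot_table` would quietly average duplicate rows, `pivot` fails loudly, which may be preferable when duplicates indicate a data problem.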
