Pandas DataFrame: transforming frame using unique values of a column

Question

I have a pandas dataframe/csv of the form

date        Country   Type   Val
2013-01-01  USA       x      23
2013-01-01  USA       y      13
2013-01-01  MX        x      11
2013-01-01  MX        y      14
2013-01-02  USA       x      20
2013-01-02  USA       y      19
2013-01-02  MX        x      14
2013-01-02  MX        y      16

I want to convert it into this form:

date        Country   x   y
2013-01-01  USA       23  13
2013-01-01  MX        11  14
2013-01-02  USA       20  19
2013-01-02  MX        14  16

In general, I am looking for a way to transform a table so that the unique values of a single column become new columns.

I have looked at pivot and groupby but couldn't get the exact form.

HINT: this is possibly solvable with pivot, but I haven't been able to get the right form.

Danica · Accepted Answer · 2013-06-27 03:49:27Z

10

Probably not the most elegant way, but this works using unstack:

>>> df
         date Country Type  Val
0  2013-01-01     USA    x   23
1  2013-01-01     USA    y   13
2  2013-01-01      MX    x   11
3  2013-01-01      MX    y   14
4  2013-01-02     USA    x   20
5  2013-01-02     USA    y   19
6  2013-01-02      MX    x   14
7  2013-01-02      MX    y   16

>>> df.set_index(['date', 'Country', 'Type']).unstack('Type').reset_index()
            date Country  Val
Type                        x   y
0     2013-01-01      MX   11  14
1     2013-01-01     USA   23  13
2     2013-01-02      MX   14  16
3     2013-01-02     USA   20  19
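The two-level column header appears because the whole DataFrame is unstacked, so the remaining column name (Val) stays on top. A minimal sketch, assuming the sample data is rebuilt by hand with dates as plain strings, that contrasts unstacking the DataFrame with unstacking only the Val column:

```python
import pandas as pd

# Sample data from the question (dates kept as plain strings for simplicity)
df = pd.DataFrame({
    "date": ["2013-01-01"] * 4 + ["2013-01-02"] * 4,
    "Country": ["USA", "USA", "MX", "MX"] * 2,
    "Type": ["x", "y"] * 4,
    "Val": [23, 13, 11, 14, 20, 19, 14, 16],
})

indexed = df.set_index(["date", "Country", "Type"])

# Unstacking the whole DataFrame keeps the remaining column name ('Val')
# as the outer level, so the columns become a two-level MultiIndex.
wide_frame = indexed.unstack("Type")
print(wide_frame.columns.tolist())  # [('Val', 'x'), ('Val', 'y')]

# Unstacking just the 'Val' Series gives single-level columns named 'Type'.
wide_series = indexed["Val"].unstack("Type")
print(wide_series.columns.tolist())  # ['x', 'y']
print(wide_series.reset_index())
```

Selecting the Series before unstacking avoids having to flatten the columns afterwards; the column axis is still named Type, though.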

A slightly more general version, which also removes the awkward hierarchical columns from the result:

>>> cols = [c for c in df.columns if c not in {'Type', 'Val'}]
>>> df2 = df.set_index(cols + ['Type']).unstack('Type')
>>> df2
                    Val
Type                  x   y
date       Country
2013-01-01 MX        11  14
           USA       23  13
2013-01-02 MX        14  16
           USA       20  19
>>> df2.columns = df2.columns.droplevel(0)
>>> df2.columns.name = None
>>> df2
                     x   y
date       Country
2013-01-01 MX       11  14
           USA      23  13
2013-01-02 MX       14  16
           USA      20  19
>>> df2.reset_index()
         date Country   x   y
0  2013-01-01      MX  11  14
1  2013-01-01     USA  23  13
2  2013-01-02      MX  14  16
3  2013-01-02     USA  20  19
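The steps above can be wrapped into a reusable function. This is a sketch only: spread_column is a hypothetical name, and it assumes every column other than the key and value columns is an identifier and that each identifier/key combination occurs at most once.

```python
import pandas as pd


def spread_column(df, key_col, value_col):
    """Turn each unique value of key_col into its own column.

    Hypothetical helper: every column other than key_col and value_col is
    treated as an identifier. unstack raises ValueError if an
    identifier/key combination appears more than once.
    """
    id_cols = [c for c in df.columns if c not in {key_col, value_col}]
    # Unstack the value Series so the result has single-level columns
    wide = df.set_index(id_cols + [key_col])[value_col].unstack(key_col)
    wide.columns.name = None  # drop the leftover key_col label on the columns
    return wide.reset_index()


df = pd.DataFrame({
    "date": ["2013-01-01"] * 4 + ["2013-01-02"] * 4,
    "Country": ["USA", "USA", "MX", "MX"] * 2,
    "Type": ["x", "y"] * 4,
    "Val": [23, 13, 11, 14, 20, 19, 14, 16],
})

result = spread_column(df, "Type", "Val")
print(result)
```

As in the session above, unstack sorts the row labels, so MX comes before USA within each date.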

edited Jun 27, 2013 at 3:49

answered Jun 27, 2013 at 2:46

Danica


3 Comments

goofd Over a year ago

Wow, thanks for the quick response. I'm probably having a mental block, but is there a way I can remove the index name?

Danica Over a year ago

If you mean Type there, it's not actually df.index.name; rather, df.columns is hierarchical and has the name Type. I've edited the answer to show how to get rid of it.
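To see where the Type label lives, one can inspect both axes directly. A minimal sketch, assuming pandas 0.24 or later for the rename_axis(columns=...) keyword form, which returns a copy with the column-axis name removed:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2013-01-01"] * 4 + ["2013-01-02"] * 4,
    "Country": ["USA", "USA", "MX", "MX"] * 2,
    "Type": ["x", "y"] * 4,
    "Val": [23, 13, 11, 14, 20, 19, 14, 16],
})

wide = df.set_index(["date", "Country", "Type"])["Val"].unstack("Type")

# The row index carries the names 'date' and 'Country' ...
print(list(wide.index.names))  # ['date', 'Country']
# ... while 'Type' is the name of the column axis.
print(wide.columns.name)  # Type

# rename_axis(columns=None) returns a copy without the column-axis name;
# the original frame keeps its 'Type' label.
cleaned = wide.rename_axis(columns=None)
print(cleaned.columns.name)  # None
```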

goofd Over a year ago

Thanks, figured it out (forgot to edit my comment). It seems efficient enough; accepting the answer :)

Richard Sheridan · Answer · 2015-01-28 22:03:45Z

4

I cooked up my own pivot-based solution to the same problem before finding Dougal's answer. I thought I would post it for posterity, since I find it more readable:

>>> pd.__version__
'0.15.0'
>>> df
         date Country Type  Val
0  2013-01-01     USA    x   23
1  2013-01-01     USA    y   13
2  2013-01-01      MX    x   11
3  2013-01-01      MX    y   14
4  2013-01-02     USA    x   20
5  2013-01-02     USA    y   19
6  2013-01-02      MX    x   14
7  2013-01-02      MX    y   16
>>> pt = df.pivot_table(values='Val',
...                     columns='Type',
...                     index=['date', 'Country'],
...                     )
>>> pt
Type                 x   y
date       Country        
2013-01-01 MX       11  14
           USA      23  13
2013-01-02 MX       14  16
           USA      20  19

And then carry on with Dougal's cleanups:

>>> pt.columns.name = None
>>> pt.reset_index()
         date Country   x   y
0  2013-01-01      MX  11  14
1  2013-01-01     USA  23  13
2  2013-01-02      MX  14  16
3  2013-01-02     USA  20  19

Note that calling DataFrame.to_csv() on pt (which still has date and Country as its index) produces your requested output as CSV:

>>> print(pt.to_csv())
date,Country,x,y
2013-01-01,MX,11,14
2013-01-01,USA,23,13
2013-01-02,MX,14,16
2013-01-02,USA,20,19
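Since the question mentions a CSV, here is a sketch of the whole round trip, assuming the input is available as text (io.StringIO stands in for a real file path). Writing from the reset frame needs index=False, because date and Country are then ordinary columns:

```python
import io

import pandas as pd

# The question's data as CSV text; replace io.StringIO(...) with a file path
csv_text = """date,Country,Type,Val
2013-01-01,USA,x,23
2013-01-01,USA,y,13
2013-01-01,MX,x,11
2013-01-01,MX,y,14
2013-01-02,USA,x,20
2013-01-02,USA,y,19
2013-01-02,MX,x,14
2013-01-02,MX,y,16
"""

df = pd.read_csv(io.StringIO(csv_text))

# One row per (date, Country), one column per unique Type value
pt = df.pivot_table(values="Val", index=["date", "Country"], columns="Type")
pt.columns.name = None

# reset_index() moves date/Country back into columns, so skip the row index
out = pt.reset_index().to_csv(index=False)
print(out)
```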

answered Jan 28, 2015 at 22:03

Richard Sheridan


1 Comment

goofd Over a year ago

Thanks for the alternate solution!

cd98 · Answer · 2016-10-01 15:51:46Z

4

Assuming your original DataFrame is stored in df, then (at least in pandas 0.18.1) you can do:

df.pivot_table(values="Val", index=['date', 'Country'], columns='Type')

which gives the desired result:

Type                 x   y
date       Country
2013-01-01 MX       11  14
           USA      23  13
2013-01-02 MX       14  16
           USA      20  19
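One difference worth knowing: pivot_table aggregates repeated (date, Country, Type) combinations (mean by default), whereas DataFrame.pivot does no aggregation and raises ValueError on duplicates. A sketch with one hypothetical extra USA/x row added for illustration; passing a list to pivot's index= is assumed to need pandas 1.1 or later:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2013-01-01"] * 4 + ["2013-01-02"] * 4,
    "Country": ["USA", "USA", "MX", "MX"] * 2,
    "Type": ["x", "y"] * 4,
    "Val": [23, 13, 11, 14, 20, 19, 14, 16],
})

# Hypothetical duplicate: a second USA/x reading on 2013-01-01
extra = pd.DataFrame({"date": ["2013-01-01"], "Country": ["USA"],
                      "Type": ["x"], "Val": [27]})
dup = pd.concat([df, extra], ignore_index=True)

# pivot_table averages the duplicates: mean of 23 and 27
pt = dup.pivot_table(values="Val", index=["date", "Country"], columns="Type")
print(pt.loc[("2013-01-01", "USA"), "x"])

# pivot cannot decide which value to keep, so it raises ValueError
pivot_raised = False
try:
    dup.pivot(index=["date", "Country"], columns="Type", values="Val")
except ValueError as exc:
    pivot_raised = True
    print("pivot failed:", exc)

# Without duplicates, pivot gives the same layout as pivot_table
wide = df.pivot(index=["date", "Country"], columns="Type", values="Val")
print(wide)
```

If the data is guaranteed to have one row per combination, pivot makes that assumption explicit; pivot_table silently averages instead.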

edited Oct 1, 2016 at 15:51

answered Oct 1, 2016 at 15:35

cd98
