python pandas: pivoting columns to rows

Question

I have a table like:

country | name  | medals_won | year
-----------------------------------
US      | sarah |      1     | 2010
US      | sarah |      2     | 2011
US      | sarah |      5     | 2015
US      | alice |      3     | 2010
US      | alice |      4     | 2012
US      | alice |      1     | 2015
AU      | jones |      2     | 2013
AU      | jones |      8     | 2015

I want it like:

country | name  | 2010 | 2011 | 2012 | 2013 | 2014 | 2015
---------------------------------------------------------
US      | sarah | 1    | 2    | 0    | 0    | 0    | 5
US      | alice | 3    | 0    | 4    | 0    | 0    | 1
AU      | jones | 0    | 0    | 0    | 2    | 0    | 8

I've tinkered with df.apply and even brute-force iteration, but as you can probably guess, the tricky part is that the year values aren't strictly sequential, so this isn't a simple transpose. For example, nobody won any medals in 2014, but I want the resulting table to show that as a column full of zeros.
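A minimal sketch that rebuilds the sample table as a DataFrame (the name `df` is just for illustration) and checks which years in the full span never occur in `year`. Those are the columns a reshape will have to add explicitly:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# Years present in the data vs. every year from the first to the last one
present = sorted(df['year'].unique())
full_span = list(range(df['year'].min(), df['year'].max() + 1))
missing = [y for y in full_span if y not in present]
print(missing)  # [2014]
```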

jezrael · Accepted Answer · 2017-04-19 13:24:53Z

6

You can use set_index + unstack:

df = df.set_index(['country','name','year'])['medals_won'].unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones     0     0     0     2     8
US      alice     3     0     4     0     1
        sarah     1     2     0     0     5
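`unstack` only creates columns for years that actually occur, so 2014 is absent from the output above, while the question asks for it as a column of zeros. One way to add it, sketched here under the assumption that the wanted columns are every year from the earliest to the latest one in the data, is `DataFrame.reindex` with `fill_value=0`:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

wide = df.set_index(['country', 'name', 'year'])['medals_won'].unstack(fill_value=0)

# Add every year between the first and last one (here 2014 is the gap),
# filling the newly created columns with 0
all_years = range(wide.columns.min(), wide.columns.max() + 1)
wide = wide.reindex(columns=all_years, fill_value=0)
print(wide)
```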

If there are duplicate (country, name, year) rows that need aggregating (mean, sum, ...), use pivot_table, or groupby + an aggregate function + unstack:

print (df)
  country   name  medals_won  year
0      US  sarah           1  2010 <- same US / sarah / 2010, value 1
1      US  sarah           4  2010 <- same US / sarah / 2010, value 4
2      US  sarah           2  2011
3      US  sarah           5  2015
4      US  alice           3  2010
5      US  alice           4  2012
6      US  alice           1  2015
7      AU  jones           2  2013
8      AU  jones           8  2015
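With a duplicated key like rows 0 and 1 above, the plain set_index + unstack approach cannot decide which value goes in the cell and raises a ValueError, which is why an aggregation step is needed. A small sketch that reproduces this (the frame name `dup` is hypothetical):

```python
import pandas as pd

# Duplicated sample: two rows for US / sarah / 2010
dup = pd.DataFrame({
    'country': ['US'] * 7 + ['AU'] * 2,
    'name': ['sarah'] * 4 + ['alice'] * 3 + ['jones'] * 2,
    'medals_won': [1, 4, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

has_dups = dup.duplicated(['country', 'name', 'year']).any()

try:
    dup.set_index(['country', 'name', 'year'])['medals_won'].unstack(fill_value=0)
    raised = False
except ValueError as exc:
    # unstack cannot put two values into the same (row, column) cell
    raised = True
    print(exc)
```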

df = df.pivot_table(index=['country','name'], 
                    columns='year', 
                    values='medals_won', 
                    fill_value=0, 
                    aggfunc='mean')
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0     0     0     2     8
US      alice   3.0     0     4     0     1
        sarah   2.5     2     0     0     5 <- (1+4)/2 = 2.5
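If medals for a duplicated (country, name, year) should be added up rather than averaged, a sketch with `aggfunc='sum'` instead of `'mean'` (the duplicated sample is rebuilt here as a hypothetical `dup` frame):

```python
import pandas as pd

dup = pd.DataFrame({
    'country': ['US'] * 7 + ['AU'] * 2,
    'name': ['sarah'] * 4 + ['alice'] * 3 + ['jones'] * 2,
    'medals_won': [1, 4, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# sarah's two 2010 rows (1 and 4) are summed to 5 instead of averaged to 2.5
totals = dup.pivot_table(index=['country', 'name'], columns='year',
                         values='medals_won', fill_value=0, aggfunc='sum')
print(totals)
```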

Alternatively:

df = df.groupby(['country','name','year'])['medals_won'].mean().unstack(fill_value=0)
print (df)
year           2010  2011  2012  2013  2015
country name                               
AU      jones   0.0   0.0   0.0   2.0   8.0
US      alice   3.0   0.0   4.0   0.0   1.0
        sarah   2.5   2.0   0.0   0.0   5.0
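The `mean` output above is all floats. If summing is the aggregation you want, here is a sketch of the groupby version with `sum`: summing an integer column stays integer, and `unstack(fill_value=0)` fills the empty cells without going through NaN, so the counts should remain integers (the frame name `dup` is hypothetical):

```python
import pandas as pd

dup = pd.DataFrame({
    'country': ['US'] * 7 + ['AU'] * 2,
    'name': ['sarah'] * 4 + ['alice'] * 3 + ['jones'] * 2,
    'medals_won': [1, 4, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# Aggregate duplicates with sum, then spread years into integer columns
totals = (dup.groupby(['country', 'name', 'year'])['medals_won']
             .sum()
             .unstack(fill_value=0))
print(totals.dtypes)
```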

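Finally, to turn `country` and `name` back into regular columns and remove the `year` name from the columns (shown here for the result of the first, set_index + unstack solution):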

df = df.reset_index().rename_axis(None, axis=1)
print (df)
  country   name  2010  2011  2012  2013  2015
0      AU  jones     0     0     0     2     8
1      US  alice     3     0     4     0     1
2      US  sarah     1     2     0     0     5
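`unstack` also sorts the row index, so the result lists AU before US and alice before sarah, unlike the desired table in the question. A sketch that restores the order of first appearance before flattening, assuming pandas 0.24 or later for `MultiIndex.from_frame`:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

wide = df.set_index(['country', 'name', 'year'])['medals_won'].unstack(fill_value=0)

# (country, name) pairs in the order they first appear in the input
order = pd.MultiIndex.from_frame(df[['country', 'name']].drop_duplicates())

# Reorder rows, then flatten the index into columns and drop the 'year' label
wide = wide.reindex(order).reset_index().rename_axis(None, axis=1)
print(wide)
```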

edited Apr 19, 2017 at 13:24

answered Apr 19, 2017 at 13:15

jezrael


2 Comments

Scott Boston Over a year ago

Now that is just greedy taking all the answers. :-)

jezrael Over a year ago

Yes, but I always think it is necessary to explain the difference between the solutions. If you always use pivot_table automatically with no idea that it aggregates, everything works nicely, but when the data changes people are surprised by what's going on. If you know there is aggregation, no problem. ;)

Ali Arsalan · Accepted Answer · 2017-04-19 14:02:43Z

-1

You can use pandas' pivot_table() function and fill the NaN values with zero using DataFrame.fillna(0):

    import pandas as pd

    df = pd.DataFrame({
        'country' : pd.Series(['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU']),
        'name' : pd.Series(['sarah', 'sarah','sarah','alice','alice','alice','jones','jones']),
        'medals_won' : pd.Series([1,2,5,3,4,1,2,8]),
        'year': pd.Series([2010,2011,2015,2010,2012,2015,2013,2015])
        })
    # values is not given, so the columns are two-level: ('medals_won', year)
    print(pd.pivot_table(df, index=['country','name'], columns='year', aggfunc='sum').fillna(0))
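Because `values` is not passed, the call above produces two-level columns such as `('medals_won', 2010)`. A sketch of the same idea that passes `values=` to get plain year columns and uses the `fill_value` parameter of `pivot_table` in place of a separate `fillna` step:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'US', 'US', 'AU', 'AU'],
    'name': ['sarah', 'sarah', 'sarah', 'alice', 'alice', 'alice', 'jones', 'jones'],
    'medals_won': [1, 2, 5, 3, 4, 1, 2, 8],
    'year': [2010, 2011, 2015, 2010, 2012, 2015, 2013, 2015],
})

# values= selects one column, so the result has plain year columns;
# fill_value=0 fills the empty cells during the reshape itself
wide = pd.pivot_table(df, index=['country', 'name'], columns='year',
                      values='medals_won', aggfunc='sum', fill_value=0)
print(wide)
```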


answered Apr 19, 2017 at 14:02

Ali Arsalan
