Pandas: Pivoting with multi-index data

Question

I have two dataframes that look like this:

rating
   BMW  Fiat  Toyota
0    7     2       3
1    8     1       8
2    9    10       7
3    8     3       9

own
   BMW  Fiat  Toyota
0    1     1       0
1    0     1       1
2    0     0       1
3    0     1       1

I'm ultimately trying to get a pivot table of the mean rating by usage for each brand, something like this:

            BMW  Fiat  Toyota
Usage                        
0      8.333333    10       3
1      7.000000     2       8

My approach was to merge the datasets like this:

Measure  Rating                Own              
Brand       BMW  Fiat  Toyota  BMW  Fiat  Toyota
0             7     2       3    1     1       0
1             8     1       8    0     1       1
2             9    10       7    0     0       1
3             8     3       9    0     1       1

I then attempted to create a pivot table using rating as the values, own as the rows and brand as the columns, but I kept running into key issues. I have also tried unstacking either the measure or the brand level, but I can't seem to use row index names as pivot keys.

What am I doing wrong? Is there a better approach to this?

roman · Accepted Answer · 2013-10-19 18:15:47Z

I'm not an expert in Pandas, so the solution may be more clumsy than you want, but:

import pandas as pd

rating = pd.DataFrame({"BMW":[7, 8, 9, 8], "Fiat":[2, 1, 10, 3], "Toyota":[3, 8, 7, 9]})
own = pd.DataFrame({"BMW":[1, 0, 0, 0], "Fiat":[1, 1, 0, 1], "Toyota":[0, 1, 1, 1]})

# flatten both frames to long form: one row per (brand, row number)
r = rating.unstack().reset_index(name='value')
o = own.unstack().reset_index(name='value')
res = pd.DataFrame({"Brand":r["level_0"], "Rating": r["value"], "Own": o["value"]})
# average the ratings for each (Own, Brand) pair, then spread brands into columns
res = res.groupby(["Own", "Brand"]).mean().reset_index()
res.pivot(index="Own", columns="Brand", values="Rating")

# result
# Brand       BMW  Fiat  Toyota
# Own                          
# 0      8.333333    10       3
# 1      7.000000     2       8

Another solution, although not a very generalizable one (it uses a for loop, and you have to know which values occur in the own dataframe):

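d = []
for o in (0, 1):
    # keep ratings where own == o; every other cell becomes NaN
    t = rating[own == o].copy()
    t["own"] = o
    d.append(t)

# mean() skips NaN, so each brand is averaged over the matching rows only
res = pd.concat(d).groupby("own").mean()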
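
The loop above hard-codes the values (0, 1). As a rough sketch of how that restriction might be lifted, the distinct values can be read from own itself; the names own_values and res below are only illustrative, and rating.where(own == o) is used as a stand-in for rating[own == o]:

```python
import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Collect the distinct ownership values from the data instead of hard-coding (0, 1).
own_values = sorted(pd.unique(own.to_numpy().ravel()))

# For each value, keep only the ratings where own equals it (NaN elsewhere)
# and take the per-brand mean; mean() skips NaN by default.
res = pd.DataFrame({o: rating.where(own == o).mean() for o in own_values}).T
res.index.name = "own"
print(res)
```

This assumes rating and own share the same index and columns, as they do in the question.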

edited Oct 19, 2013 at 18:15

answered Oct 17, 2013 at 20:09


4 Comments

Brendon McLean Over a year ago

Thanks. Great to have a solution. You're right that I was hoping for something more elegant, but a solution unblocks me. I can always write a function.

roman Over a year ago

@Brendon I'm trying to spend as much time as I can learning Pandas now, so I'll see what I can do after a week or two :) Please don't accept the answer yet, maybe some gurus will arrive with a superelegant solution.

Brendon McLean Over a year ago

Well, your tagline on your profile says as much :). I will hold off accepting your answer for another week. Thanks again.

roman Over a year ago

@Brendon take a look, I've added another solution, a more pythonic one I think. If I knew how to add a column to a DataFrame in place, it could be even shorter.

Brendon McLean · Accepted Answer · 2013-10-23 17:11:59Z

I have a new answer to my own question (based on Roman's initial answer). The key is to get the index at the required dimensionality. For example:

rating.columns.names = ["Brand"]
rating.index.names = ["n"]
print(rating)

Brand  BMW  Fiat  Toyota
n                       
0        7     2       3
1        8     1       8
2        9    10       7
3        8     3       9

own.columns.names = ["Brand"]
own.index.names = ["n"]
print(own)

Brand  BMW  Fiat  Toyota
n                       
0        1     1       0
1        0     1       1
2        0     0       1
3        0     1       1

merged = pd.merge(own.unstack().reset_index(name="Own"),
                  rating.unstack().reset_index(name="Rating"))
print(merged)

     Brand  n  Own  Rating
0      BMW  0    1       7
1      BMW  1    0       8
2      BMW  2    0       9
3      BMW  3    0       8
4     Fiat  0    1       2
5     Fiat  1    1       1
6     Fiat  2    0      10
7     Fiat  3    1       3
8   Toyota  0    0       3
9   Toyota  1    1       8
10  Toyota  2    1       7
11  Toyota  3    1       9

Then it's easy to use the pivot_table command to turn this into the desired result:

print(merged.pivot_table(index="Brand", columns="Own", values="Rating"))

Own             0  1
Brand               
BMW      8.333333  7
Fiat    10.000000  2
Toyota   3.000000  8

And that is what I was looking for. Thanks again to Roman for pointing the way.
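
For reference, here is a self-contained sketch of the same steps written for a recent pandas release. It assumes the same example data, passes the merge keys explicitly, and puts Own on the rows and Brand on the columns to match the layout asked for in the question; the variable name result is only illustrative.

```python
import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Name both axes so the long-form columns get meaningful labels.
for df in (rating, own):
    df.columns.name = "Brand"
    df.index.name = "n"

# unstack() turns each wide frame into a Series indexed by (Brand, n);
# reset_index(name=...) flattens it into Brand / n / value columns.
merged = pd.merge(
    own.unstack().reset_index(name="Own"),
    rating.unstack().reset_index(name="Rating"),
    on=["Brand", "n"],
)

# pivot_table aggregates with the mean by default.
result = merged.pivot_table(index="Own", columns="Brand", values="Rating")
print(result)
```

Swapping the index and columns arguments gives the Brand-by-Own layout printed above instead.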
