Mapping a multi-index to existing pandas DataFrame columns using a separate DataFrame

Question

I have an existing data frame in the following format (let's call it df):

               A     B     C     D
0              1     2     1     4
1              3     0     2     2
2              1     5     3     1

The column names were extracted from a spreadsheet that has the following form (let's call it cat_df):

                      current category
broader category
X                     A
Y                     B
Y                     C
Z                     D

First I'd like to prepend a higher level index to make df look like so:

               X     Y           Z
               A     B     C     D
0              1     2     1     4
1              3     0     2     2
2              1     5     3     1

Lastly, I'd like to 'roll up' the data into the meta-index by summing over the sub-indices, to generate a new dataframe like so:

               X     Y     Z
0              1     3     4
1              3     2     2
2              1     8     1

Using concat from this answer has gotten me close, but it seems like it'd be a very manual process to pick out each subset. My true dataset has a more complex mapping, so I'd like to refer to cat_df directly as I build my meta-index. I think once I get the meta-index settled, a simple groupby should get me to the summation, but I'm still stuck on the first step.

piRSquared · Accepted Answer · 2018-04-18 22:25:50Z

4

d = dict(zip(cat_df['current category'], cat_df.index))

cols = pd.MultiIndex.from_arrays([df.columns.map(d.get), df.columns])
df.set_axis(cols, axis=1, inplace=False)

   X  Y     Z
   A  B  C  D
0  1  2  1  4
1  3  0  2  2
2  1  5  3  1

df_new = df.set_axis(cols, axis=1, inplace=False)
df_new.groupby(axis=1, level=0).sum()

   X  Y  Z
0  1  3  4
1  3  2  2
2  1  8  1
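
For readers on recent pandas releases: set_axis no longer accepts inplace (removed in 2.0) and groupby(axis=1) is deprecated (since 2.1), so the snippets above may fail or warn as written. Below is a minimal sketch of the same dict-mapping idea that should run on both old and new versions. The df and cat_df constructions are assumptions rebuilt from the tables in the question, and the names df_new and rolled are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({"A": [1, 3, 1], "B": [2, 0, 5], "C": [1, 2, 3], "D": [4, 2, 1]})
cat_df = pd.DataFrame(
    {"current category": ["A", "B", "C", "D"]},
    index=pd.Index(["X", "Y", "Y", "Z"], name="broader category"),
)

# Map each current category to its broader category.
d = dict(zip(cat_df["current category"], cat_df.index))

# Build the two-level column index: broader category on top, original name below.
cols = pd.MultiIndex.from_arrays([df.columns.map(d), df.columns])

# Assign on a copy so the original df is left untouched.
df_new = df.copy()
df_new.columns = cols

# Roll up: transpose, group the rows by the top level, sum, transpose back.
rolled = df_new.T.groupby(level=0).sum().T
print(rolled)
```

Because the mapping is looked up by column name, this variant does not depend on the row order of cat_df.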

1 Comment

Al S Over a year ago

This one worked great for me and was the most understandable. However, I had to make one modification as I'm using an older version of Pandas, 0.20.3. I replaced the set_axis() line with df.set_axis(1,cols) (with the caveat that it changes the dataframe in place) as the syntax changed in version 0.22.

Scott Boston · Accepted Answer · 2018-04-18 22:23:41Z

2

IIUC, you can do it like this.

df.columns = pd.MultiIndex.from_tuples(cat_df.reset_index()[['broader category','current category']].apply(tuple, axis=1).tolist())

print(df)

Output:

   X  Y     Z
   A  B  C  D
0  1  2  1  4
1  3  0  2  2
2  1  5  3  1

Sum level:

df.sum(level=0, axis=1)

Output:

   X  Y  Z
0  1  3  4
1  3  2  2
2  1  8  1
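
Two caveats on this approach. The column assignment is positional, so it relies on the rows of cat_df being in the same order as the columns of df. Also, DataFrame.sum(level=...) was removed in pandas 2.0. The following is a hedged sketch that adds an explicit order check and uses a transpose plus groupby for the roll-up. It assumes pandas 0.24 or later for MultiIndex.from_frame; the sample frames are rebuilt from the question, and pairs, df2 and rolled are illustrative names.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({"A": [1, 3, 1], "B": [2, 0, 5], "C": [1, 2, 3], "D": [4, 2, 1]})
cat_df = pd.DataFrame(
    {"current category": ["A", "B", "C", "D"]},
    index=pd.Index(["X", "Y", "Y", "Z"], name="broader category"),
)

# One row per column of df: (broader category, current category).
pairs = cat_df.reset_index()[["broader category", "current category"]]

# The assignment below is positional, so fail loudly if the order differs.
if list(pairs["current category"]) != list(df.columns):
    raise ValueError("cat_df rows are not in the same order as df columns")

df2 = df.copy()
df2.columns = pd.MultiIndex.from_frame(pairs)

# Sum the sub-columns under each broader category.
rolled = df2.T.groupby(level="broader category").sum().T
print(rolled)
```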

BENY · Accepted Answer · 2018-04-18 23:02:23Z

2

You can use set_index to create the index, then assign it to your df's columns (df1 below plays the role of cat_df from the question):

idx=df1.set_index('category',append=True).index

df.columns=idx

df
Out[1170]:
current   X  Y     Z
category  A  B  C  D
0         1  2  1  4
1         3  0  2  2
2         1  5  3  1

df.sum(axis=1,level=0)
Out[1171]: 
current  X  Y  Z
0        1  3  4
1        3  2  2
2        1  8  1
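
The same idea written against the names used in the question, as a sketch only: it assumes cat_df has its index named "broader category" and a column named "current category", and that its rows are in the same order as df's columns. The roll-up uses a transpose plus groupby because DataFrame.sum(level=...) is no longer available in pandas 2.0 and later. The names idx, df3 and rolled are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({"A": [1, 3, 1], "B": [2, 0, 5], "C": [1, 2, 3], "D": [4, 2, 1]})
cat_df = pd.DataFrame(
    {"current category": ["A", "B", "C", "D"]},
    index=pd.Index(["X", "Y", "Y", "Z"], name="broader category"),
)

# Appending the column to the existing index yields a two-level MultiIndex.
idx = cat_df.set_index("current category", append=True).index

df3 = df.copy()
df3.columns = idx  # positional: relies on matching order

# Sum within each broader category.
rolled = df3.T.groupby(level="broader category").sum().T
print(rolled)
```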
