2

I have an existing data frame in the following format (let's call it df):

               A     B     C     D
0              1     2     1     4
1              3     0     2     2
2              1     5     3     1

The column names were extracted from a spreadsheet that has the following form (let's call it cat_df):

                      current category
broader category
X                     A
Y                     B
Y                     C
Z                     D

First, I'd like to prepend a higher-level index so that df looks like this:

               X     Y           Z
               A     B     C     D
0              1     2     1     4
1              3     0     2     2
2              1     5     3     1

Lastly, I'd like to 'roll up' the data into the meta-index by summing over the subindices, to generate a new dataframe like so:

               X     Y     Z
0              1     3     4
1              3     2     2
2              1     8     1

Using concat from this answer has gotten me close, but picking out each subset by hand seems like a very manual process. My true dataset has a more complex mapping, so I'd like to refer to cat_df directly as I build my meta-index. I think that once the meta-index is settled, a simple groupby should get me to the summation, but I'm still stuck on the first step.

3 Answers

4
d = dict(zip(cat_df['current category'], cat_df.index))

cols = pd.MultiIndex.from_arrays([df.columns.map(d.get), df.columns])
df.set_axis(cols, axis=1, inplace=False)

   X  Y     Z
   A  B  C  D
0  1  2  1  4
1  3  0  2  2
2  1  5  3  1

df_new = df.set_axis(cols, axis=1, inplace=False)
df_new.groupby(axis=1, level=0).sum()

   X  Y  Z
0  1  3  4
1  3  2  2
2  1  8  1
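
For readers on a recent pandas, here is a minimal self-contained sketch of the same two steps. As far as I know, the `inplace` argument of `set_axis` was removed in pandas 2.0 and `groupby(axis=1)` was deprecated in 2.1, so this version drops `inplace` and groups on the transposed frame instead. The construction of `df` and `cat_df` is my reconstruction of the question's data, and the names `df_new` and `rolled` are only illustrative.

```python
import pandas as pd

# Reconstruction of the question's example data (an assumption about how it was built)
df = pd.DataFrame({"A": [1, 3, 1], "B": [2, 0, 5], "C": [1, 2, 3], "D": [4, 2, 1]})
cat_df = pd.DataFrame(
    {"current category": ["A", "B", "C", "D"]},
    index=pd.Index(["X", "Y", "Y", "Z"], name="broader category"),
)

# Map each current category to its broader category
d = dict(zip(cat_df["current category"], cat_df.index))

# Step 1: build the two-level column index and attach it (returns a new frame)
cols = pd.MultiIndex.from_arrays([df.columns.map(d), df.columns])
df_new = df.set_axis(cols, axis=1)

# Step 2: roll up by the top level; transpose, group the rows, transpose back
rolled = df_new.T.groupby(level=0).sum().T
print(rolled)
```

Because the lookup goes through the dict `d`, this does not depend on the rows of `cat_df` being in the same order as the columns of `df`.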

1 Comment

This one worked great for me and was the most understandable. However, I had to make one modification as I'm using an older version of Pandas, 0.20.3. I replaced the set_axis() line with df.set_axis(1,cols) (with the caveat that it changes the dataframe in place) as the syntax changed in version 0.22.
2

IIUC, you can do it like this.

df.columns = pd.MultiIndex.from_tuples(cat_df.reset_index()[['broader category','current category']].apply(tuple, axis=1).tolist())

print(df)

Output:

   X  Y     Z
   A  B  C  D
0  1  2  1  4
1  3  0  2  2
2  1  5  3  1

Sum level:

df.sum(level=0, axis=1)

Output:

   X  Y  Z
0  1  3  4
1  3  2  2
2  1  8  1
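
A hedged variant of the same idea for newer pandas: `MultiIndex.from_frame` can build the column index straight from the two columns of `cat_df`, and since `sum(level=...)` is, to my knowledge, no longer available in pandas 2.x, the roll-up below groups the transposed frame instead. Note the assumption that the rows of `cat_df` are in the same order as the columns of `df`; the sketch checks this explicitly. The data setup and the names `pairs`, `df2` and `rolled2` are illustrative, not from the original answer.

```python
import pandas as pd

# Reconstruction of the question's example data (an assumption about how it was built)
df = pd.DataFrame({"A": [1, 3, 1], "B": [2, 0, 5], "C": [1, 2, 3], "D": [4, 2, 1]})
cat_df = pd.DataFrame(
    {"current category": ["A", "B", "C", "D"]},
    index=pd.Index(["X", "Y", "Y", "Z"], name="broader category"),
)

# Two columns, broader level first, one row per column of df
pairs = cat_df.reset_index()[["broader category", "current category"]]

# Assumption: cat_df rows line up positionally with df's columns
if list(pairs["current category"]) != list(df.columns):
    raise ValueError("cat_df rows are not in the same order as df columns")

df2 = df.copy()
df2.columns = pd.MultiIndex.from_frame(pairs)

# Sum over the subcolumns of each broader category
rolled2 = df2.T.groupby(level="broader category").sum().T
print(rolled2)
```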

2

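You can use set_index to create the idx, then assign it to your df's columns (df1 here is my copy of cat_df, with the index named current and the column named category):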

idx=df1.set_index('category',append=True).index

df.columns=idx

df
Out[1170]:
current   X  Y     Z
category  A  B  C  D
0         1  2  1  4
1         3  0  2  2
2         1  5  3  1

df.sum(axis=1,level=0)
Out[1171]: 
current  X  Y  Z
0        1  3  4
1        3  2  2
2        1  8  1
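
If only the rolled-up result is needed, the intermediate two-level columns can be skipped. The following is a sketch under the assumption that every column of `df` appears exactly once in `cat_df`; it groups the transposed frame by a lookup Series, which pandas aligns by label rather than by position. The data setup and the names `mapping` and `rolled3` are illustrative.

```python
import pandas as pd

# Reconstruction of the question's example data (an assumption about how it was built)
df = pd.DataFrame({"A": [1, 3, 1], "B": [2, 0, 5], "C": [1, 2, 3], "D": [4, 2, 1]})
cat_df = pd.DataFrame(
    {"current category": ["A", "B", "C", "D"]},
    index=pd.Index(["X", "Y", "Y", "Z"], name="broader category"),
)

# Series indexed by current category, with the broader category as its values
mapping = cat_df.reset_index().set_index("current category")["broader category"]

# Group the rows of df.T (the original columns) by their broader category
rolled3 = df.T.groupby(mapping).sum().T
print(rolled3)
```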
