Pandas add new second level column to column multiindex based on other columns

Question

I have a DataFrame with column multi-index:

System   A                B
Trial    Exp1    Exp2     Exp1    Exp2
1        NaN     1        2       3
2        4       5        NaN     NaN
3        6       NaN      7       8
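
For anyone who wants to reproduce this, here is a minimal sketch of how the frame above could be built. The level names System and Trial come from the printed header; the integer index 1, 2, 3 and the use of MultiIndex.from_product are assumptions about how the original data was constructed.

```python
import numpy as np
import pandas as pd

# Two-level column index: System (A, B) x Trial (Exp1, Exp2).
columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)

# Values copied from the table above; NaN marks a missing measurement.
df = pd.DataFrame(
    [
        [np.nan, 1, 2, 3],
        [4, 5, np.nan, np.nan],
        [6, np.nan, 7, 8],
    ],
    index=[1, 2, 3],
    columns=columns,
)
print(df)
```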

It turns out that for each system (A, B) and each measurement (1, 2, 3 in the index), the results from Exp1 are always superior to those from Exp2. So I want to generate a third column for each system, called Final, that takes Exp1 whenever it is available and falls back to Exp2 otherwise. The desired result is:

System   A                       B
Trial    Exp1    Exp2    Final   Exp1    Exp2    Final
1        NaN     1       1       2       3       2
2        4       5       4       NaN     NaN     NaN
3        6       NaN     6       7       8       7

What is the best way to do this?

I've tried to use groupby on the columns:

grp = df.groupby(level=0, axis=1)

I was thinking of using either transform or apply combined with assign to achieve this, but I have not found an approach that both works and is efficient. Specifically, I am avoiding native Python for loops for efficiency reasons (otherwise the problem is trivial).

jezrael · Accepted Answer · 2017-05-08 16:59:37Z

7

Use stack to reshape, add the new column with fillna, and then reshape back with unstack, followed by swaplevel + sort_index:

df = df.stack(level=0)
df['Final'] = df['Exp1'].fillna(df['Exp2'])
df = df.unstack().swaplevel(0,1,axis=1).sort_index(axis=1)
print (df)
System    A               B           
Trial  Exp1 Exp2 Final Exp1 Exp2 Final
1       NaN  1.0   1.0  2.0  3.0   2.0
2       4.0  5.0   4.0  NaN  NaN   NaN
3       6.0  NaN   6.0  7.0  8.0   7.0
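
As a self-contained sketch of the stack approach, the snippet below rebuilds the sample frame and fills Exp1 from Exp2 in the stacked layout. The construction of the frame and the variable names stacked and result are assumptions made for illustration. On recent pandas versions stack may emit a FutureWarning about its new implementation; the result should be the same here.

```python
import numpy as np
import pandas as pd

# Sample frame (assumed construction, values from the question).
columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

# Move the System level into the row index, leaving Exp1/Exp2 as columns.
stacked = df.stack(level=0)

# Exp1 wins where present, otherwise fall back to Exp2.
stacked["Final"] = stacked["Exp1"].fillna(stacked["Exp2"])

# Restore the wide layout: System back on top, columns sorted.
result = stacked.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
print(result)
```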

Another solution: select the two sub-DataFrames with xs and combine them with combine_first. The result is missing the second column level, so add it with MultiIndex.from_product, and finally concat both DataFrames together:

a = df.xs('Exp1', axis=1, level=1)
b = df.xs('Exp2', axis=1, level=1)
df1 =  a.combine_first(b)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['Final']])
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
System    A               B           
Trial  Exp1 Exp2 Final Exp1 Exp2 Final
1       NaN  1.0   1.0  2.0  3.0   2.0
2       4.0  5.0   4.0  NaN  NaN   NaN
3       6.0  NaN   6.0  7.0  8.0   7.0
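
Here is the same idea as a runnable sketch that avoids reshaping altogether. The frame construction and the names exp1, exp2, final and result are assumptions for illustration; the calls themselves (xs, combine_first, MultiIndex.from_product, concat) are the ones named above.

```python
import numpy as np
import pandas as pd

# Sample frame (assumed construction, values from the question).
columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

# One single-level frame per experiment, with columns A and B.
exp1 = df.xs("Exp1", axis=1, level="Trial")
exp2 = df.xs("Exp2", axis=1, level="Trial")

# Take Exp1 where available, otherwise Exp2.
final = exp1.combine_first(exp2)

# Re-attach the second column level so the frame lines up with df.
final.columns = pd.MultiIndex.from_product(
    [final.columns, ["Final"]], names=df.columns.names
)

result = pd.concat([df, final], axis=1).sort_index(axis=1)
print(result)
```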

2 Comments

Zhang18 Over a year ago

This is pure genius! Thank you!

jezrael Over a year ago

Glad I could help. I'll add an explanation in a second.

piRSquared · Answer · 2017-05-08 16:45:39Z

2

Stack the first level of the column index with stack(0), leaving ['Exp1', 'Exp2'] in the column index.
Use a lambda function that gets applied to the whole DataFrame within an assign call.
Finally, unstack, swaplevel, and sort_index to clean it up and put everything where it belongs.

f = lambda x: x.Exp1.fillna(x.Exp2)
df.stack(0).assign(Final=f).unstack() \
    .swaplevel(0, 1, axis=1).sort_index(axis=1)

     A               B           
  Exp1 Exp2 Final Exp1 Exp2 Final
1  NaN  1.0   1.0  2.0  3.0   2.0
2  4.0  5.0   4.0  NaN  NaN   NaN
3  6.0  NaN   6.0  7.0  8.0   7.0

Another approach using xs:

d1 = df.xs('Exp1', axis=1, level=1).fillna(df.xs('Exp2', axis=1, level=1))
d1.columns = [d1.columns, ['Final'] * len(d1.columns)]
pd.concat([df, d1], axis=1).sort_index(axis=1)


     A               B           
  Exp1 Exp2 Final Exp1 Exp2 Final
1  NaN  1.0   1.0  2.0  3.0   2.0
2  4.0  5.0   4.0  NaN  NaN   NaN
3  6.0  NaN   6.0  7.0  8.0   7.0

edited May 8, 2017 at 16:45

answered May 8, 2017 at 16:22


2 Comments

Zhang18 Over a year ago

Thank you! I had to choose one answer...sorry. And I like your assign() very much! Really appreciate it.

Florin Andrei Over a year ago

stack / unstack is inefficient for large dataframes

Steven G · Answer · 2017-05-08 16:14:17Z

0

This doesn't feel optimal, but try this:

for system in df.columns.levels[0]:
    # Exp1 where available, otherwise Exp2
    df[(system, 'Final')] = df[(system, 'Exp1')].fillna(df[(system, 'Exp2')])

answered May 8, 2017 at 16:14


2 Comments

Zhang18 Over a year ago

Thanks @Steven, though I forgot to mention that I was already able to accomplish this using loops. I'm explicitly trying to steer away from for statements, as they defeat the purpose of Pandas...

Jongwook Choi Over a year ago

Don't forget sort_index, as the new columns will be appended at the very end.
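
To illustrate that last point, here is a small sketch of the loop approach followed by sort_index. The frame construction and the names systems and appended_order are assumptions for illustration; the loop iterates over the two systems only, not over rows.

```python
import numpy as np
import pandas as pd

# Sample frame (assumed construction, values from the question).
columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

# Capture the system labels up front, before the columns are modified.
systems = list(df.columns.get_level_values("System").unique())

for system in systems:
    df[(system, "Final")] = df[(system, "Exp1")].fillna(df[(system, "Exp2")])

# Newly assigned columns land at the end, outside their System group.
appended_order = list(df.columns)
print(appended_order)

# Sorting the column index groups each Final with its system again.
df = df.sort_index(axis=1)
print(list(df.columns))
```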
