Imagine two dataframes:

import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

>>> X
   a  b
0  1  2
1  3  4
2  5  6
>>> Y
    a
0  10
1  20
2  30

I want the final output to look like this:

   a_X  b_X  a_Y  b_Y  sum_a  sum_b
0    1    2   10  NaN     11      2
1    3    4   20  NaN     23      4
2    5    6   30  NaN     35      6

I tried to do it by:

merged = X.join(Y, lsuffix="_X", rsuffix="_Y")
merged['sum_a'] = merged['a_X'] + merged['a_Y'] # works
merged['sum_b'] = merged['b_X'] + merged['b_Y'] # doesn't work

Obviously, the sum_b line fails because Y has no b column. Y might contain a b column, but it does not have to: my dataset makes no guarantees about which columns are present. It doesn't look like the built-in join can add that NaN column for me.
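
As a side note on the failure itself, here is a minimal sketch using the same X and Y as above. It shows which column names the join actually produces: lsuffix and rsuffix apply only to overlapping column names, so b keeps its plain name. The variable names columns and missing are only for illustration.

```python
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

merged = X.join(Y, lsuffix="_X", rsuffix="_Y")

# Only the shared column "a" gets a suffix; "b" is left untouched.
columns = list(merged.columns)
print(columns)

# Neither "b_X" nor "b_Y" exists, so the sum_b line raises a KeyError.
missing = [c for c in ("b_X", "b_Y") if c not in merged.columns]
print(missing)
```

So the lookup of b_X fails as well, not just b_Y.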

4 Answers

1

Concatenate with pd.concat -

k = ['X', 'Y']

df = pd.concat([X, Y], keys=k, axis=1)
df

   X      Y
   a  b   a
0  1  2  10
1  3  4  20
2  5  6  30

Generate a MultiIndex and use it to reindex -

idx = pd.MultiIndex.from_product([k, df.columns.levels[1].unique()])
df = df.reindex(columns=idx)
df

   X      Y    
   a  b   a   b
0  1  2  10 NaN
1  3  4  20 NaN
2  5  6  30 NaN

Re-set the column names -

df.columns = df.columns.map('_'.join)
df

   X_a  X_b  Y_a  Y_b
0    1    2   10  NaN
1    3    4   20  NaN
2    5    6   30  NaN

Now, you can groupby suffix and find sums -

# groupby(..., axis=1) is deprecated in recent pandas, so group the transposed frame instead
v = df.T.groupby(lambda x: x.split('_')[1]).sum().T.add_prefix('sum_')
v

   sum_a  sum_b
0   11.0    2.0
1   23.0    4.0
2   35.0    6.0

Concatenate this with the original:

pd.concat([df, v], axis=1)

   X_a  X_b  Y_a  Y_b  sum_a  sum_b
0    1    2   10  NaN   11.0    2.0
1    3    4   20  NaN   23.0    4.0
2    5    6   30  NaN   35.0    6.0
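
The steps above can be collected into one function. This is only a sketch: join_with_sums and its names parameter are hypothetical, not part of pandas. It takes the union of both frames' columns, so it should also cope with a column that exists only in Y, and it groups the transposed frame rather than passing axis=1 to groupby.

```python
import pandas as pd


def join_with_sums(left, right, names=("X", "Y")):
    """Hypothetical helper: both frames side by side plus per-column sums."""
    # Union of the column labels, keeping first-seen order.
    cols = left.columns.union(right.columns, sort=False)
    # Give both frames the same columns; missing ones become NaN.
    parts = [left.reindex(columns=cols), right.reindex(columns=cols)]
    wide = pd.concat(parts, keys=list(names), axis=1)
    # Sum across the frames for each original column name (NaN is skipped).
    sums = wide.T.groupby(level=1, sort=False).sum().T.add_prefix("sum_")
    # Flatten the MultiIndex columns: ("X", "a") -> "X_a".
    wide.columns = wide.columns.map("_".join)
    return pd.concat([wide, sums], axis=1)


X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

result = join_with_sums(X, Y)
print(result)
```

If you prefer the a_X style of naming from the question, reverse each tuple before joining when you flatten the columns.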

1
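# Give Y the same columns as X (missing ones become NaN), then concatenate side by side
df = pd.concat([X, Y.reindex(columns=X.columns)], keys=['x', 'y'], axis=1)

# Sum per original column name; groupby(..., axis=1) is deprecated in recent pandas, so group the transpose
x = df.T.groupby(level=1).sum().T.add_prefix('sum_')

# Flatten the MultiIndex columns: ('x', 'a') -> 'ax'
df.columns = df.columns.map('{0[1]}{0[0]}'.format)

pd.concat([df, x], axis=1)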
Out[58]: 
   ax  bx  ay  by  sum_a  sum_b
0   1   2  10 NaN   11.0    2.0
1   3   4  20 NaN   23.0    4.0
2   5   6  30 NaN   35.0    6.0

0

You can do:

import numpy as np

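# Add b only if Y lacks it, so an existing b column is not overwritten
if 'b' not in Y.columns:
    Y['b'] = np.nan
merged = X.join(Y, lsuffix="_X", rsuffix="_Y")
merged['sum_a'] = merged['a_X'] + merged['a_Y']
# Treat the missing b_Y values as 0 so the sum is not NaN
merged['sum_b'] = merged['b_X'] + merged['b_Y'].fillna(0)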

#>>> merged
#   a_X  b_X  a_Y  b_Y  sum_a  sum_b
#0    1    2   10  NaN     11    2.0
#1    3    4   20  NaN     23    4.0
#2    5    6   30  NaN     35    6.0

0

An alternative closer to what you are already doing: since Y is not guaranteed to have the same columns as X, reindex Y to X's columns first, then add the columns with the fill_value option so that a missing value counts as 0:

Y = Y.reindex(columns=X.columns)
>>> Y
#    a    b
#0  10  NaN
#1  20  NaN
#2  30  NaN

merged = X.join(Y, lsuffix="_X", rsuffix="_Y")
merged['sum_a'] = merged['a_X'].add(merged['a_Y'], fill_value=0)
merged['sum_b'] = merged['b_X'].add(merged['b_Y'], fill_value=0)
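
The same idea can be written as a loop, so the sum columns do not have to be listed by hand. This is a minimal sketch that assumes every column of X should get a sum column. Y_aligned is a name introduced here for illustration; it leaves the original Y untouched.

```python
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

# Align Y to X's columns first so every column is shared and gets a suffix.
Y_aligned = Y.reindex(columns=X.columns)
merged = X.join(Y_aligned, lsuffix="_X", rsuffix="_Y")

# One sum column per column of X; fill_value=0 treats a lone NaN as 0.
for col in X.columns:
    merged[f"sum_{col}"] = merged[f"{col}_X"].add(merged[f"{col}_Y"], fill_value=0)

print(merged)
```

Note that reindexing to X.columns drops any column that exists only in Y. If those matter, reindex both frames to the union of their columns instead.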
