Imagine two dataframes:
import pandas as pd
X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])
>>> X
   a  b
0  1  2
1  3  4
2  5  6
>>> Y
    a
0  10
1  20
2  30
I want my final output to look like this:
   a_X  b_X  a_Y  b_Y  sum_a  sum_b
0    1    2   10  NaN     11      2
1    3    4   20  NaN     23      4
2    5    6   30  NaN     35      6
Here is what I tried:
merged = X.join(Y, lsuffix="_X", rsuffix="_Y")
merged['sum_a'] = merged['a_X'] + merged['a_Y']  # works
merged['sum_b'] = merged['b_X'] + merged['b_Y']  # KeyError: neither 'b_X' nor 'b_Y' exists
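The sum_b line fails because Y has no b column: join only adds suffixes to column names that overlap, so the merged frame contains a plain b column and neither b_X nor b_Y. Y might have a b column, but it doesn't have to; my dataset gives no guarantees about which columns are present. It doesn't look like the built-in join can add that all-NaN b_Y column for me.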
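One possible direction, as a minimal sketch rather than a definitive answer: reindex both frames to the union of their column names before joining, so a column missing from either side shows up as all-NaN, then add the frames with fill_value=0 so NaN counts as zero in the sums. The names all_cols, X_full, Y_full and sums are only illustrative, and this assumes both frames share the same index.

```python
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

# Union of the column names from both frames, keeping X's order first.
all_cols = X.columns.union(Y.columns, sort=False)

# Columns missing from a frame are created and filled with NaN.
X_full = X.reindex(columns=all_cols)
Y_full = Y.reindex(columns=all_cols)

# Suffix every column explicitly instead of relying on join's
# lsuffix/rsuffix, which only touch overlapping names.
merged = X_full.add_suffix("_X").join(Y_full.add_suffix("_Y"))

# fill_value=0 treats a NaN on one side as 0; if both sides are NaN,
# the result stays NaN.
sums = X_full.add(Y_full, fill_value=0).add_prefix("sum_")
merged = merged.join(sums)

print(merged)
```

Note that sum_b comes out as a float column here (2.0, 4.0, 6.0), because the all-NaN b column in Y_full forces a float dtype.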