I have two DataFrames, which I am trying to make a single containing the mean of the two. Each has missing values.
If there were not missing values I could use (df1 + df2)/2.
I would like to take the mean of those that have both data points, while returning 'NaN' as the 'mean' for the points that are missing data.
For example:
df1 = pd.DataFrame({'ID': ['Alpha', 'Bravo', 'Charlie', 'Delta'], 'Var1': [15,'NaN', 17, 18], 'Var2': [1.0, 1.5, 2.0, 1.5], 'Var3': [1, 0, 0, 1]})
df1 = df1.set_index('ID')
Var1 Var2 Var3
ID
Alpha 15 1.0 1
Bravo NaN 1.5 0
Charlie 17 2.0 0
Delta 18 1.5 1
_
df2 = pd.DataFrame({'ID': ['Alpha', 'Bravo', 'Charlie', 'Delta'], 'Var1': [20, 15, 17, 20], 'Var2': [1.2,'NaN', 3.0, 1.0], 'Var3': [0, 0, 1, 1]})
df2=df2.set_index('ID')
Var1 Var2 Var3
ID
Alpha 20 1.2 0
Bravo 15 NaN 0
Charlie 17 3 1
Delta 20 1 1
_
The resulting DataFrame should be:
Var1 Var2 Var3
ID
Alpha 17.5 1.10 0.5
Bravo NaN NaN 0.0
Charlie 17.0 2.50 0.5
Delta 19.0 1.25 1.0
So, in simple terms, my question is how can I do (df1 + df2)/2 while ignoring the points that have NaN's?