For example, I have two dataframes, df1 and df2.

I need to compute the element-wise mean of the two. But if a value is np.nan in one DataFrame, the result should be the value from the other one.

import numpy as np
import pandas as pd
df1 = pd.DataFrame([1, 5, np.nan, 4, np.nan, 9])
df2 = pd.DataFrame([8, 2, np.nan, np.nan, 5, np.nan])

I expect something like this.

df3 = pd.DataFrame([4.5, 3.5, np.nan, 4, 5, 9])

How can I apply this condition?

1 Answer

You have several options. If you treat your single-column DataFrames as Series, you can simply concat them and take the mean on axis=1 (which ignores NaNs by default):

out = pd.concat([df1[0], df2[0]], axis=1).mean(axis=1)

Output:

0    4.5
1    3.5
2    NaN
3    4.0
4    5.0
5    9.0
dtype: float64
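To see why the NaN-skipping default matters here, a minimal sketch that contrasts it with plain arithmetic on the question's data. The keys "df1" and "df2" are only labels added for readability, not part of the original answer:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 5, np.nan, 4, np.nan, 9])
df2 = pd.DataFrame([8, 2, np.nan, np.nan, 5, np.nan])

# Plain arithmetic propagates NaN: any row with a missing value becomes NaN.
naive = (df1[0] + df2[0]) / 2

# Put the two columns side by side; the keys are illustrative labels only.
side_by_side = pd.concat([df1[0], df2[0]], axis=1, keys=["df1", "df2"])

# mean() uses skipna=True by default, so a row with one NaN returns the
# other value, and a row where both are NaN stays NaN.
out = side_by_side.mean(axis=1)
```

With this data, naive keeps only the first two rows, while out also recovers rows 3, 4 and 5 from whichever side has a value.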

If you potentially have multiple columns, you could manually compute the mean by computing the sum and dividing by the number of non-NaN values:

out = df1.add(df2, fill_value=0).div(df1.notna().astype(int)+df2.notna())
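The sum-and-count approach can be broken into steps as a sketch. The two-column frames left and right, and their column names "a" and "b", are hypothetical data invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column inputs, made up for this example.
left = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [2.0, 4.0, np.nan]})
right = pd.DataFrame({"a": [3.0, 5.0, np.nan], "b": [np.nan, 8.0, np.nan]})

# Sum, treating a NaN on one side as 0. Cells missing on both sides stay NaN.
total = left.add(right, fill_value=0)

# Number of non-NaN values contributing to each cell (0, 1 or 2).
count = left.notna().astype(int) + right.notna().astype(int)

# Cells with a count of 0 are NaN / 0, which stays NaN.
out = total.div(count)
```

Here column "a" comes out as 2.0, 5.0, NaN and column "b" as 2.0, 6.0, NaN.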

Or, for a more generic option that can also handle an arbitrary number of DataFrames as input, use concat + groupby.mean:

out = pd.concat([df1, df2]).groupby(level=0).mean()

Output:

     0
0  4.5
1  3.5
2  NaN
3  4.0
4  5.0
5  9.0
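As a sketch of how the concat + groupby approach extends to more than two inputs, it can be wrapped in a small helper. The function name nanmean_frames and the third frame extra are hypothetical, and the sketch assumes all frames share the same index labels and columns, with no duplicate labels inside a single frame:

```python
import numpy as np
import pandas as pd

def nanmean_frames(frames):
    """Element-wise mean of DataFrames sharing index and columns, skipping NaNs."""
    # Stack the frames vertically; rows from different frames share index labels.
    stacked = pd.concat(frames)
    # Group rows by index label and average; groupby mean skips NaN values.
    return stacked.groupby(level=0).mean()

df1 = pd.DataFrame([1, 5, np.nan, 4, np.nan, 9])
df2 = pd.DataFrame([8, 2, np.nan, np.nan, 5, np.nan])
# Hypothetical third input, made up for this example.
extra = pd.DataFrame([np.nan, 2, 7, np.nan, np.nan, 0])

out = nanmean_frames([df1, df2, extra])
```

Each row is averaged over however many frames have a value for it, so row 1 uses three values while row 2 uses only the one from extra.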

Timings

Using one column for each of the two DataFrames and a random choice of numbers or NaN. (Timing plot not reproduced here.)
