How to use np.nan in a DataFrame in pandas [duplicate]

Question

For example, I have two dataframes, df1 and df2.

I need to compute the element-wise mean of the two. But if a value in df1 is np.nan, the result should be the value from df2 (and vice versa).

import numpy as np
import pandas as pd
df1 = pd.DataFrame([1, 5, np.nan, 4, np.nan, 9])
df2 = pd.DataFrame([8, 2, np.nan, np.nan, 5, np.nan])

I expect this result:

df3 = pd.DataFrame([4.5, 3.5, np.nan, 4, 5, 9])

How can I express this condition?
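The condition can also be written out literally: average where both values are present, otherwise take whichever one is present. A minimal sketch, assuming both DataFrames share the same index and columns (the intermediate name `avg` is illustrative):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 5, np.nan, 4, np.nan, 9])
df2 = pd.DataFrame([8, 2, np.nan, np.nan, 5, np.nan])

# Plain average: NaN wherever either side is NaN.
avg = (df1 + df2) / 2

# Where the average is NaN, fall back to df1, then to df2.
# Positions missing in both inputs stay NaN.
df3 = avg.fillna(df1).fillna(df2)
print(df3)
```

`fillna` accepts a DataFrame and aligns it on index and columns, so each gap is filled from the matching position of the other input.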

mozway · Accepted Answer · 2024-07-31 07:54:54Z


You have several options. If you treat your single-column DataFrames as Series, you can simply concat them and take the mean on axis=1, which ignores NaNs by default (skipna=True):

out = pd.concat([df1[0], df2[0]], axis=1).mean(axis=1)

Output:

0    4.5
1    3.5
2    NaN
3    4.0
4    5.0
5    9.0
dtype: float64
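To make the NaN handling visible, here is a small sketch that repeats the same concat with named columns (`keys` is used only for readability), compares the default against `skipna=False`, and converts the result back into a one-column DataFrame like the expected `df3`:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 5, np.nan, 4, np.nan, 9])
df2 = pd.DataFrame([8, 2, np.nan, np.nan, 5, np.nan])

# One column per input, labelled "df1" and "df2".
both = pd.concat([df1[0], df2[0]], axis=1, keys=["df1", "df2"])

skip = both.mean(axis=1)                  # NaN ignored (default skipna=True)
strict = both.mean(axis=1, skipna=False)  # any NaN in a row makes that row NaN

df3 = skip.to_frame()  # back to a single-column DataFrame
print(df3)
```

Only `skip` gives the behaviour asked for; `strict` shows what a naive average would do.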

If you potentially have multiple columns, you can compute the mean manually: take the sum and divide by the number of non-NaN values:

out = df1.add(df2, fill_value=0).div(df1.notna().astype(int)+df2.notna())
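A minimal sketch of the sum-divided-by-count idea on hypothetical two-column inputs (the column names `a`/`b` and the values are made up for illustration), split into named steps:

```python
import numpy as np
import pandas as pd

# Hypothetical multi-column inputs with the same index and columns.
df1 = pd.DataFrame({"a": [1, 5, np.nan], "b": [np.nan, 2.0, 3.0]})
df2 = pd.DataFrame({"a": [8, np.nan, np.nan], "b": [4.0, np.nan, 1.0]})

# Sum, treating a value missing on only one side as 0;
# positions missing on both sides stay NaN.
total = df1.add(df2, fill_value=0)

# Number of inputs that actually have a value at each position (0, 1 or 2).
count = df1.notna().astype(int) + df2.notna().astype(int)

# Mean of the available values; a count of 0 only occurs where total is NaN.
out = total.div(count)
print(out)
```

Because every step is element-wise, this works for any number of columns without looping.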

Or, as a more generic option that also handles an arbitrary number of input DataFrames, use concat + groupby.mean:

out = pd.concat([df1, df2]).groupby(level=0).mean()
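The concat + groupby idea can be wrapped in a small helper that takes any number of DataFrames. A sketch under the assumptions that all inputs share the same row labels and that each label is unique within one input (otherwise repeated rows of a single input would be averaged together); `nan_mean` and `df_extra` are hypothetical names:

```python
import numpy as np
import pandas as pd

def nan_mean(*dfs: pd.DataFrame) -> pd.DataFrame:
    """Element-wise mean of several aligned DataFrames, ignoring NaN.

    Stacks the inputs vertically, so each row label appears once per input,
    then averages the rows that share a label (groupby mean skips NaN).
    """
    return pd.concat(dfs).groupby(level=0).mean()

df1 = pd.DataFrame([1, 5, np.nan, 4, np.nan, 9])
df2 = pd.DataFrame([8, 2, np.nan, np.nan, 5, np.nan])
df_extra = pd.DataFrame([3, np.nan, np.nan, 1, np.nan, 6])  # illustrative third input

print(nan_mean(df1, df2))
print(nan_mean(df1, df2, df_extra))
```

Note that `groupby` sorts the row labels by default, so the result comes back in sorted label order.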

Timings (plot not reproduced here), using one column for each of the two DataFrames and a random choice of numbers or NaN.
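One way to set up a comparable benchmark is sketched below. The input size, the 50% NaN rate, the seed, the repeat count and the helper names are all assumptions, so absolute numbers (and possibly the ranking) will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

def make_inputs(n: int, seed: int = 0):
    """Two single-column DataFrames of random floats, roughly half set to NaN."""
    rng = np.random.default_rng(seed)

    def one() -> pd.DataFrame:
        values = rng.random(n)
        values[rng.random(n) < 0.5] = np.nan
        return pd.DataFrame(values)

    return one(), one()

def concat_mean(df1, df2):
    return pd.concat([df1[0], df2[0]], axis=1).mean(axis=1).to_frame()

def add_div(df1, df2):
    return df1.add(df2, fill_value=0).div(df1.notna().astype(int) + df2.notna())

def groupby_mean(df1, df2):
    return pd.concat([df1, df2]).groupby(level=0).mean()

if __name__ == "__main__":
    df1, df2 = make_inputs(10_000)
    runs = 20
    for func in (concat_mean, add_div, groupby_mean):
        seconds = timeit.timeit(lambda: func(df1, df2), number=runs)
        print(f"{func.__name__:>12}: {seconds / runs * 1e3:.2f} ms per call")
```

Checking that the three functions agree on the same inputs before timing them is a cheap way to make sure the comparison is fair.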

edited Jul 31, 2024 at 7:54

answered Jul 31, 2024 at 7:37
