3

Let df1, df2, and df3 are pandas.DataFrame's having the same structure but different numerical values. I want to perform:

res=if df1>1.0: (df2-df3)/(df1-1) else df3

res should have the same structure as df1, df2, and df3 have.

numpy.where() generates result as a flat array.

Edit 1:

res should have the same indices as df1, df2, and df3 have.

For example, I can access df2 as df2["instanceA"]["parameter1"]["paramter2"]. I want to access the new calculated DataFrame/Series res as res["instanceA"]["parameter1"]["paramter2"].

3 Answers 3

2

Actually numpy.where should work fine there. Output here is 4x2 (same as df1, df2, df3).

df1 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )
df2 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )
df3 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )

res = df3.copy()
res[:] = np.where( df1 > 1, (df2-df3)/(df1-1), df3 )

          x         y
0 -0.671787 -0.445276
1 -0.609351 -0.881987
2  0.324390  1.222632
3 -0.138606  0.955993

Note that this should work on both series and dataframes. The [:] is slicing syntax that preserves the index and columns. Without that res will come out as an array rather than series or dataframe.

Alternatively, for a series you could write as @Kadir does in his answer:

res = pd.Series(np.where( df1>1, (df2-df3)/(df1-1), df3 ), index=df1.index)

Or similarly for a dataframe you could write:

res = pd.DataFrame(np.where( df1>1, (df2-df3)/(df1-1), df3 ), index=df1.index,
                                                              columns=df1.columns)
Sign up to request clarification or add additional context in comments.

Comments

1

Integrating the idea in this question into JohnE's answer, I have come up with this solution:

res = pd.Series(np.where( df1 > 1, (df2-df3)/(df1-1), df3 ), index=df1.index)

A better answer using DataFrames will be appreciated.

Comments

0

Say df is your initial dataframe and res is the new column. Use a combination of setting values and boolean indexing.

Set res to be a copy of df3:

 df['res'] = df['df3']

Then adjust values for your condition.

df[df['df1']>1.0]['res'] = (df['df2'] - df['df3'])/(df['df1']-1)

1 Comment

I believe df1/df2/df3 should be dataframes, not columns.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.