
Is there a more succinct / pythonic / pandas-native way of writing the following?

all_pos = ['NN', 'VB', 'ADJ']
for col in all_pos:
    df_out['delta_' + col] = df_out[col] - df_mean[col]

df_out and df_mean have the same column names and indices, and I want to add new columns to df_out containing the difference between the two.

So df_out could contain

Index  NN VB ADJ

239    9  4  3
250    2  2  1

And df_mean could contain

Index  NN VB ADJ

239    3  1  8
250    7  4  3

I would want df_out to look like

    Index  NN VB ADJ delta_NN delta_VB delta_ADJ

    239    9  4  3       6        3       -5
    250    2  2  1      -5       -2       -2

1 Answer

Subtract the two DataFrames directly (there is no need to loop over the columns), then concat the prefixed result onto the input:

pd.concat([df_out,
           (df_out - df_mean).add_prefix('delta_')
          ], axis=1)

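or

df_out.join((df_out - df_mean).add_prefix('delta_'))

(df_out - df_mean) can also be written df_out.sub(df_mean). Both concat and join return a new DataFrame, so assign the result back to df_out if you want to keep the new columns there.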

output:

       NN  VB  ADJ  delta_NN  delta_VB  delta_ADJ
Index                                            
239     9   4    3         6         3         -5
250     2   2    1        -5        -2         -2
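
For a self-contained check, the approach above can be run against the sample data from the question. This is only a minimal sketch: the frames are rebuilt by hand, and selecting the all_pos columns before subtracting is an assumption for the case where the frames also carry columns you do not want deltas for.

```python
import pandas as pd

# Sample data from the question, with "Index" used as the index.
df_out = pd.DataFrame(
    {"NN": [9, 2], "VB": [4, 2], "ADJ": [3, 1]},
    index=pd.Index([239, 250], name="Index"),
)
df_mean = pd.DataFrame(
    {"NN": [3, 7], "VB": [1, 4], "ADJ": [8, 3]},
    index=pd.Index([239, 250], name="Index"),
)

all_pos = ["NN", "VB", "ADJ"]

# Subtract only the columns of interest and prefix the resulting names.
delta = df_out[all_pos].sub(df_mean[all_pos]).add_prefix("delta_")

# join returns a new DataFrame, so keep the result in a variable.
result = df_out.join(delta)

print(result)
```

If every column should get a delta, the all_pos selection can be dropped and the one-liners above used as they are.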

NB. I assumed "Index" is the index. If it is a regular column instead, first run:

df_out.set_index('Index', inplace=True)
df_mean.set_index('Index', inplace=True)
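
The index matters because DataFrame arithmetic aligns on row and column labels. As a rough illustration (the two small frames below are made up for the example, with the rows of df_mean deliberately in a different order), leaving "Index" as an ordinary column matches rows by position and subtracts the "Index" values too:

```python
import pandas as pd

# Hypothetical frames where "Index" is still a regular column.
df_out = pd.DataFrame({"Index": [239, 250], "NN": [9, 2]})
df_mean = pd.DataFrame({"Index": [250, 239], "NN": [7, 3]})  # rows in a different order

# Rows are paired by position here, and "Index" is subtracted like any other column.
naive = df_out - df_mean

# After set_index, rows are paired by their "Index" label instead.
aligned = df_out.set_index("Index") - df_mean.set_index("Index")

print(naive)
print(aligned)
```

This sketch uses the non-inplace form of set_index, which returns a new DataFrame and leaves the originals untouched.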

