Vectorized operations in Pandas with fixed columns/rows/values

Question

I would like to perform operations on Pandas dataframes using fixed columns, rows, or values.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':(1,2,3), 'b':(4,5,6), 'c':(7,8,9), 'd':(10,11,12),
                  'e':(13,14,15)})

df
Out[57]: 
   a  b  c   d   e
0  1  4  7  10  13
1  2  5  8  11  14
2  3  6  9  12  15

I want to use the values in columns 'a' and 'b' as fixed values.


# It's easy enough to perform the operation I want on one column at a time:
df.loc[:,'f'] = df.loc[:,'c'] + df.loc[:,'a'] + df.loc[:,'b']

# It gets cumbersome if there are many columns to perform the operation on though:
df.loc[:,'g'] = df.loc[:,'d'] / df.loc[:,'a'] * df.loc[:,'b']
df.loc[:,'h'] = df.loc[:,'e'] / df.loc[:,'a'] * df.loc[:,'b']
# etc.

# This returns columns with all NaN values.
df.loc[:,('f','g','h')] = df.loc[:,'c':'e'] / df.loc[:'a']

Is there an optimal way to do what I want in Pandas? I could not find working solutions in the Pandas documentation or this SO thread. I don't think I can use .map() or .applymap(), because I'm under the impression they can only be used for simple equations (one input value). Thanks for reading.

Corralien · Accepted Answer · 2022-02-16 15:47:51Z

2

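Use the div and mul methods with axis=0 instead of the / and * operators, so that the Series is aligned on the row index rather than on the column labels: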

df[['g', 'h']] = df[['d', 'e']].div(df['a'], axis=0).mul(df['b'], axis=0)
print(df)

# Output
   a  b  c   d   e     g     h
0  1  4  7  10  13  40.0  52.0
1  2  5  8  11  14  27.5  35.0
2  3  6  9  12  15  24.0  30.0
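To illustrate why the plain operators produce NaN here, the following is a minimal sketch using the question's dataframe. The variable names misaligned and aligned are only for illustration. It assumes the default behaviour where DataFrame / Series matches the Series index against the DataFrame's column labels.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Plain `/` matches the Series index (0, 1, 2) against the DataFrame's
# column labels ('d', 'e'). No labels match, so every cell is NaN.
misaligned = df[['d', 'e']] / df['a']
print(misaligned.isna().all().all())

# axis=0 matches the Series index against the DataFrame's row index instead,
# so each row is divided by its own 'a' and multiplied by its own 'b'.
aligned = df[['d', 'e']].div(df['a'], axis=0).mul(df['b'], axis=0)
print(aligned)
```

The same pattern should apply to .add() and .sub() if the operation is a sum or difference rather than a ratio.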

With numpy:

arr = df.to_numpy()
arr[:, [3, 4]] / arr[:, [0]] * arr[:, [1]]

# Output
array([[40. , 52. ],
       [27.5, 35. ],
       [24. , 30. ]])
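The NumPy version relies on broadcasting, and the list-style index [0] is what makes it work. A small sketch of the shapes involved, using the question's dataframe (the name result is only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})
arr = df.to_numpy()

# Indexing with a scalar drops the column axis; indexing with a list keeps it.
print(arr[:, 0].shape)    # 1-D
print(arr[:, [0]].shape)  # 2-D, one column

# A (3, 2) block divided by a (3, 1) column broadcasts row by row,
# which is the NumPy equivalent of axis=0 in the pandas version.
result = arr[:, [3, 4]] / arr[:, [0]] * arr[:, [1]]
print(result)
```

Using arr[:, 0] instead would give a shape (3,) array, which NumPy tries to broadcast against the last axis of the (3, 2) block and fails because 3 does not match 2.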

edited Feb 16, 2022 at 15:47

answered Feb 16, 2022 at 15:42


LabRat01010 · Accepted Answer · 2022-02-16 16:24:40Z

0

As @Corralien pointed out, it's better to use Pandas dataframe operations such as .div(), but I also figured out that the usage of .loc[] is important.

# Doesn't work:
df.loc[:,['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Doesn't work:
df[['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Now works.
df[['f','g','h']] = df.loc[:,'c':'e'].div(df['a'], axis=0)

At the moment, I'm not exactly sure why this is. Any insight would be helpful, thanks.
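One likely explanation, offered as a sketch rather than a definitive diagnosis: the first indexer of .loc addresses rows and the second addresses columns, so df.loc[:'a'] is a row slice up to the label 'a' and never yields the Series for column a. The column is df.loc[:, 'a'], which is equivalent to df['a']. The name col_a below is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Rows first, columns second: this selects all rows of column 'a'.
col_a = df.loc[:, 'a']
print(col_a.equals(df['a']))

# With the column selected correctly, the .loc form behaves like df['a'].
df[['f', 'g', 'h']] = df.loc[:, 'c':'e'].div(col_a, axis=0)
print(df)
```

With the comma in place, the .loc version and the df['a'] version of the right-hand side give the same result.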

answered Feb 16, 2022 at 16:24


1 Comment

Community Over a year ago

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
