Subtract rows of another DataFrame from specific DataFrame rows

Question

I have a dataframe df. Each row has some columns I want to subtract from (columns_to_sub), a tag column called 'absorb', and some columns that I don't want to change. From the values in columns_to_sub I want to subtract a row of another dataframe, which is indexed by the tag 'absorb'. Here is a non-working example of what I want:

import pandas as pd
import numpy as np
data = np.hstack((np.random.randint(0,10,20).reshape(-1,1),np.random.rand(20,3)))
df = pd.DataFrame(data,columns=['absorb','a','b','c'])
columns_to_sub = ['a','b']

means = df.groupby('absorb')[columns_to_sub].mean()
#This result is not what I want, because the subtraction is strange
df[columns_to_sub] = df[columns_to_sub] - means.loc[df.absorb,columns_to_sub]

How do I fix this code?

Alexander · Accepted Answer · 2016-01-28 18:03:35Z

2

You were so close. Just use values on means.

df[columns_to_sub] = df[columns_to_sub] - means.loc[df.absorb,columns_to_sub].values
>>> df
    absorb         a         b         c
0        2 -0.060540 -0.270233  0.416213
1        9  0.597084  0.136158  0.415023
2        1 -0.131393 -0.535288  0.158465
3        3  0.282902 -0.008801  0.872598
4        9 -0.236306 -0.337588  0.297589
5        6  0.000000  0.000000  0.283559
6        3  0.022021 -0.110693  0.671295
7        7  0.042000 -0.327157  0.736395
8        1  0.097912  0.119899  0.409241
9        1 -0.460052  0.280302  0.341200
10       1  0.002855 -0.013902  0.648113
11       1  0.490679  0.148989  0.626300
12       8  0.000000  0.000000  0.986039
13       3 -0.304923  0.119494  0.553210
14       0  0.000000  0.000000  0.626576
15       5  0.000000  0.000000  0.105102
16       2 -0.166760 -0.122624  0.750912
17       2  0.227300  0.392857  0.498822
18       7 -0.042000  0.327157  0.323361
19       9 -0.360778  0.201430  0.521043
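To make the effect of .values easier to check, here is a minimal sketch on a small fixed dataset. The toy numbers and the names lookup, via_values and via_transform are made up for illustration and are not from the question. means.loc[...] returns rows labelled by their absorb value rather than by df's row numbers, and .values drops those labels so the subtraction is done purely by position. The groupby(...).transform('mean') line is a different technique from the one in this answer, included only as a cross-check.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for the random data in the question.
df = pd.DataFrame({
    'absorb': [1, 2, 1, 2, 3],
    'a': [1.0, 2.0, 3.0, 6.0, 5.0],
    'b': [10.0, 20.0, 30.0, 40.0, 50.0],
    'c': [0.1, 0.2, 0.3, 0.4, 0.5],
})
columns_to_sub = ['a', 'b']

means = df.groupby('absorb')[columns_to_sub].mean()

# One row of `means` per row of df, in df's row order,
# but labelled by the absorb value instead of df's row number.
lookup = means.loc[df['absorb'], columns_to_sub]

# .values strips the labels, so the subtraction is positional.
via_values = df[columns_to_sub] - lookup.values

# Cross-check (a different technique): transform returns the group
# means already aligned to df's own index.
via_transform = df[columns_to_sub] - df.groupby('absorb')[columns_to_sub].transform('mean')

print(via_values)
print(np.allclose(via_values.values, via_transform.values))
```

Column c is untouched in both variants because only columns_to_sub takes part in the subtraction.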

3 Comments

Ami Tavory Over a year ago

Nice answer (+1). Pretty sure df.groupby('absorb') is wrong, though.

Alexander Over a year ago

@AmiTavory Have you tried it? Works fine for me. Just use the column names when grouping on the dataframe. There is no need for df.groupby(df.column). pandas.pydata.org/pandas-docs/stable/groupby.html

Ami Tavory Over a year ago

Curious what you think: for me, it gives different results when I switch between the two versions.

ako · Accepted Answer · 2016-01-28 18:05:31Z

If you set 'absorb' as the index of df, the subtraction becomes straightforward. Note that absorb is then a non-unique index, so make sure that is what you want.

data = np.hstack((np.random.randint(0,10,20).reshape(-1,1),np.random.rand(20,3)))
df = pd.DataFrame(data,columns=['absorb','a','b','c']).set_index('absorb')
df.head()

               a         b         c
absorb                              
8       0.942156  0.675819  0.606406
0       0.801685  0.360899  0.055210
7       0.540333  0.691493  0.580708
7       0.234766  0.446549  0.295496
4       0.942021  0.338729  0.827124

So far, this is df with absorb as its index.

Then, the means:

columns_to_sub = ['a','b']

means = df.groupby(level=0)[columns_to_sub].mean()
means.head()
               a         b
absorb                    
0       0.871498  0.659507
1       0.113925  0.711533
2       0.485379  0.191867
4       0.557054  0.581740

Then the subtraction can be done like so:

result = df[columns_to_sub] -  means[columns_to_sub]
result.head()
               a         b
absorb                    
0      -0.069813 -0.298608
0       0.069813  0.298608
1       0.000000  0.000000
2       0.451854  0.164074
2      -0.451854 -0.164074
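As a hedged sketch of the same idea on a small fixed dataset (the toy numbers and the name group_means_after are made up for illustration): with absorb as the index, pandas matches each row of df to the row of means carrying the same label. Two things are visible in the output above and worth keeping in mind: result contains only columns_to_sub, and its rows are ordered by absorb rather than in df's original order. The check below therefore does not rely on row order.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for the random data, indexed by the tag.
df = pd.DataFrame({
    'absorb': [2, 1, 2, 1, 3],
    'a': [2.0, 1.0, 6.0, 3.0, 5.0],
    'b': [20.0, 10.0, 40.0, 30.0, 50.0],
    'c': [0.1, 0.2, 0.3, 0.4, 0.5],
}).set_index('absorb')
columns_to_sub = ['a', 'b']

means = df.groupby(level=0)[columns_to_sub].mean()

# Label-based subtraction: rows are matched on the absorb index.
result = df[columns_to_sub] - means

# Sanity check: once its own mean is removed, every group averages to zero.
group_means_after = result.groupby(level=0).mean()
print(result)
print(group_means_after)
```

If the original row order or the untouched columns matter, the positional approach with a default index may be the simpler choice.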
