Python Pandas: Using apply() to subtract a value from an array

Question

I want to use pandas apply() instead of iterating through each row of a dataframe, since as far as I know apply() is the more efficient approach.

What I want to do is simple:

temp_arr = [0,1,2,3]
# I know this is not a dataframe; I just want to quickly show what it looks like.
temp_df is a 4x4 dataframe, simply: [[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]]
For each row in temp_df, I want to subtract the corresponding number in temp_arr.

So for example, the first row in my dataframe is [1,1,1,1] and I want to subtract the first item in temp_arr (which is 0) from it, so the output should be [1,1,1,1]. The second row is [2,2,2,2] and I want to subtract the second item in temp_arr (which is 1) from it, so the output should also be [1,1,1,1].

If I'm subtracting a constant number, I know I can easily do that with:

temp_df.apply(lambda x: x-1)

But the tricky part is that I need to iterate through temp_arr to get the number to subtract for each row. Is there a way to do this with apply()?

[[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]] is not a DataFrame. It's a list. – user554546, Commented Dec 29, 2016 at 19:27
I just didn't write out the whole pd.DataFrame() bit. I was just trying to quickly show what the dataframe looks like without adding all the code for it. – Heavy Breathing, Commented Dec 29, 2016 at 19:28
Well, why not just make temp_arr into a Series, and then subtract it from your rows? – user554546, Commented Dec 29, 2016 at 19:30

piRSquared · Accepted Answer · 2016-12-29 19:40:54Z

5

Consider the array a and the dataframe df:

import numpy as np
import pandas as pd

a = np.arange(4)
df = pd.DataFrame(np.repeat([1, 2, 3, 4], 4).reshape(4, -1))

print(a)

[0 1 2 3]

print(df)

   0  1  2  3
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
3  4  4  4  4

Use pd.DataFrame.sub with axis=0.
This aligns the array with the index (axis=0) and subtracts it from every column, so row i is reduced by a[i].

print(df.sub(a, axis=0))

   0  1  2  3
0  1  1  1  1
1  1  1  1  1
2  1  1  1  1
3  1  1  1  1
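
The same call applied to the names from the question, as a minimal sketch. It assumes temp_arr is a plain Python list and that temp_df is built from the nested list shown in the question; neither construction appears in the original post.

```python
import pandas as pd

# Names taken from the question; the data is the 4x4 example given there.
temp_arr = [0, 1, 2, 3]
temp_df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])

# axis=0 matches temp_arr against the row index, so row i loses temp_arr[i].
result = temp_df.sub(temp_arr, axis=0)
print(result)
```

The list must have one entry per row, otherwise pandas cannot line it up with the index.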

Extra credit
Using numpy broadcasting to align axes:

print(df.values - a[:, None])

[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]

Construct a dataframe from the result, reusing the original index and columns:

d1 = pd.DataFrame(df.values - a[:, None], df.index, df.columns)
print(d1)

   0  1  2  3
0  1  1  1  1
1  1  1  1  1
2  1  1  1  1
3  1  1  1  1
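
A short sketch of why a[:, None] is needed here: it turns the flat array into a column of shape (4, 1), which numpy then stretches across the columns of the (4, 4) block. The variable names col and by_row are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.arange(4)
df = pd.DataFrame(np.repeat([1, 2, 3, 4], 4).reshape(4, -1))

# Indexing with None adds a trailing axis: (4,) becomes (4, 1).
col = a[:, None]
print(a.shape, col.shape)

# (4, 4) - (4, 1): the single column is broadcast across all four columns,
# so every element in row i has a[i] subtracted from it.
by_row = df.values - col

# Same numbers as the pandas-level call with axis=0.
print(np.array_equal(by_row, df.sub(a, axis=0).values))
```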

edited Dec 29, 2016 at 19:40

answered Dec 29, 2016 at 19:31

piRSquared


2 Comments

Heavy Breathing Over a year ago

Very elegant solution! I didn't know about the dataframe subtract and sub functions (both seem to be identical to each other...)! Thanks!

piRSquared Over a year ago

df1 - df2 is identical to df1.sub(df2, axis=1). By accessing the sub method directly, you can change that axis parameter.
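
To illustrate the comment above with the array from the answer, here is a small sketch comparing the operator form with the two axis settings. It uses an array rather than a second dataframe on the right-hand side, which is an assumption made to keep the example short.

```python
import numpy as np
import pandas as pd

a = np.arange(4)
df = pd.DataFrame(np.repeat([1, 2, 3, 4], 4).reshape(4, -1))

# The minus operator lines the array up with the columns, like axis=1.
by_columns = df - a
same = by_columns.equals(df.sub(a, axis=1))
print(same)

# axis=0 lines the array up with the rows instead, which gives a different result.
by_rows = df.sub(a, axis=0)
print(by_columns.equals(by_rows))
```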

lowtech · Answer · 2016-12-29 19:46:21Z

0

Apply row by row, using the index to look up the value in another dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(data = [[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]])    
a = pd.DataFrame({'a': np.arange(4), 'b': np.arange(1, 5)})
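# With axis=1 each x is one row; x.name is that row's index label.
# (.ix has been removed from pandas, so .loc is used for the lookup.)
print(df.apply(lambda x: x - a.loc[x.name, 'a'], axis=1))
print(df.apply(lambda x: x - a.loc[x.name, 'b'], axis=1))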

To address the original question:

import numpy as np
import pandas as pd
temp_df = pd.DataFrame(data=[[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]])
temp_arr = np.arange(4)
# x.name is the row's index label, so each row picks its own value from temp_arr
print(temp_df.apply(lambda x: x - temp_arr[x.name], axis=1))
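
A row-wise apply can be checked against the vectorized sub call. The sketch below assumes the default 0..3 row index, so that each row's label (x.name) also works as a position in temp_arr; the names via_apply and via_sub are illustrative only.

```python
import numpy as np
import pandas as pd

temp_df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
temp_arr = np.arange(4)

# Row-wise: x is one row as a Series, x.name is its index label.
via_apply = temp_df.apply(lambda x: x - temp_arr[x.name], axis=1)

# Vectorized: no Python-level function call per row.
via_sub = temp_df.sub(temp_arr, axis=0)

# Element-wise comparison, reduced over both axes to a single boolean.
print((via_apply == via_sub).all().all())
```

Since apply with axis=1 calls the lambda once per row, it is generally expected to be slower than sub on large frames, though the difference is negligible on a 4x4 example.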

edited Dec 29, 2016 at 19:46

answered Dec 29, 2016 at 19:30

lowtech
