5

I want to use pandas apply() instead of iterating through each row of a dataframe, since as far as I know apply() is the more efficient approach.

What I want to do is simple:

temp_arr = [0,1,2,3]
# I know this is not a dataframe; I just want to show quickly what it looks like.
temp_df is a 4x4 dataframe, simply: [[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]]
For each row in temp_df, I want to subtract the corresponding number in temp_arr.

For example, the first row in my dataframe is [1,1,1,1] and I want to subtract the first item in temp_arr (which is 0) from it, so the output should be [1,1,1,1]. The second row is [2,2,2,2] and I want to subtract the second item in temp_arr (which is 1) from it, so the output should also be [1,1,1,1].

If I'm subtracting a constant number, I know I can easily do that with:

temp_df.apply(lambda x: x-1)

The tricky part is that I need to step through temp_arr to get the number to subtract for each row. Is there a way to do this with apply()?

3
  • [[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]] is not a DataFrame. It's a list. Commented Dec 29, 2016 at 19:27
  • I just didn't write out the whole pd.DataFrame() bit. I was just trying to quickly show what the dataframe looks like without adding all the code to it. Commented Dec 29, 2016 at 19:28
  • Well, why not just make temp_arr into a Series, and then subtract it from your rows? Commented Dec 29, 2016 at 19:30

2 Answers

5

Consider the array a and the dataframe df:

import numpy as np
import pandas as pd

a = np.arange(4)
df = pd.DataFrame(np.repeat([1, 2, 3, 4], 4).reshape(4, -1))

print(a)

[0 1 2 3]

print(df)

   0  1  2  3
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
3  4  4  4  4

You want to use pd.DataFrame.sub with axis=0.
This aligns your array with axis=0 (the index) and performs the subtraction column by column, so a[i] is subtracted from every value in row i.

print(df.sub(a, axis=0))

   0  1  2  3
0  1  1  1  1
1  1  1  1  1
2  1  1  1  1
3  1  1  1  1
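
To make the effect of the axis argument concrete, here is a minimal sketch that runs the same subtraction both ways on the a and df defined above. The names by_row and by_col are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.arange(4)
df = pd.DataFrame(np.repeat([1, 2, 3, 4], 4).reshape(4, -1))

# axis=0: a is matched against the row index,
# so a[i] is subtracted from every value in row i.
by_row = df.sub(a, axis=0)

# axis=1 (the behaviour of plain df - a): a is matched against the columns,
# so a[j] is subtracted from every value in column j.
by_col = df.sub(a, axis=1)

print(by_row)
print(by_col)
```

Only the axis=0 version gives the all-ones result asked for in the question; the axis=1 version subtracts a different amount from each column instead.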

Extra credit:
use numpy broadcasting to align the axes

print(df.values - a[:, None])

[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]

Then construct a dataframe from the result, reusing the original index and columns:

d1 = pd.DataFrame(df.values - a[:, None], df.index, df.columns)
print(d1)

   0  1  2  3
0  1  1  1  1
1  1  1  1  1
2  1  1  1  1
3  1  1  1  1
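
The reason a[:, None] is needed comes down to array shapes. The sketch below works on plain numpy arrays with the same values as above; the names values, col, row_wise and col_wise are only illustrative.

```python
import numpy as np

a = np.arange(4)
values = np.repeat([1, 2, 3, 4], 4).reshape(4, -1)

# a has shape (4,); indexing with None adds a trailing axis, giving shape (4, 1).
col = a[:, None]
print(a.shape, col.shape)

# (4, 4) - (4, 1): col is stretched across the columns,
# so a[i] is subtracted from every value in row i.
row_wise = values - col

# (4, 4) - (4,): a is treated as a single row and stretched down the rows,
# so a[j] is subtracted from every value in column j.
col_wise = values - a

print(row_wise)
print(col_wise)
```

Without the extra axis, numpy lines a up with the last dimension (the columns), which is not what the question asks for.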

2 Comments

Very elegant solution! I didn't know about the dataframe subtract and sub functions (both seem to be identical to each other...)! Thanks!
df1 - df2 is identical to df1.sub(df2, axis=1). By accessing the sub method directly, you can change that axis parameter.
0

Apply by row, using each row's index label to look up a value in another dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(data=[[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]])
a = pd.DataFrame({'a': np.arange(4), 'b': np.arange(1, 5)})
# With axis=1, x is one row and x.name is that row's index label
# (x.index would be the column labels, which is not what we want here).
print(df.apply(lambda x: x - a.loc[x.name, 'a'], axis=1))
print(df.apply(lambda x: x - a.loc[x.name, 'b'], axis=1))

To address the original question:

import numpy as np
import pandas as pd
temp_df = pd.DataFrame(data=[[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]])
temp_arr = np.arange(4)
# x.name is the row label, used here as a position in temp_arr
print(temp_df.apply(lambda x: x - temp_arr[x.name], axis=1))
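
As a sanity check, a row-wise apply() that looks up its value by row label (x.name when axis=1) can be compared against the vectorized sub() call, with temp_arr wrapped in a Series as one of the comments suggests. This is a minimal sketch that assumes temp_df keeps its default 0..3 index, so that row labels double as positions in temp_arr; via_apply and via_sub are illustrative names.

```python
import numpy as np
import pandas as pd

temp_df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]])
temp_arr = np.arange(4)

# Row-wise apply: each x is one row, and x.name is that row's index label.
via_apply = temp_df.apply(lambda x: x - temp_arr[x.name], axis=1)

# Vectorized alternative: a Series aligned on the row index, subtracted along axis=0.
via_sub = temp_df.sub(pd.Series(temp_arr, index=temp_df.index), axis=0)

print(via_apply)
print(via_sub)
```

Both should give the same frame. The sub() version avoids calling a Python function once per row, which tends to matter on larger dataframes.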
