Applying function to each row of pandas data frame - with speed

Question

I have a dataframe that has the following basic structure:

import numpy as np
import pandas as pd
tempDF = pd.DataFrame({
    'condition': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'decision': [np.nan] * 10,
    'x1': [1.2, -2.3, -2.1, 2.4, -4.3, 2.1, -3.4, -4.1, 3.2, -3.3],
    'y1': [6.5, -7.6, -3.4, -5.3, 7.6, 5.2, -4.1, -3.3, -5.7, 5.3],
})
print(tempDF)
   condition  decision   x1   y1
0          0       NaN  1.2  6.5
1          0       NaN -2.3 -7.6
2          0       NaN -2.1 -3.4
3          0       NaN  2.4 -5.3
4          0       NaN -4.3  7.6
5          1       NaN  2.1  5.2
6          1       NaN -3.4 -4.1
7          1       NaN -4.1 -3.3
8          1       NaN  3.2 -5.7
9          1       NaN -3.3  5.3

Within each row, I want to change the value of the 'decision' column to zero if the 'condition' column equals zero and if 'x1' and 'y1' are both the same sign (either positive or negative) - for the purposes of this script zero is considered to be positive. If the signs of 'x1' and 'y1' are different or if the 'condition' column equals 1 (regardless of the signs of 'x1' and 'y1') then the 'decision' column should equal 1. I hope I've explained that clearly.

I can iterate over each row of the dataframe as follows:

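for i in range(len(tempDF)):
    # .loc replaces the removed .ix indexer. The two sign tests are grouped in
    # one pair of parentheses so that condition == 0 applies to both of them.
    if (tempDF.loc[i,'condition'] == 0 and (((tempDF.loc[i,'x1'] >= 0) and (tempDF.loc[i,'y1'] >= 0)) or ((tempDF.loc[i,'x1'] < 0) and (tempDF.loc[i,'y1'] < 0)))):
        tempDF.loc[i,'decision'] = 0
    else:
        tempDF.loc[i,'decision'] = 1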

print(tempDF)
   condition  decision   x1   y1
0          0         0  1.2  6.5
1          0         0 -2.3 -7.6
2          0         0 -2.1 -3.4
3          0         1  2.4 -5.3
4          0         1 -4.3  7.6
5          1         1  2.1  5.2
6          1         1 -3.4 -4.1
7          1         1 -4.1 -3.3
8          1         1  3.2 -5.7
9          1         1 -3.3  5.3

This produces the right output but it's a bit slow. The real dataframe I have is very large and these comparisons will need to be made many times. Is there a more efficient way to achieve the desired result?

jme · Accepted Answer · 2015-01-23 04:05:15Z

First, use np.sign and the comparison operators to create a boolean array which is True where the decision should be 1 (df here is the question's tempDF):

decision = df["condition"] | (np.sign(df["x1"]) != np.sign(df["y1"]))

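Here I've used De Morgan's laws: the decision is 0 only when condition is 0 and the signs match, so it is 1 when condition is nonzero or the signs differ.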

Then cast to int and put it in the dataframe:

df["decision"] = decision.astype(int)

Giving:

>>> df
   condition  decision   x1   y1
0          0         0  1.2  6.5
1          0         0 -2.3 -7.6
2          0         0 -2.1 -3.4
3          0         1  2.4 -5.3
4          0         1 -4.3  7.6
5          1         1  2.1  5.2
6          1         1 -3.4 -4.1
7          1         1 -4.1 -3.3
8          1         1  3.2 -5.7
9          1         1 -3.3  5.3
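
The De Morgan step can be checked directly on the sample data. The following is only a sketch: the names same_sign, negated_and and or_form are illustrative and not part of the original answer, and condition is compared against 0 explicitly rather than used as-is in the | expression.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (the answer calls this frame df).
df = pd.DataFrame({
    "condition": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "x1": [1.2, -2.3, -2.1, 2.4, -4.3, 2.1, -3.4, -4.1, 3.2, -3.3],
    "y1": [6.5, -7.6, -3.4, -5.3, 7.6, 5.2, -4.1, -3.3, -5.7, 5.3],
})

same_sign = np.sign(df["x1"]) == np.sign(df["y1"])

# Rule as stated in the question: decision is 0 only when condition is 0
# AND the signs match, so decision is 1 for the negation of that.
negated_and = ~((df["condition"] == 0) & same_sign)

# De Morgan form: NOT (A AND B) is the same as (NOT A) OR (NOT B).
or_form = (df["condition"] != 0) | ~same_sign

# Both forms agree row by row on this data.
assert negated_and.equals(or_form)

decision = or_form.astype(int)
print(decision.tolist())
```

Both expressions operate on whole columns at once, which is what avoids the Python-level loop over rows.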

edited Jan 23, 2015 at 4:05

answered Jan 23, 2015 at 3:56

jme


1 Comment

user1718097 Over a year ago

Very minor point – I might use np.signbit() instead so that the zero values will be included with the positive values.
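
A sketch of what that np.signbit variant might look like, assuming the same column names. The rows below are made up purely to show the zero edge case and are not from the question; the decision_sign column is added only for comparison with the np.sign version.

```python
import numpy as np
import pandas as pd

# Hypothetical rows containing zeros, used only to illustrate the edge case.
df = pd.DataFrame({
    "condition": [0, 0, 0, 1],
    "x1": [0.0, 0.0, -1.5, 0.0],
    "y1": [2.0, -2.0, -0.5, 3.0],
})

# np.signbit is True for negative values and False for positive values and 0.0,
# so zero is grouped with the positives. Note that it is also True for -0.0.
differs = np.signbit(df["x1"]) != np.signbit(df["y1"])
df["decision"] = ((df["condition"] != 0) | differs).astype(int)

# For comparison: np.sign maps 0.0 to 0, so a zero never matches a positive.
differs_sign = np.sign(df["x1"]) != np.sign(df["y1"])
df["decision_sign"] = ((df["condition"] != 0) | differs_sign).astype(int)

print(df)
```

The two columns differ only in the first row, where x1 is zero and y1 is positive: signbit treats them as the same sign, np.sign does not.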
