Reduce loop time in python

Question

My loop in Python is taking a lot of time to produce a result. The data contains around 100k records.

How can the run time be reduced?

df['loan_agr'] = df['loan_agr'].astype(int)

for i in range(len(df)):
    if df.loc[i, 'order_mt'] == df.loc[i, 'enr_mt']:
        df['new_N_Loan'] = 1
        df['exist_N_Loan'] = 0
        df['new_V_Loan'] = df['loan_agr']
        df['exist_V_Loan'] = 0
    else:
        df['new_N_Loan'] = 0
        df['exist_N_Loan'] = 1
        df['new_V_Loan'] = 0
        df['exist_V_Loan'] = df['loan_agr']

Can someone improve the question by mentioning what library this is about? And maybe add a tag as well.
– Griffon26, Commented Dec 23, 2018 at 18:24

Henry · Accepted Answer · 2018-12-23 21:32:48Z

5

You can use loc and set the new values in a vectorized way. This approach is much faster than using iteration because these operations are performed on entire columns at once, rather than individual values. Check out this article for more on speed optimization in pandas.

For example:

mask = df['order_mt'] == df['enr_mt']
df.loc[mask, ['new_N_Loan', 'exist_N_Loan', 'exist_V_Loan']] = [1, 0, 0]
df.loc[mask, ['new_V_Loan']] = df['loan_agr']

df.loc[~mask, ['new_N_Loan', 'exist_N_Loan', 'new_V_Loan']] = [0, 1, 0]
df.loc[~mask, ['exist_V_Loan']] = df['loan_agr']

Edit:

If the ~ (bitwise not) operator is not supported in your version of pandas, you can make a new mask for the "else" condition, similar to the first condition.

For example:

mask = df['order_mt'] == df['enr_mt']
else_mask = df['order_mt'] != df['enr_mt']

Then use the else_mask for the second set of definitions instead of ~mask.
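Put together, the else_mask variant might look like the sketch below. The sample frame is hypothetical (it mirrors the columns used above), and initializing the four target columns to 0 first is an assumption made here so that only the non-zero values need to be assigned.

```python
import pandas as pd

# Hypothetical sample frame with the columns used in the answer.
df = pd.DataFrame({
    "order_mt": [1, 2, 3, 4],
    "enr_mt": [1, 2, 30, 40],
    "loan_agr": [100, 200, 300, 400],
})

mask = df["order_mt"] == df["enr_mt"]
else_mask = df["order_mt"] != df["enr_mt"]  # used in place of ~mask

# Assumption: start every target column at 0, then fill in the exceptions.
for col in ["new_N_Loan", "exist_N_Loan", "new_V_Loan", "exist_V_Loan"]:
    df[col] = 0

# Rows where the two columns match: count and value go to the "new" columns.
df.loc[mask, "new_N_Loan"] = 1
df.loc[mask, "new_V_Loan"] = df.loc[mask, "loan_agr"]

# Remaining rows: count and value go to the "exist" columns.
df.loc[else_mask, "exist_N_Loan"] = 1
df.loc[else_mask, "exist_V_Loan"] = df.loc[else_mask, "loan_agr"]

print(df)
```

No `~` operator is needed here, since each mask is built from its own comparison.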

Sample:

Input:

   order_mt  enr_mt new_N_Loan exist_N_Loan exist_V_Loan new_V_Loan  loan_agr
0         1       1       None         None         None       None       100
1         2       2       None         None         None       None       200
2         3      30       None         None         None       None       300
3         4      40       None         None         None       None       400

Output:

   order_mt  enr_mt  new_N_Loan  exist_N_Loan  exist_V_Loan  new_V_Loan  loan_agr
0         1       1           1             0             0         100       100
1         2       2           1             0             0         200       200
2         3      30           0             1           300           0       300
3         4      40           0             1           400           0       400
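As an alternative to the loc assignments (not what the answer above uses), numpy.where can build each column in a single call. This is a minimal sketch that rebuilds the sample input by hand; the name is_new is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample input from above, rebuilt without the empty target columns.
df = pd.DataFrame({
    "order_mt": [1, 2, 3, 4],
    "enr_mt": [1, 2, 30, 40],
    "loan_agr": [100, 200, 300, 400],
})

# Boolean Series: True where the loan counts as "new".
is_new = df["order_mt"] == df["enr_mt"]

# np.where(condition, value_if_true, value_if_false) works element-wise,
# so each column is computed for all rows at once.
df["new_N_Loan"] = np.where(is_new, 1, 0)
df["exist_N_Loan"] = np.where(is_new, 0, 1)
df["new_V_Loan"] = np.where(is_new, df["loan_agr"], 0)
df["exist_V_Loan"] = np.where(is_new, 0, df["loan_agr"])

print(df)
```

Either style avoids the Python-level loop; which one reads better is mostly a matter of taste.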

edited Dec 23, 2018 at 21:32

answered Dec 23, 2018 at 18:13


7 Comments

Amit Amola Over a year ago

One of the most helpful answers I've seen on Stack Overflow recently. Thanks for sharing, @Henry; the link had even more good tricks. I do have a question though: what is so different about this mask approach? Wouldn't it still be iterating internally, so how is it faster than a usual Python loop? Is this related to the C code that runs behind Python in the background? Any idea?

yashul Over a year ago

Hi Henry, thanks for the response. ~ does not work there, as I am working with an older version of pandas on the server. I also cannot update the pandas library.

It_is_Chris Over a year ago

@AmitAmola It is faster because loc can use a boolean array for boolean indexing. loc does not "loop" through the DataFrame; it returns a view of the DataFrame based on location-based indexing.

Henry Over a year ago

@yashul I see. I've updated my answer; let me know if that helps.

yashul Over a year ago

Hi Henry, thank you so much. It is super fast. Previously I was still using the if/loop together with your snippet. This method rocked :).


Rinzler786 · Answer · 2018-12-23 19:05:19Z

0

Instead of calling range(len(...)) directly, you could compute the length once and pass that value to range.

answered Dec 23, 2018 at 19:05


1 Comment

Rinzler786 Over a year ago

Comment on this answer if you need a different answer
