1

Loops in Python are taking a lot of time to give a result. The data contains around 100k records.

How can the run time be reduced?

df['loan_agr'] = df['loan_agr'].astype(int)

# Fill the flag and value columns one row at a time.
for i in range(len(df)):
    if df.loc[i, 'order_mt'] == df.loc[i, 'enr_mt']:
        df.loc[i, 'new_N_Loan'] = 1
        df.loc[i, 'exist_N_Loan'] = 0
        df.loc[i, 'new_V_Loan'] = df.loc[i, 'loan_agr']
        df.loc[i, 'exist_V_Loan'] = 0
    else:
        df.loc[i, 'new_N_Loan'] = 0
        df.loc[i, 'exist_N_Loan'] = 1
        df.loc[i, 'new_V_Loan'] = 0
        df.loc[i, 'exist_V_Loan'] = df.loc[i, 'loan_agr']
2
  • Can you post a short slice of the input dataframe? Commented Dec 23, 2018 at 18:06
  • Can someone improve the question by mentioning what library this is about? And maybe add a tag as well. Commented Dec 23, 2018 at 18:24

2 Answers

5

You can use loc and set the new values in a vectorized way. This approach is much faster than using iteration because these operations are performed on entire columns at once, rather than individual values. Check out this article for more on speed optimization in pandas.

For example:

mask = df['order_mt'] == df['enr_mt']
df.loc[mask, ['new_N_Loan', 'exist_N_Loan', 'exist_V_Loan']] = [1, 0, 0]
df.loc[mask, ['new_V_Loan']] = df['loan_agr']

df.loc[~mask, ['new_N_Loan', 'exist_N_Loan', 'new_V_Loan']] = [0, 1, 0]
df.loc[~mask, ['exist_V_Loan']] = df['loan_agr']

Edit:

If the ~ (bitwise not) operator is not supported in your version of pandas, you can make a new mask for the "else" condition, similar to the first condition.

For example:

mask = df['order_mt'] == df['enr_mt']
else_mask = df['order_mt'] != df['enr_mt']

Then use the else_mask for the second set of definitions instead of ~mask.
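
Put together, the variant without ~ might look like the sketch below. The small DataFrame is made up to mirror the sample in this answer, and creating the four target columns with 0 first is my own assumption, so that only the non-zero values need to be set afterwards.

```python
import pandas as pd

# Hypothetical data mirroring the sample in this answer.
df = pd.DataFrame({
    "order_mt": [1, 2, 3, 4],
    "enr_mt": [1, 2, 30, 40],
    "loan_agr": [100, 200, 300, 400],
})

# One mask per branch of the original if/else, without using ~.
mask = df["order_mt"] == df["enr_mt"]
else_mask = df["order_mt"] != df["enr_mt"]

# Assumption: start every target column at 0, then overwrite where needed.
for col in ["new_N_Loan", "exist_N_Loan", "new_V_Loan", "exist_V_Loan"]:
    df[col] = 0

# "if" branch: rows where order_mt equals enr_mt.
df.loc[mask, "new_N_Loan"] = 1
df.loc[mask, "new_V_Loan"] = df.loc[mask, "loan_agr"]

# "else" branch: all remaining rows.
df.loc[else_mask, "exist_N_Loan"] = 1
df.loc[else_mask, "exist_V_Loan"] = df.loc[else_mask, "loan_agr"]
```

Selecting a single column label (rather than a one-element list) on the left-hand side keeps the assignment of loan_agr a plain Series-to-Series alignment on the row index.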

Sample:

Input:

   order_mt  enr_mt new_N_Loan exist_N_Loan exist_V_Loan new_V_Loan  loan_agr
0         1       1       None         None         None       None       100
1         2       2       None         None         None       None       200
2         3      30       None         None         None       None       300
3         4      40       None         None         None       None       400

Output:

   order_mt  enr_mt  new_N_Loan  exist_N_Loan  exist_V_Loan  new_V_Loan  loan_agr
0         1       1           1             0             0         100       100
1         2       2           1             0             0         200       200
2         3      30           0             1           300           0       300
3         4      40           0             1           400           0       400
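
As an alternative that is not part of the approach above, the same output can probably be produced with numpy.where, which picks one of two values per row based on the mask and needs neither ~ nor a second mask. This is only a sketch on made-up data matching the sample; the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the sample input above.
df = pd.DataFrame({
    "order_mt": [1, 2, 3, 4],
    "enr_mt": [1, 2, 30, 40],
    "loan_agr": [100, 200, 300, 400],
})

# True where the loan is "new" (order_mt equals enr_mt).
mask = (df["order_mt"] == df["enr_mt"]).values

# np.where(condition, value_if_true, value_if_false) works on whole arrays.
df["new_N_Loan"] = np.where(mask, 1, 0)
df["exist_N_Loan"] = np.where(mask, 0, 1)
df["new_V_Loan"] = np.where(mask, df["loan_agr"], 0)
df["exist_V_Loan"] = np.where(mask, 0, df["loan_agr"])
```

Each column is written exactly once, so the target columns do not have to exist beforehand.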

7 Comments

One of the most helpful answers I've seen on Stack Overflow recently, thanks for sharing @Henry, and the link had even more good tricks. I do have a question though: what is so different about this mask approach? Wouldn't it still be iterating internally, so how is it faster than a regular Python loop? Is this related to the C code that runs behind Python in the background?
Hi Henry, thanks for the response. ~ does not work there, as I am working on an older version of pandas on the server. Also, I cannot update the pandas library.
@AmitAmola It is faster because loc accepts a boolean array for boolean indexing. loc does not "loop" through the dataframe in Python; it applies the boolean mask to whole columns at once in compiled code.
@yashul I see, I've updated my answer, let me know if that helps
Hi Henry, thank you so much. It is super fast. Previously I was still using the loop together with your snippet. This method rocked :).
0

Instead of range(len(...)) you could replace the len call with a precomputed value.

1 Comment

Comment on this answer if you need a different answer
