I have a historical collection of ~500k loans, some of which have defaulted and others have not. My dataframe is lcd_temp. It has information on the loan size (loan_amnt), whether the loan has defaulted (Total_Defaults), the annual loan rate (clean_rate), the term of the loan (clean_term), and the months from origination to default (mos_to_default). mos_to_default is equal to clean_term if there is no default.
I would like to calculate the cumulative cashflow (cum_cf) for each loan: if the loan defaults, the sum of all coupons paid until default plus (1 - severity) of the loan_amnt; if it pays back on time, the sum of all coupons plus the loan_amnt.
Here's my code, which takes an awfully long time to run:
severity = 1
# Assumes lcd_temp has the default 0..n-1 integer index
for i in range(len(lcd_temp)):
    if lcd_temp.loc[i, 'Total_Defaults'] == 1:
        # Default: pay coupons only until time of default, plus (1 - severity) of principal
        lcd_temp.loc[i, 'cum_cf'] = (lcd_temp.loc[i, 'mos_to_default'] / 12 * lcd_temp.loc[i, 'clean_rate'] + (1 - severity)) * lcd_temp.loc[i, 'loan_amnt']
    else:
        # Total cf is sum of coupons (non-compounded) + principal
        lcd_temp.loc[i, 'cum_cf'] = (1 + lcd_temp.loc[i, 'clean_term'] / 12 * lcd_temp.loc[i, 'clean_rate']) * lcd_temp.loc[i, 'loan_amnt']
Any thoughts or suggestions on improving the speed (it has been running for over an hour so far) are welcome!
mask = lcd_temp.loc[..., 'Total_Defaults'] == 1.
[pandas] tag is not appropriate?
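Building on the mask idea, here is a minimal vectorized sketch that replaces the row-by-row loop with whole-column arithmetic. It is not tested against the real data: the toy frame is hypothetical, clean_rate is assumed to be a decimal annual rate, and coupons are assumed to scale with loan_amnt in both the default and non-default cases. Since mos_to_default equals clean_term when there is no default, one coupon expression should cover both branches.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for lcd_temp; column names follow the question.
lcd_temp = pd.DataFrame({
    "loan_amnt": [10000.0, 20000.0, 5000.0],
    "Total_Defaults": [1, 0, 1],
    "clean_rate": [0.10, 0.05, 0.12],   # assumption: decimal annual rate
    "clean_term": [36, 60, 36],         # months
    "mos_to_default": [12, 60, 6],      # equals clean_term when there is no default
})

severity = 1

# Boolean mask of defaulted loans, computed once for the whole column.
defaulted = lcd_temp["Total_Defaults"] == 1

# Simple (non-compounded) coupons as a fraction of principal.
# mos_to_default == clean_term for non-defaulted loans, so this covers both cases.
coupon_frac = lcd_temp["mos_to_default"] / 12 * lcd_temp["clean_rate"]

# Fraction of principal returned: (1 - severity) on default, all of it otherwise.
principal_frac = np.where(defaulted, 1 - severity, 1.0)

lcd_temp["cum_cf"] = (coupon_frac + principal_frac) * lcd_temp["loan_amnt"]
```

Each line operates on entire columns at once, so for ~500k rows this should take well under a second rather than hours; most of the loop's cost comes from the repeated scalar .loc lookups and assignments.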