I have a dataframe with multiple columns. One column, 'remaining_lease', is 75% NaN. I don't want to drop that column, so I want to fill in the missing values using two other columns, 'lease_commence_date' and 'current_year'. The formula is:
remaining_lease = 99 - (current_year - lease_commence_date)
For example, if current_year = 2022 and lease_commence_date = 1979,
then remaining_lease = 99 - (2022 - 1979) = 56.
I wrote a function to do this:
    import math

    def remaining_lease_year(x, current_year, commense_year):
        if math.isnan(x):  # if the value is NaN
            lease_year = 99 - (current_year - commense_year)
            return lease_year
        else:  # if the value is not NaN
            return x

    df['remaining_lease'] = df['remaining_lease'].apply(lambda x: remaining_lease_year(x, df['current_year'], df['lease_commence_date']))
But I am getting an error:
MemoryError: Unable to allocate 7.08 MiB for an array with shape (927465,) and data type int64
Is there another way to achieve this?
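For reference, one possible direction is sketched below. It assumes the three columns are numeric and named as in the code above. The lambda passes the full `df['current_year']` and `df['lease_commence_date']` Series into every call, so each NaN row builds a new 927465-element array, which is the likely source of the memory error. Computing the formula once for the whole column and filling only the missing entries with `Series.fillna` avoids that. The small frame here is made-up sample data, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({
    "remaining_lease": [np.nan, 61.0, np.nan],
    "current_year": [2022, 2022, 2022],
    "lease_commence_date": [1979, 1985, 2000],
})

# Apply the formula to every row in a single vectorized step.
computed = 99 - (df["current_year"] - df["lease_commence_date"])

# Keep existing values; fill only the NaN entries, aligned by index.
df["remaining_lease"] = df["remaining_lease"].fillna(computed)

print(df["remaining_lease"].tolist())  # [56.0, 61.0, 77.0]
```

Only one extra Series the length of the dataframe is allocated, and rows that already had a value (61.0 in the sample) are left untouched.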