I have a DataFrame that contains customers' ticket usage periods.
The `ticket_end` data is not correct, so I have to recompute it from the `ticket_start` column, which is correct, and the `ticket_name` column, which describes how long each ticket lasts.
I used `relativedelta(months=+numberofmonths)`, which works, but I have 300k rows and it takes more than 2 hours. I started looking for other options, but they were all about as slow. Then I ran the same code again and it took only 5 minutes! I had not changed anything, so I do not know what happened. After that I had to restart the kernel, and now it takes more than 2 hours again.
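For reference, `relativedelta(months=n)` shifts a datetime by calendar months and clips the day to the end of the target month when needed. pandas' `pd.DateOffset(months=n)` follows the same rule and can also be added to a whole datetime64 Series in one operation, which is what would make a vectorized replacement for the row loop possible. A minimal comparison on a made-up timestamp (not taken from the real data):

```python
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

# Made-up timestamp on January 31st so the month-end clipping is visible
start = datetime.datetime(2015, 1, 31, 17, 59, 50)

# dateutil: one Python object at a time
via_dateutil = start + relativedelta(months=+1)

# pandas: the same shift, as a scalar and applied to a whole Series
via_pandas = pd.Timestamp(start) + pd.DateOffset(months=1)
series = pd.Series(pd.to_datetime(['2015-01-31', '2015-03-28']))
shifted = series + pd.DateOffset(months=1)

print(via_dateutil, via_pandas)  # both 2015-02-28 17:59:50
print(shifted.tolist())
```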
My questions are: why did this happen, and what can I do to make processing a datetime column faster?
Here is my code:
```python
from dateutil.relativedelta import relativedelta
from tqdm import tqdm

for i in tqdm(range(len(customer))):
    # Pick the ticket length in months from the ticket name (default: 1 month)
    if customer.ticket_name[i] == '3 month free':
        months = 3
    elif customer.ticket_name[i] == '4 month free':
        months = 4
    elif customer.ticket_name[i] == '6 month free':
        months = 6
    elif customer.ticket_name[i] == '9 month free':
        months = 9
    else:
        months = 1
    # .loc avoids chained assignment (customer.ticket_end[i] = ...)
    customer.loc[i, 'ticket_end'] = customer.ticket_start[i] + relativedelta(months=+months)
```
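A possible vectorized alternative to the row loop, sketched under some assumptions: `ticket_start` is already datetime64, each "N month free" name means N months and any other name means 1 month, and the `months_by_name` dict and sample rows are hypothetical stand-ins for the real data. Because there are only a few distinct month counts, adding one `pd.DateOffset` per group replaces 300k Python-level `relativedelta` additions with a handful of array operations:

```python
import pandas as pd

# Hypothetical sample data standing in for the real `customer` frame
customer = pd.DataFrame({
    'ticket_start': pd.to_datetime(['2015-01-28', '2015-03-31', '2015-11-30']),
    'ticket_name': ['3 month free', '6 month free', 'monthly'],
})

# Assumed mapping from ticket name to length in months; other names get 1
months_by_name = {'3 month free': 3, '4 month free': 4,
                  '6 month free': 6, '9 month free': 9}
months = customer['ticket_name'].map(months_by_name).fillna(1).astype(int)

# One vectorized DateOffset addition per distinct month count
customer['ticket_end'] = customer['ticket_start'].copy()
for m in months.unique():
    mask = months == m
    customer.loc[mask, 'ticket_end'] = (
        customer.loc[mask, 'ticket_start'] + pd.DateOffset(months=int(m))
    )

print(customer)
```

The group loop runs once per distinct month count (here at most five times), not once per row, so its cost should not grow with the number of rows the way the original loop does.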
Before this code runs, the date columns were strings containing both date and time, e.g. '2015-01-28 17:59:50'. I do not need the time, so I removed it with this:
```python
import pandas as pd

customer['ticket_start'] = pd.to_datetime(customer['ticket_start'], format='%Y-%m-%d %H:%M:%S')
# Keep only the date part; this yields Python date objects (object dtype)
customer['ticket_start'] = customer['ticket_start'].dt.date
```
Then I converted it back with `pd.to_datetime()`:
```python
customer['ticket_start'] = pd.to_datetime(customer['ticket_start'])
```
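If the goal is only to drop the time of day, a possibly simpler route is to keep the column as datetime64 throughout: `Series.dt.normalize()` sets every time to midnight without the round trip through Python `date` objects (object dtype) and a second `pd.to_datetime()` call. A minimal sketch with made-up sample strings:

```python
import pandas as pd

# Made-up sample strings in the same format as the real column
customer = pd.DataFrame({'ticket_start': ['2015-01-28 17:59:50', '2015-03-31 08:15:00']})

# Parse once with an explicit format, then zero out the time-of-day part
customer['ticket_start'] = pd.to_datetime(
    customer['ticket_start'], format='%Y-%m-%d %H:%M:%S'
).dt.normalize()

print(customer['ticket_start'].dtype)  # still datetime64, not object
print(customer['ticket_start'].tolist())
```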
This might be critical information: I got the data both from a CSV file and from a MySQL database via `mysql.connector`, but now both take about 2 hours to process.
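Since the data comes from two sources, one thing that may be worth checking is whether `ticket_start` ends up with the same dtype in both cases. For example, a `DATE` column fetched through a database driver may arrive as Python `datetime.date` objects, which pandas stores with `object` dtype rather than `datetime64`. The sketch below builds such a frame by hand (the rows are made up, not real `mysql.connector` output) and converts the column explicitly:

```python
import datetime

import pandas as pd

# Hand-made rows imitating what a DB driver might return for a DATE column
rows = [
    (datetime.date(2015, 1, 28), '3 month free'),
    (datetime.date(2015, 3, 31), '6 month free'),
]
customer = pd.DataFrame(rows, columns=['ticket_start', 'ticket_name'])
print(customer.dtypes)  # ticket_start shows up as object

# Convert explicitly so both sources end up with datetime64 values
customer['ticket_start'] = pd.to_datetime(customer['ticket_start'])
print(customer.dtypes)
```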
Thanks in advance.