I have a CSV file that contains a date column. The dates are in the format "1stNov", "22ndNov", etc. How do I programmatically convert these to datetime in Pandas?
2 Answers
This is not a Pandas-specific problem but a string/pattern-matching one. Strip the ordinal suffix and parse what remains with datetime.strptime, using Python's strftime/strptime format directives. Note that you must also supply a year, since a datetime object always holds one. With regular Python:
from datetime import datetime
L = ["1stNov", "22ndNov"]
res = [datetime.strptime(i[:-5] + i[-3:] + '2018', '%d%b%Y') for i in L]
[datetime.datetime(2018, 11, 1, 0, 0),
datetime.datetime(2018, 11, 22, 0, 0)]
Or with Pandas:
import pandas as pd
s = pd.Series(L)
res = pd.to_datetime(s.str[:-5] + s.str[-3:] + '2018', format='%d%b%Y')
0 2018-11-01
1 2018-11-22
dtype: datetime64[ns]
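The slices above rely on every ordinal suffix being two letters and every month being a three-letter abbreviation. A regular expression states that assumption explicitly and rejects anything that does not fit. The following is a minimal sketch in plain Python, not part of the original answer; the helper name `parse_ordinal_date` and the default year 2018 are illustrative assumptions.

```python
import re
from datetime import datetime

# Matches a day number, an ordinal suffix, then a three-letter month.
ORDINAL_DATE = re.compile(r"^(\d{1,2})(?:st|nd|rd|th)([A-Za-z]{3})$")

def parse_ordinal_date(text, year=2018):
    """Parse strings like '1stNov' into a datetime; `year` is an assumption."""
    match = ORDINAL_DATE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised date string: {text!r}")
    day, month = match.groups()
    return datetime.strptime(f"{day}{month}{year}", "%d%b%Y")

L = ["1stNov", "22ndNov", "3rdDec"]
res = [parse_ordinal_date(i) for i in L]
print(res)
```

Raising on unexpected input, rather than slicing blindly, makes malformed rows in the CSV visible instead of producing wrong dates.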
Use a list comprehension with dateutil's parser:
import pandas as pd
from dateutil import parser
df = pd.DataFrame({'dates': ["1stNov", "22ndNov"]})
df['dates'] = [parser.parse(f'{x[:-3]} {x[-3:]} 2018') for x in df['dates']]
Or strip the ordinal suffix (st, nd, rd, th) and convert with to_datetime:
# Python 3.6+ solution with f-strings
dates = [f'{x[:-5]}{x[-3:]}2018' for x in df['dates']]
# Python below 3.6
# dates = ['{}{}2018'.format(x[:-5], x[-3:]) for x in df['dates']]
df['dates'] = pd.to_datetime(dates, format='%d%b%Y')
print(df)
dates
0 2018-11-01
1 2018-11-22
pd.read_csv(filepath, parse_dates = True)
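Since the dates come from a CSV file, the suffix-stripping approach above can be applied right after loading. This is a sketch under stated assumptions: the in-memory `csv_text` stands in for the real file, the column names `dates` and `value` are hypothetical, and 2018 is the assumed year, as in the answers.

```python
import io
import pandas as pd

# Stand-in for the real file; the column names are hypothetical.
csv_text = "dates,value\n1stNov,10\n22ndNov,20\n"

# The date column is loaded as plain strings at this point.
df = pd.read_csv(io.StringIO(csv_text))

# Strip the ordinal suffix after the day number, append the assumed year, then parse.
stripped = df["dates"].str.replace(r"(?<=\d)(?:st|nd|rd|th)", "", regex=True)
df["dates"] = pd.to_datetime(stripped + "2018", format="%d%b%Y")
print(df.dtypes)
```

With a real file, replace `io.StringIO(csv_text)` with the file path.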