0

I'm loading an Excel file with pandas using the parse_dates=True parameter. However, the date format can not change. When I open the file in Excel on my local computer, the date format is accurate, but when the file is loaded in Python, the format is incorrect. The issue is that "dmY acts like mdY for half of the data." I'm expecting the output with YYYY-mm-dd HH:MM:SS. Below is the code for the data

df = pd.DataFrame({'ID':[2914240.0,2914137.0,2929456.0,2920801.0],
                   'RT_Date':['2021-01-02 06:38:00','2021-01-02 02:58:00','02-19-2021 17:59:00','2021-09-02 12:49:00'],
                   'DateCreated':['01-31-2021 14:38:12','01-31-2021 10:57:50','02-19-2021 01:58:38','2021-09-02 11:35:51']})

enter image description here

The code which i tried is below

df = pd.read_excel('demo.xlsx', parse_dates=True)

But the above code is not working and returning the same output as shown in the image. Is there any issue with the data/pandas? Can anyone please help me with this?

1 Answer 1

2

Using pd.to_datetime() allows for the use of the infer_datetime_format argument which eases working with datetime values. In this particular case, if you'd like to parse all columns except for ID you can try:

df['RT_Date'],df['DateCreated']  = [pd.to_datetime(df[x],infer_datetime_format=True) for x in df if x != 'ID']

Based on your example, it outputs:

          ID             RT_Date         DateCreated
0  2914240.0 2021-01-02 06:38:00 2021-01-31 14:38:12
1  2914137.0 2021-01-02 02:58:00 2021-01-31 10:57:50
2  2929456.0 2021-02-19 17:59:00 2021-02-19 01:58:38
3  2920801.0 2021-09-02 12:49:00 2021-09-02 11:35:51

Should you want only the date:

df['RT_Date'],df['DateCreated']  = [pd.to_datetime(df[x],infer_datetime_format=True).dt.date for x in df if x != 'ID']

Returning:

          ID     RT_Date DateCreated
0  2914240.0  2021-01-02  2021-01-31
1  2914137.0  2021-01-02  2021-01-31
2  2929456.0  2021-02-19  2021-02-19
3  2920801.0  2021-09-02  2021-09-02

EDIT: If there are multiple columns you wish to evaluate this for and no easy way, then you can create a list of candidate columns:

to_date_cols = ['RT_Date','DateCreated','Additional_Col1']

And use:

for x in to_date_cols:
   df[x] = pd.to_datetime(df[x],infer_datetime_format=True)
Sign up to request clarification or add additional context in comments.

2 Comments

If there are multiple columns, can i rewrite in this way df['RT_Date'], df['DateCreated'] = [pd.to_datetime(df[x], infer_datetime_format=True) for x in df if x == 'RT_Date', 'DateCreated']
Check the edit please

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.