2

I am reading an excel files :

dfEx=pd.read_excel(fileExcel, engine='openpyxl')

There is a date column in Object format, and dfEx["Date"] gives me these results :

id date
0 15/10/2022 10:44:59
1 15/10/2022 10:47:53
2 15/10/2022 10:49:50
3 15/10/2022 10:50:40
4 15/10/2022 10:54:05
...
174 2022-10-16 00:00:00
175 2022-10-16 00:00:00
176 2022-10-16 00:00:00
177 2022-10-16 00:00:00
178 2022-10-16 00:00:00

I don't get why I have the format change between the beginning and the end, as in the Excel file the separator character is always / for year/month/day

The syntax I have in the Excel file is :

15/10/2022 11:13:64

for the first and

15-10-2022

for the rest

So when I use the datetime function it doesn't work :

dfEx['Date'] =  pd.to_datetime(dfEx['Date'], format='%d/%m/%Y %H:%M:%S', exact=False)

ValueError: time data '2022-10-15 00:00:00' does not match format '%d/%m/%Y %H:%M:%S' (search)

1
  • A workaround could be to read that column from excel as a str and convert it in a second step. Commented Oct 18, 2022 at 8:00

2 Answers 2

2

Add errors='coerce' for missing values if no match format 15/10/2022 10:44:59 and then replace missing values by another to_datetime for convert format 2022-10-16 00:00:00:

dfEx['date'] = (pd.to_datetime(dfEx['date'], format='%d/%m/%Y %H:%M:%S', errors='coerce')
                  .fillna(pd.to_datetime(dfEx['date'], errors='coerce')))
print (dfEx)
   id                 date
0    0 2022-10-15 10:44:59
1    1 2022-10-15 10:47:53
2    2 2022-10-15 10:49:50
3    3 2022-10-15 10:50:40
4    4 2022-10-15 10:54:05
5  174 2022-10-16 00:00:00
6  175 2022-10-16 00:00:00
7  176 2022-10-16 00:00:00
8  177 2022-10-16 00:00:00
9  178 2022-10-16 00:00:00
Sign up to request clarification or add additional context in comments.

Comments

0

I see that you have two different dates formats in excel:

  • Ones with dashes 2022-10-16 00:00:00
  • Others with slashes 15/10/2022 10:54:05

So you cannot parse all the dates at once, you first have to split the dataframes:

df_slashes = df[df.Date.str.contains("/", regex=False))
df_dashes = df[df.Date.str.contains("-", regex=False))

Then for each dataframe run the corresponding to_datetime function:

df_slashes_dates = pd.to_datetime(df_slashes.Date, format='%d/%m/%Y %H:%M:%S')
df_dashes_dates = pd.to_datetime(df_dashes.Date, format='%d-%m-%Y %H:%M:%S')

Finally do whatever you want with these two series.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.