
I'm reading in an Excel file and unioning it with a CSV file.

When I read in the excel file I have a field of dates:

0    2018-05-28 00:00:00
1    9999-12-31 00:00:00
2    2018-02-26 00:00:00
3    2018-02-26 00:00:00
4    2018-02-26 00:00:00
Name: Date_Started, dtype: object

I check the data type:

df['Date_Started'].dtype
dtype('O')

Then, when I write the resulting dataframe out to CSV, I get this:

df.to_csv(folderpath + "Date_Started_df.csv", encoding="UTF-8", index=False, na_rep='', date_format='%d%m%Y')
Date_Started

28/05/2018 00:00
31/12/9999 00:00
26/02/2018 00:00
26/02/2018 00:00
26/02/2018 00:00

I have tried

df.loc[:,'Date_Started'] = (df['Date_Started'].astype('str').str[8:10] + "/" +
    df['Date_Started'].astype('str').str[5:7] + "/" +
    df['Date_Started'].astype('str').str[:4])

Which gave me:

0    28/05/2018
1    31/12/9999
2    26/02/2018
3    26/02/2018
4    26/02/2018
Name: Date_Started, dtype: object

I thought the problem might be in how the file is written out:

df.to_csv(filename, date_format='%Y%m%d')

but I still got the times!?

1 Answer

You need to convert your series to datetime before sending to CSV:

df['Date_Started'] = pd.to_datetime(df['Date_Started'])

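Once the column has a datetime dtype, to_csv applies date_format='%d%m%Y' to it. While the column is still dtype object, its values are written out as plain strings and date_format has no effect, which is why the times kept appearing. The to_csv docs describe the parameter as follows: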

date_format : string, default None

Format string for datetime objects
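
As a minimal sketch of the conversion described above: the sample frame is hypothetical and assumes the Date_Started column holds Python datetime objects in an object-dtype column, as in the question. An in-memory buffer stands in for the output file.

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical sample: Python datetime objects stored in an object-dtype column.
dates = pd.Series([datetime(2018, 5, 28), datetime(2018, 2, 26)], dtype="object")
df = pd.DataFrame({"Date_Started": dates})

# Convert to a real datetime column so that to_csv can apply date_format.
df["Date_Started"] = pd.to_datetime(df["Date_Started"])

# Write to an in-memory buffer instead of a file, purely for illustration.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format="%d/%m/%Y")
csv_text = buffer.getvalue()
print(csv_text)
```

The slashes in date_format are a choice made for this sketch. The '%d%m%Y' format used in the question contains no separators, so it would write 28052018 rather than 28/05/2018.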


7 Comments

My problem is that I've got fake dates in the list (e.g. 31/12/9999), which give the error "ValueError: Given date string not likely a datetime."
Yes, that's a problem. One tip is to replace them with a date that cannot occur in your data, e.g. df['Date_Started'] = pd.to_datetime(df['Date_Started'], errors='coerce').fillna(pd.to_datetime('1950-01-01')). This works if you know that 1 January 1950 is not a valid date in your data.
Would that leave blanks in place of the impossible dates? I actually need to be able to identify them, as they are records without start dates.
I was wondering if it was a writing issue, as I managed to create strings, but times were added when the frame was written to CSV.
To leave blanks, use df['Date_Started'] = pd.to_datetime(df['Date_Started'], errors='coerce'). Blanks will be NaT, also known as "Not a Time". Try it and see what happens.
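
A rough sketch of how the blanks could come out in the CSV, under a few assumptions: the sample rows, the Id column and the SENTINEL name are all hypothetical. The explicit masking step is an addition that is not in the comments above. It is there because whether errors='coerce' alone turns 9999-12-31 into NaT depends on the timestamp resolution in use (with nanosecond resolution the latest representable date falls in the year 2262).

```python
import io
from datetime import datetime

import pandas as pd

# Hypothetical placeholder meaning "no start date", as in the question's data.
SENTINEL = datetime(9999, 12, 31)

dates = pd.Series(
    [datetime(2018, 5, 28), SENTINEL, datetime(2018, 2, 26)], dtype="object"
)
df = pd.DataFrame({"Id": [1, 2, 3], "Date_Started": dates})

# Blank out the placeholder explicitly, then convert. errors="coerce" turns
# any other value that cannot be converted into NaT instead of raising.
cleaned = df["Date_Started"].where(df["Date_Started"] != SENTINEL)
df["Date_Started"] = pd.to_datetime(cleaned, errors="coerce")

# Rows without a start date can still be identified after the conversion.
missing_ids = df.loc[df["Date_Started"].isna(), "Id"].tolist()
print(missing_ids)

# NaT is written as na_rep; the remaining dates follow date_format.
buffer = io.StringIO()
df.to_csv(buffer, index=False, na_rep="", date_format="%d/%m/%Y")
csv_text = buffer.getvalue()
print(csv_text)
```

If the records without start dates need a visible marker instead of an empty field, na_rep can be set to any string.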