1

I have a dataframe with an id column, and a date column made up of an integer.

d = {'id': [1, 2], 'date': [20161031, 20170930]}
df = pd.DataFrame(data=d)

   id      date
0   1  20161031
1   2  20170930

I can convert the date column to an actual date like so.

df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))

   id       date
0   1 2016-10-31
1   2 2017-09-30

But I need to have this field as a timestamp with with hours, minutes, and seconds so that it is compatible with my database table. I don't care what the the values are, we can keep it easy by setting it to zeros.

2016-10-31 00:00:00
2017-09-30 00:00:00

What is the best way to change this field to a timestamp? I tried

df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d%H%M%S'))

but pandas didn't like that.

I think I could append six 0's to the end of every value in that field and then use the above statement, but I was wondering if there is a better way.

4
  • use pandas.to_datetime(df['date']) that is enough, already discussed at stackoverflow.com/questions/32204631/… Commented Aug 19, 2020 at 18:43
  • format : string, default None strftime to parse time, eg “%d/%m/%Y”, note that “%f” will parse all the way up to nanoseconds. Commented Aug 19, 2020 at 18:47
  • @Serge are you sure it works in this case though? E.g. pd.to_datetime(20161031) == pd.Timestamp('1970-01-01 00:00:00.020161031') Commented Aug 19, 2020 at 19:33
  • You are right, it should be converted to string with df['date'].apply(str) Commented Aug 19, 2020 at 20:03

3 Answers 3

3

With pandas it is simpler and faster to convert entire columns. First you convert to string and then to time stamp

pandas.to_datatime(df['date'].apply(str))

PS there are few other conversion methods of varying performance https://datatofish.com/fastest-way-to-convert-integers-to-strings-in-pandas-dataframe/

Sign up to request clarification or add additional context in comments.

Comments

2

The problem seems to be that pd.to_datetime doesn't accept dates in this integer format:

pd.to_datetime(20161031) gives Timestamp('1970-01-01 00:00:00.020161031')

It assumes the integers are nanoseconds since 1970-01-01.

You have to convert to a string first:

df['date'] = pd.to_datetime(df["date"].astype(str))

Output:

   id       date
0   1 2016-10-31
1   2 2017-09-30

Note that these are datetimes so they include a time component (which are all zero in this case) even though they are not shown in the data frame representation above.

print(df.loc[0,'date'])

Out:

Timestamp('2016-10-31 00:00:00')

Comments

1

You can use

df['date'] = pd.to_datetime(df["date"].dt.strftime('%Y%m%d%H%M%S'))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.