In my dataframe, the date is split across three columns: year, month, and day.

How can I convert them into a single date column so I can do time series analysis?

I can do this:

df.apply(lambda x:'%s %s %s' % (x['year'],x['month'], x['day']),axis=1)

which gives:

1095       1954 1 1
1096       1954 1 2
1097       1954 1 3
1098       1954 1 4
1099       1954 1 5
1100       1954 1 6
1101       1954 1 7
1102       1954 1 8
1103       1954 1 9
1104      1954 1 10
1105      1954 1 11
1106      1954 1 12
1107      1954 1 13

But what is the next step?

EDIT: This is what I ended up with:

from datetime import datetime
df['date']= df.apply(lambda x:datetime.strptime("{0} {1} {2}".format(x['year'],x['month'], x['day']), "%Y %m %d"),axis=1)
df.index= df['date']

3 Answers

Here's how to convert the values to datetime:

# strptime is a method of the datetime class, not of the datetime module
from datetime import datetime

df.apply(lambda x: datetime.strptime("{0} {1} {2} 00:00:00".format(x['year'], x['month'], x['day']), "%Y %m %d %H:%M:%S"), axis=1)

6 Comments

It gives me NameError: name 'x' is not defined. What's the problem?
The lambda expression variable for the dataframe row was missing. I still feel like this is forced, but let me know if it works.
I think maybe we should convert into datetime, not time?
Changed to datetime. Better?
this works for me: df.apply(lambda x: pandas.datetime.strptime("{0} {1} {2} 00:00:00".format(x['year'],x['month'], x['day']), "%Y %m %d %H:%M:%S"),axis=1)

It makes no sense to format a date to a string and immediately reparse it; use the datetime constructor instead:

df.apply(lambda x: datetime.date(x['year'], x['month'], x['day']), axis=1)
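A minimal, self-contained sketch of the constructor approach, assuming a small made-up dataframe with year, month and day columns. The int() casts are an added precaution in case the row values arrive as NumPy integers or floats, and the last step (turning the result into a DatetimeIndex) is one possible follow-up rather than part of the one-liner above.

```python
import datetime

import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({"year": [1954, 1954], "month": [1, 1], "day": [1, 2]})

# Build one datetime.date per row directly from the three columns.
dates = df.apply(
    lambda row: datetime.date(int(row["year"]), int(row["month"]), int(row["day"])),
    axis=1,
)

# Optional follow-up: convert the dates and use them as the index.
df.index = pd.DatetimeIndex(pd.to_datetime(dates))
```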

There is a simpler and much faster way to convert 3 columns with year, month and day to a single datetime column in pandas:

import pandas

pandas.to_datetime(df)

Besides the code being much simpler than the accepted answer, on my computer your implementation takes 22.3 seconds on a dataframe with 1 million rows, while this one takes 175 milliseconds. That makes this implementation about 127x faster.

Note that in your case, the columns are already named year, month and day, which is a requirement for the input dataframe of to_datetime. If they have different names, you need to rename them first (e.g. df.rename(columns={'<your_year_col>': 'year', ...})).
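As a rough sketch of the renaming step, assuming hypothetical source columns yr, mo and dy plus an unrelated value column (none of these names come from the question). Only the three date columns are passed to to_datetime here because, as far as I know, it rejects columns it does not recognise as date parts.

```python
import pandas as pd

# Hypothetical data: the yr/mo/dy and value column names are made up for illustration.
df = pd.DataFrame({
    "yr": [1954, 1954, 1954],
    "mo": [1, 1, 1],
    "dy": [1, 2, 3],
    "value": [0.1, 0.2, 0.3],
})

# to_datetime assembles dates from columns named year, month and day,
# so rename first and select only those three columns.
parts = df.rename(columns={"yr": "year", "mo": "month", "dy": "day"})[["year", "month", "day"]]
df["date"] = pd.to_datetime(parts)

# Use the assembled dates as the index for time series work.
df = df.set_index("date")
```

If the dataframe holds nothing but the year, month and day columns, the plain pandas.to_datetime(df) call is enough.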
