
I have a large data set on which I need to perform a date operation, and because it is taking too long, I was wondering whether there is a faster way to do it. The data frame looks like the following:

Date, Month
2017-01-01, 0
2017-01-01, 1
2017-01-01, 2

I need to create another column that adds the number of months in the Month column to the Date column, so it would look like the following:

Date, Month, newDate
2017-01-01, 0, 2017-01-01
2017-01-01, 1, 2017-02-01
2017-01-01, 2, 2017-03-01
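To make the question reproducible, the sample frame can be built as below. This assumes the `Date` column is parsed with `pd.to_datetime`, since both `relativedelta` and the NumPy `datetime64` tricks in the answers need real dates rather than strings; the name `expected` is only for illustration.

```python
import pandas as pd

# Sample data from the question; Date is parsed from strings into datetime64
df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-01-01', '2017-01-01', '2017-01-01']),
    'Month': [0, 1, 2],
})

# Expected newDate column: each Date shifted forward by Month calendar months
expected = pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01'])
```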

My current method uses the apply function with relativedelta, like this:

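from dateutil.relativedelta import relativedelta

def newDateCalc(row):
    # Shift this row's Date forward by its Month value in calendar months
    return row['Date'] + relativedelta(months=int(row['Month']))

df['newDate'] = df[['Date', 'Month']].apply(newDateCalc, axis=1)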

Thank you in advance for your help.

2 Answers


Here is my vectorized attempt:

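import numpy as np
# Truncate to month precision, add Month months, then cast back to day precision
df['newDate'] = (df.Date.values.astype('M8[M]') +
                 df.Month.values * np.timedelta64(1, 'M')).astype('M8[D]')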

Result:

In [106]: df
Out[106]:
        Date  Month    newDate
0 2017-01-01      0 2017-01-01
1 2017-01-01      1 2017-02-01
2 2017-01-01      2 2017-03-01
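The vectorized answer above casts each date to month precision (`'M8[M]'`), so every result lands on the 1st of the month; that matches the sample data only because all of its dates are already the 1st. If the day of month should be kept, as `relativedelta` does, one possible extension is sketched below. It is a sketch under stated assumptions, not part of the original answer: `add_months` is a hypothetical helper name, it assumes `Date` is a `datetime64` column with no time-of-day component (times are truncated), and it clamps overflowing days to the last day of the target month the way `relativedelta` does.

```python
import numpy as np
import pandas as pd

def add_months(dates: pd.Series, months: pd.Series) -> pd.Series:
    """Hypothetical helper: vectorized month addition that keeps the day of month.

    Assumes `dates` is datetime64 without a time-of-day part (it is truncated).
    """
    d = dates.values.astype('M8[D]')                   # day precision
    month_start = d.astype('M8[M]')                     # source month
    day_offset = d - month_start.astype('M8[D]')        # days past the 1st (timedelta64[D])
    target = month_start + months.values.astype('m8[M]')  # target month
    # Length of each target month = next month's 1st minus this month's 1st
    month_len = (target + np.timedelta64(1, 'M')).astype('M8[D]') - target.astype('M8[D]')
    # Like relativedelta, clamp e.g. Jan 31 + 1 month to the last day of February
    day_offset = np.minimum(day_offset, month_len - np.timedelta64(1, 'D'))
    return pd.Series(target.astype('M8[D]') + day_offset, index=dates.index)
```

Usage: `df['newDate'] = add_months(df['Date'], df['Month'])`. Compared with the answer above, the only extra work is the clamp step, which stays fully vectorized because month lengths are derived from differences between consecutive month starts.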

1 Comment

@Hojin, glad I could help :)

You can use df.transform with relativedelta:

In [960]: df.transform(lambda x: x['Date'] + relativedelta(months=x['Month']), axis=1)
Out[960]: 
0   2017-01-01
1   2017-02-01
2   2017-03-01
dtype: datetime64[ns]
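Note that `transform` with a row-wise lambda still calls `relativedelta` once per row, so it should not be expected to run faster than the `apply` in the question. In recent pandas versions, `DataFrame.transform` may also reject a function that returns one scalar per row (raising `ValueError: Function did not transform`), in which case `df.apply(..., axis=1)` is the equivalent call. When `Month` takes only a few distinct values, a different technique (not from either answer) is to group by `Month` and add a single `pd.DateOffset(months=m)` to each group, which pandas can apply to the whole group without a Python-level loop over rows. `add_months_grouped` is a hypothetical helper name, and the sketch assumes `Date` is `datetime64` and the frame has a unique index.

```python
import pandas as pd

def add_months_grouped(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: add df['Month'] months to df['Date'], one call per distinct Month."""
    parts = [
        # Series + DateOffset(months=m) shifts the whole group at once and,
        # like relativedelta, clips to month end (Jan 31 + 1 month -> Feb 28)
        group['Date'] + pd.DateOffset(months=int(m))
        for m, group in df.groupby('Month')
    ]
    # Reassemble the pieces in the original row order (assumes a unique index)
    return pd.concat(parts).reindex(df.index)
```

Usage: `df['newDate'] = add_months_grouped(df)`. The cost grows with the number of distinct `Month` values rather than the number of rows, so this pays off mainly when offsets repeat heavily.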

