2

I am trying to find the number of days between a list of dates in a Pandas Dataframe and the current date.

I want to create a new column with the number of days between the due dates and the current date. For example:

+---------------------+------------+
|      Due Date       | Difference |
+---------------------+------------+
| 2019-04-15 00:00:00 |        146 |
| 2019-02-11 00:00:00 |         83 |
| 2019-03-11 00:00:00 |        111 |
| 2019-01-04 00:00:00 |         45 |
| 2019-05-13 00:00:00 |        174 |
+---------------------+------------+

I was trying to do:

current = np.datetime64('today')
df['Difference'] = df['Due Date'] - current

But I was getting wrong numbers. It works fine if I do them individually like:

df['Due Date'][0] - current

Any help would be great. Thanks!

1
  • You could look into pandas timedelta with .days Commented Nov 20, 2018 at 17:30

3 Answers 3

2

But I was getting wrong numbers.

Works fine for me on Pandas 0.23 / NumPy 1.14.3, assuming Due Date is a datetime series:

print(df['Due Date'] - np.datetime64('today'))

0   146 days
1    83 days
2   111 days
3    45 days
4   174 days
Name: Due Date, dtype: timedelta64[ns]

More idiomatic would be to use Pandas-generated objects and dt.days if you want integers:

print((df['Due Date'] - pd.Timestamp('today')).dt.days)

0    145
1     82
2    110
3     44
4    173
Name: Due Date, dtype: int64

Note, for example, the one-day differential caused by working with the NumPy version. The real answer is between the two, but rounding up is probably not what one expects.

Sign up to request clarification or add additional context in comments.

3 Comments

I guess it may not work in older versions. Can you post which version are you using ? If the OP puts his version maybe we can compare.
True , i'm sorry !
This worked for me: (df['Due Date'] - pd.Timestamp('today')).dt.days Thanks @jpp :D
2

I think you need to convert them to datetime to perform date-like operations.

df['Due Date'] = pd.to_datetime(df['Due Date'])

So the complete code would look like this:

df['Due Date'] = pd.to_datetime(df['Due Date'])
current = np.datetime64('today')
df['Difference'] = df['Due Date'] - current

EDIT: Also another possible issue: i think you would need to add the current date as a column (or a Panda Series) so a good fix will be:

current = np.datetime64('today')
df['current'] = np.datetime64('today')
df['Difference'] = df['Due Date'] - df['current']

2 Comments

The current dtype of the column is set to datetime64[ns]
Still getting the same error: 2019-04-15 18000 days 23:59:59.999982
0

I would check the format of df["Due Date"]. If it's not in the same date time format as todays date, then switch it. Broadcasting (the simple subtraction)should work if the formats match.

Otherwise, try using a lambda function to apply your changes:

df['Difference'] = df['Due Date'].apply(lambda x: x - current, axis=1)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.