
I have a numpy array which is composed of numpy.datetime64 values. I'd like to convert these to pandas Timestamps using pandas.Timestamp().

I could use an explicit loop (here a list comprehension):

import numpy as np
import pandas as pd
# my_arr is an existing ndarray of numpy.datetime64 values
stamps = [pd.Timestamp(t) for t in my_arr]

but this isn't very efficient. Instead, I can use numpy's vectorize function:

stamper = np.vectorize(pd.Timestamp)
stamps = stamper(my_arr)

but the numpy documentation states that vectorize is mostly a convenience function and not intended for performance. Is there a better, more efficient way to do this?
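A minimal sketch of the two approaches above, assuming a small hypothetical my_arr of datetime64[ns] values (the real array isn't shown). Passing otypes=[object] to np.vectorize is an addition of mine; it fixes the output dtype up front instead of letting vectorize infer it from a trial call on the first element.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the question does not show my_arr, so this
# array of datetime64[ns] values is an assumption for illustration only.
my_arr = np.array(
    ["2015-08-28T13:46", "2015-08-27T13:26", "2015-08-27T11:46"],
    dtype="datetime64[ns]",
)

# Approach 1: Python-level loop via a list comprehension.
stamps_loop = [pd.Timestamp(t) for t in my_arr]

# Approach 2: np.vectorize, which is still a Python-level loop internally.
# otypes=[object] declares the output dtype instead of inferring it.
stamper = np.vectorize(pd.Timestamp, otypes=[object])
stamps_vec = stamper(my_arr)

print(type(stamps_loop[0]).__name__)  # Timestamp
print(stamps_vec.dtype)               # object
```

Both produce genuine pd.Timestamp objects; they differ mainly in the container (list vs. object ndarray), not in per-element cost.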

Edit: Here are timings for some of the solutions given:

%timeit stamper(my_arr)
100 loops, best of 3: 7.04 ms per loop

%timeit my_arr.astype(pd.Timestamp)
10000 loops, best of 3: 82 µs per loop

%timeit np.array([pd.Timestamp(t) for t in my_arr])
100 loops, best of 3: 16.8 ms per loop

%timeit pd.to_datetime(my_arr)
1000 loops, best of 3: 1.19 ms per loop

It seems that .astype() is fastest, so I'll go with this. Thanks!
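The %timeit figures above depend on the array's size and time unit, neither of which is stated. Here is a rough plain-script sketch using the standard timeit module on a hypothetical 5,000-element datetime64[ns] array; it also records the type of the first element each approach returns, which is worth checking alongside speed. It times astype(object) on the assumption that numpy, lacking a dtype for pd.Timestamp, falls back to object dtype for astype(pd.Timestamp).

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 5,000 one-minute steps stored as datetime64[ns].
# The question does not say how large my_arr is or which unit it uses.
start = np.datetime64("2015-08-27T00:00", "ns")
my_arr = start + np.arange(5_000) * np.timedelta64(1, "m")

stamper = np.vectorize(pd.Timestamp, otypes=[object])

# Each candidate is a zero-argument callable so timeit can call it repeatedly.
candidates = {
    "np.vectorize": lambda: stamper(my_arr),
    "astype(object)": lambda: my_arr.astype(object),
    "list comprehension": lambda: [pd.Timestamp(t) for t in my_arr],
    "pd.to_datetime": lambda: pd.to_datetime(my_arr),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, reported per call.
    per_call = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    first_type = type(fn()[0]).__name__
    results[name] = (per_call, first_type)
    print(f"{name:<20} {per_call * 1e3:8.3f} ms/call  first element: {first_type}")
```

On a datetime64[ns] array the astype row should report int rather than Timestamp, which would account for much of its speed: no Timestamp objects are created at all.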

  • Won't pd.DataFrame(my_arr).to_timestamp() do what you want? Commented Sep 4, 2015 at 17:10
  • Ed, it doesn't seem to. When I tried this I got " 'Int64Index' object has no attribute 'to_timestamp' " Commented Sep 4, 2015 at 17:14
  • Sorry try pd.DataFrame(my_arr).to_timestamp(axis=1) Commented Sep 4, 2015 at 17:15
  • Same problem. I'm using pandas 0.13.1, if that makes a difference. Commented Sep 4, 2015 at 17:18
  • Is there a reason you specifically need Timestamp? I think that if you just construct a df from the np array, the dtype will be preserved as datetime64. Is that not enough? Commented Sep 4, 2015 at 17:20
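Along the lines of the last comment, a minimal sketch (with a hypothetical two-element my_arr) showing that wrapping the array in a DataFrame keeps the datetime64[ns] dtype, while scalar access hands back pd.Timestamp objects on demand. As far as I know, to_timestamp() is meant for converting a PeriodIndex, which is why it fails on the default integer index.

```python
import numpy as np
import pandas as pd

# Hypothetical input array of datetime64 values (not shown in the question).
my_arr = np.array(["2015-08-28T13:46", "2015-08-27T13:26"], dtype="datetime64[ns]")

# Wrapping the array keeps the datetime64[ns] dtype; no per-element conversion.
df = pd.DataFrame({"when": my_arr})
print(df["when"].dtype)  # datetime64[ns]

# Scalar access boxes the stored value as a pd.Timestamp.
first = df["when"].iloc[0]
print(type(first).__name__)  # Timestamp

# Timestamp attributes are available column-wide through the .dt accessor.
print(df["when"].dt.hour.tolist())  # [13, 13]
```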

2 Answers


If my_arr is a numpy ndarray, I would suggest doing:

my_arr.astype(pd.Timestamp)

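That creates a copy of the array cast to the type you want. Note that numpy has no dedicated dtype for pd.Timestamp and treats it like any other Python class (object dtype), so it is worth checking the type of the resulting elements.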
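A quick way to check what the cast actually produces. This sketch assumes numpy maps pd.Timestamp to the generic object dtype (its fallback for arbitrary Python classes), so it calls astype(object) directly on two hypothetical arrays that differ only in time unit. It then boxes the values through pd.DatetimeIndex(...).to_numpy(dtype=object) as one way to get an object array holding real Timestamp objects.

```python
import datetime

import numpy as np
import pandas as pd

values = ["2015-08-28T13:46", "2015-08-27T13:26"]
ns_arr = np.array(values, dtype="datetime64[ns]")
us_arr = np.array(values, dtype="datetime64[us]")

# Nanoseconds don't fit in datetime.datetime, so numpy falls back to ints.
as_obj_ns = ns_arr.astype(object)
# Microsecond (or coarser) units convert to datetime.datetime objects.
as_obj_us = us_arr.astype(object)

print(type(as_obj_ns[0]).__name__)                  # int
print(isinstance(as_obj_us[0], datetime.datetime))  # True
print(isinstance(as_obj_us[0], pd.Timestamp))       # False

# Boxing through pandas yields an object array of actual pd.Timestamp values.
stamps = pd.DatetimeIndex(ns_arr).to_numpy(dtype=object)
print(type(stamps[0]).__name__)  # Timestamp
```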


I think you can just use the vectorized function pd.to_datetime().

Suppose your datetime strings are not in the standard ISO format:

my_arr = np.array(['8/28/2015 13:46', '8/27/2015 13:26', '8/27/2015 11:46'])
my_arr

array(['8/28/2015 13:46', '8/27/2015 13:26', '8/27/2015 11:46'], 
      dtype='<U15')

Call the vectorized function pd.to_datetime() with a custom format argument:

dts = pd.to_datetime(my_arr, format='%m/%d/%Y %H:%M')
dts

DatetimeIndex(['2015-08-28 13:46:00', '2015-08-27 13:26:00',
               '2015-08-27 11:46:00'],
              dtype='datetime64[ns]', freq=None, tz=None)
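The question's my_arr already holds datetime64 values rather than strings, so no format argument is needed there. A minimal sketch with a hypothetical three-element array: pd.to_datetime returns a DatetimeIndex, and indexing it, iterating over it, or calling tolist() yields pd.Timestamp objects.

```python
import numpy as np
import pandas as pd

# Hypothetical input: an array that already holds datetime64 values.
my_arr = np.array(
    ["2015-08-28T13:46", "2015-08-27T13:26", "2015-08-27T11:46"],
    dtype="datetime64[ns]",
)

dts = pd.to_datetime(my_arr)  # vectorized: returns a DatetimeIndex
print(dts.dtype)  # datetime64[ns]

# Indexing boxes a single element as a pd.Timestamp.
print(type(dts[0]).__name__)  # Timestamp

# If a plain list of Timestamps is needed, tolist() does the boxing in one call.
stamps = dts.tolist()
```

Keeping the DatetimeIndex (or a datetime64 Series) and only boxing individual elements when needed avoids creating thousands of Timestamp objects up front.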

You can then compute the difference between two timestamps and convert it to total seconds:

dts[0] - dts[-1]

Timedelta('1 days 02:00:00')

(dts[0] - dts[-1]).total_seconds()

93600.0
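As a check on the arithmetic, 1 day 02:00:00 is 24·3600 + 2·3600 = 86400 + 7200 = 93600 seconds. The subtraction also works on the whole index at once: subtracting a single Timestamp gives a TimedeltaIndex, and total_seconds() converts every element. A short sketch rebuilding dts from the strings above:

```python
import pandas as pd

# Rebuild the DatetimeIndex from the answer above.
dts = pd.to_datetime(
    ["8/28/2015 13:46", "8/27/2015 13:26", "8/27/2015 11:46"],
    format="%m/%d/%Y %H:%M",
)

# Subtracting one Timestamp from the whole index is vectorized and
# yields a TimedeltaIndex with one entry per element.
offsets = dts - dts[-1]

# total_seconds() also works element-wise on the TimedeltaIndex.
seconds = offsets.total_seconds()
print(list(seconds))  # [93600.0, 6000.0, 0.0]
```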
