
I am able to convert a NumPy array column of pandas Timestamps to an int array:

import numpy as np
import pandas as pd

# pd.datetime was removed in pandas 2.0; pd.Timestamp accepts the same positional arguments
df = pd.DataFrame({'a': [pd.Timestamp(2019, 1, 11, 5, 30, 1)] * 3, 'b': [np.nan, 5.1, 1.6]})

a = df.to_numpy()
a
# array([[Timestamp('2019-01-11 05:30:01'), nan],
#       [Timestamp('2019-01-11 05:30:01'), 5.1],
#       [Timestamp('2019-01-11 05:30:01'), 1.6]], dtype=object)
a[:,0] = a[:,0].astype('datetime64').astype(np.int64)
# array([[1547184601000000, nan],
#        [1547184601000000, 5.1],
#        [1547184601000000, 1.6]], dtype=object)
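The output above suggests NumPy stored the Timestamps at microsecond precision. The sketch below spells that unit out, assuming 'datetime64[us]' is the unit NumPy picked for the generic 'datetime64' cast. With the unit named, the integers are microseconds since 1970-01-01T00:00, and integer division by 10**6 gives whole Unix seconds.

```python
import numpy as np
import pandas as pd

# Object column of Timestamps, like column 0 of `a` before the conversion
col = np.array([pd.Timestamp(2019, 1, 11, 5, 30, 1)] * 3, dtype=object)

# Naming the unit documents what the integers mean: microseconds since 1970-01-01T00:00
us = col.astype('datetime64[us]').astype(np.int64)
seconds = us // 10**6  # whole seconds since the epoch
print(us, seconds)
```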

For this array a, I would like to convert column 0 back to pandas Timestamps. Because the array is quite large and my overall process is already time-consuming, I want to avoid Python loops, apply, lambdas and similar constructs. Instead, I am looking for fast, vectorized NumPy (or pandas) functions.

I already tried things like:

a[:,0].astype('datetime64')

(result: ValueError: Converting an integer to a NumPy datetime requires a specified unit)

and:

import calendar
calendar.timegm(a[:,0].utctimetuple())

(result: AttributeError: 'numpy.ndarray' object has no attribute 'utctimetuple')
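The first error message already names the fix: NumPy needs to know the unit of the integers. The second attempt cannot work on an array because calendar.timegm takes a single time tuple and utctimetuple is a method of individual datetime objects. The sketch below assumes column 0 holds microseconds since the epoch, as the forward conversion above produced. It casts the object column to a real int64 array and then to 'datetime64[us]'.

```python
import numpy as np

# Object column of microsecond integers, as left in a[:, 0] by the forward conversion
ints = np.array([1547184601000000] * 3, dtype=object)

# Make it a real int64 array, then tell NumPy the integers are microseconds
dt = ints.astype(np.int64).astype('datetime64[us]')
print(dt)
```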

How can I convert my column a[:,0] back to

array([[Timestamp('2019-01-11 05:30:01'), nan],
      [Timestamp('2019-01-11 05:30:01'), 5.1],
      [Timestamp('2019-01-11 05:30:01'), 1.6]], dtype=object)

in a speed optimized way?
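Here is a minimal end-to-end sketch that rebuilds the original array without a Python-level loop. It assumes the integers in column 0 are microseconds since the Unix epoch, as the forward conversion above produces. pd.to_datetime with unit='us' converts the whole column in one vectorized call. DatetimeIndex.to_numpy(dtype=object) then boxes each value as a pd.Timestamp, so the object array matches the original.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [pd.Timestamp(2019, 1, 11, 5, 30, 1)] * 3,
                   'b': [np.nan, 5.1, 1.6]})
a = df.to_numpy()

# Forward: column 0 becomes microseconds since the epoch (object array of ints)
a[:, 0] = a[:, 0].astype('datetime64[us]').astype(np.int64)

# Back: one vectorized call, telling pandas that the integers are microseconds
idx = pd.to_datetime(a[:, 0].astype(np.int64), unit='us')

# to_numpy(dtype=object) boxes each value as a pd.Timestamp
a[:, 0] = idx.to_numpy(dtype=object)
print(a)
```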

  • What do you mean by "back to"? I can't see the difference between your original data and the desired output. Commented Aug 15, 2019 at 3:04
  • By 'back to' I mean getting from the column of ints ('1547184601000000' etc.) back to the Timestamps ('2019-01-11 05:30:01'). Commented Aug 15, 2019 at 3:08

1 Answer


Let's review the pandas DatetimeIndex docs:

Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.

So we can wrap the column in a DatetimeIndex and then convert it with np.int64. The resulting integers are nanoseconds since the Unix epoch (1970-01-01).

In [18]: b = a[:,0]

In [19]: index = pd.DatetimeIndex(b)

In [21]: index.astype(np.int64)
Out[21]: Int64Index([1547184601000000000, 1547184601000000000, 1547184601000000000], dtype='int64')
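The same int64 representation also works in the other direction. The sketch below assumes the integers are nanoseconds, as in Out[21]. pd.to_datetime(..., unit='ns') rebuilds the DatetimeIndex. Viewing its datetime64[ns] values as np.int64 recovers the integers, used here in place of astype(np.int64) to sidestep version differences in the Index type it returns.

```python
import numpy as np
import pandas as pd

# Nanoseconds since the epoch, matching Out[21]
ns = np.array([1547184601000000000] * 3, dtype=np.int64)

index = pd.to_datetime(ns, unit='ns')   # int64 nanoseconds -> DatetimeIndex
back = index.to_numpy().view(np.int64)  # datetime64[ns] values reinterpreted as int64
print(index)
```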

3 Comments

Thanks, that was very helpful. You have to multiply by 1000, though, to get the right result: pd.DatetimeIndex(a[:,0]*1e3)
"can be boxed to": what does that mean?
@wwii To answer your question, I think you should check the link pandas.pydata.org/pandas-docs/version/0.25/reference/api/…
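The sketch below ties the comment thread together, assuming column 0 holds microseconds since the epoch. DatetimeIndex reads bare integers as nanoseconds, which is why the factor of 1000 is needed. pd.to_datetime(..., unit='us') does the same scaling without it. An integer factor (1000) is safer than 1e3 because the float product carries only about 16 significant digits, while nanosecond epoch values have 19. "Boxing" here means that scalar access into the DatetimeIndex returns a pd.Timestamp object wrapping the stored value, rather than the raw int64 itself.

```python
import datetime
import numpy as np
import pandas as pd

# Column 0 after the forward conversion: microseconds since the epoch
us = np.array([1547184601000000] * 3, dtype=np.int64)

# DatetimeIndex treats bare integers as nanoseconds, hence the factor of 1000;
# integer multiplication keeps the values exact (1e3 would go through float64)
via_factor = pd.DatetimeIndex(us * 1000)

# Equivalent: let pandas do the scaling by naming the unit
via_unit = pd.to_datetime(us, unit='us')

# Boxing: scalar access wraps the stored value in a pd.Timestamp,
# which is a subclass of datetime.datetime
boxed = via_unit[0]
print(type(boxed), isinstance(boxed, datetime.datetime))
```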
