
Suppose I have an array of datetime strings:

       new_time[index]
Out[9]: 
array(['2012-09-01_00:00:00', '2012-09-01_01:00:00',
    '2012-09-01_02:00:00', '2012-09-01_03:00:00',
    '2012-09-01_04:00:00', '2012-09-01_05:00:00',
    '2012-09-01_06:00:00', '2012-09-01_07:00:00',
    '2012-09-01_08:00:00', '2012-09-01_09:00:00',
    '2012-09-01_10:00:00', '2012-09-01_11:00:00',
    '2012-09-01_12:00:00', '2012-09-01_13:00:00',
    '2012-09-01_14:00:00', '2012-09-01_15:00:00',
    '2012-09-01_16:00:00', '2012-09-01_17:00:00',
    '2012-09-01_18:00:00', '2012-09-01_19:00:00',
    '2012-09-01_20:00:00', '2012-09-01_21:00:00',
    '2012-09-01_22:00:00', '2012-09-01_23:00:00'], dtype='<U19')

Its shape is (24,). How can I turn it into a (24, 19) array whose rows look like this?

 ## one row of new array 
Out[10]: 
array([[b'2', b'0', b'1', b'2', b'-', b'0', b'9', b'-', b'0', b'1', b'_',
    b'0', b'0', b':', b'0', b'0', b':', b'0', b'0']], dtype='|S1')

Thanks for your help.

  • Probably with a view where dtype was uint8 or U1 or something Commented Mar 13, 2018 at 7:15
  • @MadPhysicist indeed X.view('U1').reshape(X.size, -1).astype('S1'). Commented Mar 13, 2018 at 7:17
  • @PaulPanzer why does dtype='U1' work, but dtype='S1' does not? Commented Mar 13, 2018 at 7:27
  • @MadPhysicist because the itemsizes don't match. If you view-cast from U* to S1 every character gets distributed across 4 bytes. To get from U to S you have to use something like astype, i.e. actually create a new data buffer with the unicode characters expressed as single bytes. Commented Mar 13, 2018 at 7:34
  • @MadPhysicist If you are ok with a non-contiguous result you could actually do X.view('S1').reshape(X.size, -1, 4)[..., 0] Commented Mar 13, 2018 at 7:37
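A small self-contained sketch of the itemsize arithmetic behind these comments. It rebuilds the question's array from the 24 hours (a stand-in for `new_time[index]`, which is not shown) and prints the itemsize and the shapes of the two views:

```python
import numpy as np

# Stand-in for the question's array: 24 hourly timestamps, dtype U19.
a = np.array([f"2012-09-01_{h:02d}:00:00" for h in range(24)])

# NumPy stores each unicode character in 4 bytes (UTF-32),
# so one U19 element occupies 19 * 4 = 76 bytes.
print(a.dtype, a.itemsize)                       # <U19 76 (little-endian machine)

# A 'U1' view keeps 4 bytes per item: 76 / 4 = 19 characters per string.
print(a.view("U1").reshape(a.size, -1).shape)    # (24, 19)

# An 'S1' view has 1 byte per item: 76 raw bytes per string.
print(a.view("S1").reshape(a.size, -1).shape)    # (24, 76)
```

This is why the 'S1' view is four times too wide: it exposes the raw UTF-32 bytes rather than the characters.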

4 Answers

For your array:

import numpy as np

a = np.array(['2012-09-01_00:00:00', '2012-09-01_01:00:00',
    '2012-09-01_02:00:00', '2012-09-01_03:00:00',
    '2012-09-01_04:00:00', '2012-09-01_05:00:00',
    '2012-09-01_06:00:00', '2012-09-01_07:00:00',
    '2012-09-01_08:00:00', '2012-09-01_09:00:00',
    '2012-09-01_10:00:00', '2012-09-01_11:00:00',
    '2012-09-01_12:00:00', '2012-09-01_13:00:00',
    '2012-09-01_14:00:00', '2012-09-01_15:00:00',
    '2012-09-01_16:00:00', '2012-09-01_17:00:00',
    '2012-09-01_18:00:00', '2012-09-01_19:00:00',
    '2012-09-01_20:00:00', '2012-09-01_21:00:00',
    '2012-09-01_22:00:00', '2012-09-01_23:00:00'], dtype='<U19')

You need to get to S1 and reshape:

>>> a.view('U1').astype('S1').reshape(a.size, -1)
array([[b'2', b'0', b'1', b'2', b'-', b'0', b'9', b'-', b'0', b'1', b'_',
        b'0', b'0', b':', b'0', b'0', b':', b'0', b'0'],
       [b'2', b'0', b'1', b'2', b'-', b'0', b'9', b'-', b'0', b'1', b'_',
        b'0', b'1', b':', b'0', b'0', b':', b'0', b'0'],
       ...
       [b'2', b'0', b'1', b'2', b'-', b'0', b'9', b'-', b'0', b'1', b'_',
        b'2', b'3', b':', b'0', b'0', b':', b'0', b'0']], 
      dtype='|S1')
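A runnable version of this approach, again rebuilding the array from the 24 hours as a stand-in for the question's data. It assumes the strings are pure ASCII; `astype('S1')` should fail with a `UnicodeEncodeError` on characters outside ASCII.

```python
import numpy as np

a = np.array([f"2012-09-01_{h:02d}:00:00" for h in range(24)])

# view('U1') reinterprets each 19-character string as 19 one-character items (no copy);
# astype('S1') then encodes every character as a 1-byte string (this step copies).
chars = a.view("U1").astype("S1").reshape(a.size, -1)

print(chars.shape, chars.dtype)   # (24, 19) |S1
print(chars[0])
```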

Viewing directly as S1 does not work, because NumPy uses 4 bytes per character:

>>> a.view('S1').shape
(1824,)
>>> a.view('U1').shape
(456,)

If you start with S19, you can view as S1 immediately:

>>> b = a.astype('S19')
>>> b.dtype
dtype('S19')
>>> b.view('S1').reshape(b.size, -1)
array([[b'2', b'0', b'1', b'2', b'-', b'0', b'9', b'-', b'0', b'1', b'_',
        b'0', b'0', b':', b'0', b'0', b':', b'0', b'0'],
       ...
       [b'2', b'0', b'1', b'2', b'-', b'0', b'9', b'-', b'0', b'1', b'_',
        b'2', b'3', b':', b'0', b'0', b':', b'0', b'0']], 
      dtype='|S1')
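A self-contained sketch of the S19 route, assuming ASCII-only strings and obtaining `b` from the unicode array with `astype('S19')` (one possible way to get there). The conversion copies once; the final 'S1' view does not:

```python
import numpy as np

a = np.array([f"2012-09-01_{h:02d}:00:00" for h in range(24)])

# astype('S19') encodes each string into 19 bytes (a copy; assumes ASCII-only text).
b = a.astype("S19")

# With 1 byte per character, an 'S1' view lines up exactly with the characters.
chars = b.view("S1").reshape(b.size, -1)

print(chars.shape)                  # (24, 19)
print(np.shares_memory(chars, b))   # True: chars is a view of b
```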
1 Comment

Unicode is always good for a surprise. ;) Looks like there are 4 bytes per character.

If you are ok with a non-contiguous view you can simply do:

X.view('S1').reshape(X.size, -1, 4)[..., 0]

or

X.view('S1').reshape(X.size, -1)[:, ::4]

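Since this shares data with the original array, it is very cheap, but be aware that modifying the result in place will also change the original array. Of course, you can always make a copy. Note that taking byte 0 of each 4-byte character assumes a little-endian array (dtype '<U19', as in the question) and ASCII text.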
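A sketch of the sharing behaviour described above. It pins the dtype to little-endian `'<U19'` (an assumption that makes byte 0 of each character the ASCII code), writes through the strided view, and then detaches a copy:

```python
import numpy as np

# Explicit little-endian UTF-32: byte 0 of each 4-byte character is its low byte.
X = np.array([f"2012-09-01_{h:02d}:00:00" for h in range(24)], dtype="<U19")

# Keep every 4th byte: a strided view, no data copied.
v = X.view("S1").reshape(X.size, -1, 4)[..., 0]
print(v.shape, np.shares_memory(v, X))   # (24, 19) True

# Writing through the view changes the original strings.
v[0, 0] = b"3"
print(X[0])                              # 3012-09-01_00:00:00

# A copy is independent of X.
w = v.copy()
w[0, 0] = b"2"
print(X[0])                              # 3012-09-01_00:00:00 (unchanged)
```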


You can split your strings into characters with a list comprehension, then build the 2D array with np.asarray():

import numpy as np

x = np.asarray(['2012-09-01_00:00:00', '2012-09-01_01:00:00',
    '2012-09-01_02:00:00', '2012-09-01_03:00:00',
    '2012-09-01_04:00:00', '2012-09-01_05:00:00',
    '2012-09-01_06:00:00', '2012-09-01_07:00:00',
    '2012-09-01_08:00:00', '2012-09-01_09:00:00',
    '2012-09-01_10:00:00', '2012-09-01_11:00:00',
    '2012-09-01_12:00:00', '2012-09-01_13:00:00',
    '2012-09-01_14:00:00', '2012-09-01_15:00:00',
    '2012-09-01_16:00:00', '2012-09-01_17:00:00',
    '2012-09-01_18:00:00', '2012-09-01_19:00:00',
    '2012-09-01_20:00:00', '2012-09-01_21:00:00',
    '2012-09-01_22:00:00', '2012-09-01_23:00:00'])

temp = []
for i in x:
    temp.append([j for j in i])
np.asarray(temp, dtype = 'S1')

Or, more concisely:

temp = [[j for j in i] for i in x]   
temp = np.asarray(temp, dtype = 'S1')
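The inner comprehension `[j for j in i]` can also be written as `list(i)`. A self-contained sketch (array rebuilt from the 24 hours as a stand-in) that checks the result against the view-based answer:

```python
import numpy as np

x = np.array([f"2012-09-01_{h:02d}:00:00" for h in range(24)])

# list(s) splits a string into its characters; the Python loop copies every
# character, so this scales worse than the view-based answers on large arrays.
temp = np.asarray([list(s) for s in x], dtype="S1")

print(temp.shape, temp.dtype)   # (24, 19) |S1
```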

6 Comments

Same time complexity as the other answer.
Not really. The other one is O(1) because it's making a view. You're copying the data, so O(n), and with a large constant factor because you're using Python loops.
A view just creates another array structure with different metadata, but leaves the original underlying data untouched.
Well, never mind, astype does make a copy. But there's a no-copy solution in the comments to the question.
Good call. Other solution is better. With an array size up to 1000 I still get the same execution time on my machine.

Iterating over each string and splitting it into a list of characters solves this:

import numpy as np
array_24 = np.array(['2012-09-01_00:00:00', '2012-09-01_01:00:00',
    '2012-09-01_02:00:00', '2012-09-01_03:00:00',
    '2012-09-01_04:00:00', '2012-09-01_05:00:00',
    '2012-09-01_06:00:00', '2012-09-01_07:00:00',
    '2012-09-01_08:00:00', '2012-09-01_09:00:00',
    '2012-09-01_10:00:00', '2012-09-01_11:00:00',
    '2012-09-01_12:00:00', '2012-09-01_13:00:00',
    '2012-09-01_14:00:00', '2012-09-01_15:00:00',
    '2012-09-01_16:00:00', '2012-09-01_17:00:00',
    '2012-09-01_18:00:00', '2012-09-01_19:00:00',
    '2012-09-01_20:00:00', '2012-09-01_21:00:00',
    '2012-09-01_22:00:00', '2012-09-01_23:00:00'])
array_24.shape        #(24,)
array_24_19 = np.asarray([[j for j in i] for i in array_24], dtype='S1')  # without dtype='S1' the result would be '<U1'
array_24_19.shape     #(24, 19)
array_24_19[0]        #array([b'2', b'0', b'1', b'2', b'-', b'0', b'9', b'-', b'0', b'1', b'_', b'0', b'0', b':', b'0', b'0', b':', b'0', b'0'], dtype='|S1')

I hope this helps
