Converting 32-bit integer into array of four 8-bit integers in Python

Question

How to efficiently convert a 32-bit integer into an array of four 8-bit integers in Python?

Currently I have the following code, which is super slow:

import numpy as np

def convert(int32_val):
    # 32-character two's-complement bit string, most significant bit first
    bits = np.binary_repr(int32_val, width=32)
    int8_arr = [int(bits[0:8], 2), int(bits[8:16], 2),
                int(bits[16:24], 2), int(bits[24:32], 2)]
    return int8_arr

E.g:

print(convert(1))
# [0, 0, 0, 1]

print(convert(-1))
# [255, 255, 255, 255]

print(convert(-1306918380))
# [178, 26, 2, 20]

I need to achieve the same behavior on unsigned 32-bit integers.

Additionally, is it possible to vectorize this for a large NumPy array of 32-bit integers?

dtype allows you to view an array in 2 different ways. There's an example of doing this on docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html: np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
– hpaulj, Commented Aug 14, 2014 at 1:24
Do you have a numpy array of 32 bit integers? Give an example of the actual input you want to handle.
– Warren Weckesser, Commented Aug 14, 2014 at 2:19
I just made this suggestion in a comment to @hpaulj's answer. If x is the array, you can use y = x.view(np.uint8).reshape(x.shape + (4,)).
– Warren Weckesser, Commented Aug 14, 2014 at 2:30
@WarrenWeckesser On the little endian platforms that almost everyone uses, the view will give him the opposite ordering of the bytes (1 -> [1, 0, 0, 0]), so a [:, ::-1] may be needed after reshaping the view. But yes, that one-liner is definitely the way to go about this.
– Jaime, Commented Aug 14, 2014 at 4:28

hpaulj · Accepted Answer · 2014-08-14 02:00:29Z

Using dtype as documented in: http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html

Subdivide an int16 into two int8s, called x and y. The 0 and 1 are the offsets in bytes:

np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
dtype(('<i2', [('x', '|i1'), ('y', '|i1')]))

Or adapted to your case:

In [30]: x=np.arange(12,dtype=np.int32)*1000
In [39]: dt=np.dtype((np.int32, {'f0':(np.uint8,0),'f1':(np.uint8,1),'f2':(np.uint8,2), 'f3':(np.uint8,3)}))

In [40]: x1=x.view(dtype=dt)

In [41]: x1['f0']
Out[41]: array([  0, 232, 208, 184, 160, 136, 112,  88,  64,  40,  16, 248], dtype=uint8)

In [42]: x1['f1']
Out[42]: array([ 0,  3,  7, 11, 15, 19, 23, 27, 31, 35, 39, 42], dtype=uint8)

Compare with the low byte computed arithmetically:

In [38]: x%256
Out[38]: array([  0, 232, 208, 184, 160, 136, 112,  88,  64,  40,  16, 248])

More documentation on http://docs.scipy.org/doc/numpy/user/basics.rec.html

2) Tuple argument: The only relevant tuple case that applies to record structures is when a structure is mapped to an existing data type. This is done by pairing in a tuple, the existing data type with a matching dtype definition (using any of the variants being described here). As an example (using a definition using a list, so see 3) for further details):

x = np.zeros(3, dtype=('i4',[('r','u1'), ('g','u1'), ('b','u1'), ('a','u1')]))

array([0, 0, 0])

x['r'] # array([0, 0, 0], dtype=uint8)

In this case, an array is produced that looks and acts like a simple int32 array, but also has definitions for fields that use only one byte of the int32 (a bit like Fortran equivalencing).

One way to get a 2d array of the 4 bytes is:

In [46]: np.array([x1['f0'],x1['f1'],x1['f2'],x1['f3']])
Out[46]: 
array([[  0, 232, 208, 184, 160, 136, 112,  88,  64,  40,  16, 248],
       [  0,   3,   7,  11,  15,  19,  23,  27,  31,  35,  39,  42],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]], dtype=uint8)

Same idea, but more compact:

In [50]: dt1=np.dtype(('i4', [('bytes','u1',4)]))

In [53]: x2=x.view(dtype=dt1)

In [54]: x2.dtype
Out[54]: dtype([('bytes', 'u1', (4,))])

In [55]: x2['bytes']
Out[55]: 
array([[  0,   0,   0,   0],
       [232,   3,   0,   0],
       [208,   7,   0,   0],
       [184,  11,   0,   0],
       [160,  15,   0,   0],
       [136,  19,   0,   0],
       [112,  23,   0,   0],
       [ 88,  27,   0,   0],
       [ 64,  31,   0,   0],
       [ 40,  35,   0,   0],
       [ 16,  39,   0,   0],
       [248,  42,   0,   0]], dtype=uint8)

In [56]: x2
Out[56]: 
array([    0,  1000,  2000,  3000,  4000,  5000,  6000,  7000,  8000,
        9000, 10000, 11000])

Creating a view with a dtype is definitely the right approach. If x is a contiguous array of np.int32, it can be as simple as y = x.view(np.uint8).reshape(x.shape + (4,)).
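The one-liner above returns the bytes in the machine's native order, which on little-endian hardware is the reverse of the order shown in the question. A minimal sketch that first converts to an explicit big-endian dtype, so the result should not depend on the platform. The function name `int32_array_to_bytes` is made up for illustration, and paying for the copy made by `astype` is an assumption in exchange for portability:

```python
import numpy as np

def int32_array_to_bytes(x):
    """Return an array of shape x.shape + (4,), most significant byte first."""
    x = np.asarray(x)
    if x.dtype.kind not in "iu" or x.dtype.itemsize != 4:
        raise TypeError("expected an array of 32-bit integers")
    # Copy into big-endian unsigned storage; signed values keep their
    # two's-complement bit pattern.
    big_endian = x.astype(">u4", order="C")
    # Reinterpret every 4-byte element as four uint8 values.
    return big_endian.view(np.uint8).reshape(x.shape + (4,))

print(int32_array_to_bytes(np.array([1, -1, -1306918380], dtype=np.int32)))
```

If the copy is too expensive and the platform is known to be little-endian, the plain view followed by reversing the last axis avoids it.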

user1129665 · Accepted Answer · 2014-08-14 02:12:06Z

3

In Python 3.2 and higher, there is a new int method to_bytes which can also be used:

>>> convert = lambda n : [int(i) for i in n.to_bytes(4, byteorder='big', signed=True)]
>>>
>>> convert(1)
[0, 0, 0, 1]
>>>
>>> convert(-1)
[255, 255, 255, 255]
>>>
>>> convert(-1306918380)
[178, 26, 2, 20]
>>>
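With `signed=True`, `to_bytes` raises `OverflowError` for values above 2**31 - 1, so the lambda above does not cover the unsigned case the question also asks about. One possible workaround, sketched here, masks the value to 32 bits first so that signed and unsigned inputs share one code path. `int32_to_byte_list` is a hypothetical name:

```python
def int32_to_byte_list(n, byteorder="big"):
    """Split a signed or unsigned 32-bit value into four ints in 0..255."""
    # Masking maps negative values to their two's-complement bit pattern
    # and leaves values in 0..2**32-1 unchanged.
    unsigned = n & 0xFFFFFFFF
    # Iterating over a bytes object yields plain ints.
    return list(unsigned.to_bytes(4, byteorder=byteorder))

print(int32_to_byte_list(-1306918380))
print(int32_to_byte_list(4294967295))
```

Passing `byteorder="little"` gives the least significant byte first instead.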


Xin Yin · Accepted Answer · 2014-08-14 17:55:21Z

3

You can either use bitwise operations:

def int32_to_int8(n):
    mask = (1 << 8) - 1
    return [(n >> k) & mask for k in range(0, 32, 8)]

>>> int32_to_int8(32768)
[0, 128, 0, 0]
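Note that this returns the least significant byte first, whereas the examples in the question list the most significant byte first. A small variant, assuming that ordering is the one wanted (the name `int32_to_int8_be` is illustrative only), walks the shifts from high to low:

```python
def int32_to_int8_be(n):
    """Return the four bytes of n, most significant byte first."""
    mask = 0xFF
    # Python's >> and & treat negative ints as two's complement,
    # so negative inputs need no special handling.
    return [(n >> shift) & mask for shift in (24, 16, 8, 0)]

print(int32_to_int8_be(32768))
print(int32_to_int8_be(-1))
```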

Or alternatively, you can use the struct module from the standard library:

>>> import struct
>>> int32 = struct.pack("I", 32768)
>>> struct.unpack("B" * 4, int32)
(0, 128, 0, 0)
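The `"I"` format uses the machine's native byte order, so the tuple above is what a little-endian machine produces. A sketch with an explicit `">"` (big-endian) prefix, which should give the ordering used in the question on any platform. The helper name `int32_to_bytes_struct` is an assumption, as is the masking step that lets it accept negative inputs:

```python
import struct

def int32_to_bytes_struct(n):
    """Return the four bytes of n as a tuple, most significant byte first."""
    # ">I" packs an unsigned 32-bit integer in big-endian order;
    # the mask turns negative inputs into their unsigned bit pattern.
    packed = struct.pack(">I", n & 0xFFFFFFFF)
    return struct.unpack("4B", packed)

print(int32_to_bytes_struct(32768))
print(int32_to_bytes_struct(-1))
```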

One nice thing about the struct module is that it can convert a whole array of int32 values to bytes in a single call:

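import struct
import numpy.random

# Generate some random int32 numbers (cast explicitly, because the default
# integer dtype is 64-bit on many platforms)
x = numpy.random.randint(0, (1 << 31) - 1, 1000).astype(numpy.int32)

# Then you can convert all of them to int8 with just one command.
# buffer() exists only in Python 2; on Python 3, pass x.tobytes() instead.
x_int8 = struct.unpack('B' * (4*len(x)), buffer(x))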

# To verify that the results are valid:
x[0]
Out[29]: 1219620060

int32_to_int8(x[0])
Out[30]: [220, 236, 177, 72]

x_int8[:4]
Out[31]: (220, 236, 177, 72)

# And it's FAST!

%timeit struct.unpack('B' * (4*len(x)), buffer(x))
10000 loops, best of 3: 32 µs per loop

%timeit [int32_to_int8(i) for i in x]
100 loops, best of 3: 6.01 ms per loop

UPDATE: Compare `struct.unpack` with `ndarray.view`:

import numpy as np

# this is fast because it only creates the view, without involving any creation
# of objects in Python
%timeit x.view(np.int8)
1000000 loops, best of 3: 570 ns per loop

If you were to perform some actual computation:

uint8_type = "B" * len(x) * 4
%timeit sum(struct.unpack(uint8_type, buffer(x)))
10000 loops, best of 3: 52.6 µs per loop

# slow because the built-in sum() iterates over the view one element at a
# time, creating a Python object for every byte.
%timeit sum(x.view(np.int8))
1000 loops, best of 3: 768 µs per loop

# use the numpy.sum() function - without creating Python objects
%timeit np.sum(x.view(np.int8))
100000 loops, best of 3: 8.55 µs per loop # <- FAST!

Take home message: use ndarray.view()!
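A short sketch of why the view is so cheap, assuming a small int32 array built just for the demonstration: the view reuses the array's memory, while `struct.unpack` builds a new tuple of Python integers. `x.tobytes()` stands in for the Python 2 `buffer(x)` used above, and the variable names are illustrative:

```python
import struct
import numpy as np

x = np.arange(5, dtype=np.int32) * 1000

# Reinterpret the same memory as bytes; nothing is copied.
as_view = x.view(np.uint8)

# Copy the raw bytes out and unpack them into Python ints.
as_tuple = struct.unpack("B" * (4 * len(x)), x.tobytes())

# Both approaches see the bytes in the machine's native order.
same_values = as_view.tolist() == list(as_tuple)
shares = np.shares_memory(x, as_view)

print(same_values, shares)
```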

edited Aug 14, 2014 at 17:55

answered Aug 14, 2014 at 1:46

Xin Yin


2 Comments

ojy Over a year ago

Thanks, it's definitely way faster than my initial approach. Though much slower than the view-based method.

Xin Yin Over a year ago

That's true. The view-based method doesn't generate new data; it simply uses a different type of pointer to enumerate the same memory. struct.unpack, on the other hand, needs to create a whole bunch of Python objects, and that can kill the performance. I've learnt something new thanks to your question.

Andrew Johnson · Accepted Answer · 2014-08-14 01:48:09Z

2

Just using Python's built-in division and modulus provides a 6x speedup in my tests.

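def convert(i):
    # Map negative values to their unsigned 32-bit equivalent.
    i = i % 4294967296
    n4 = i % 256
    # Floor division keeps the values integers on Python 3 as well.
    i = i // 256
    n3 = i % 256
    i = i // 256
    n2 = i % 256
    n1 = i // 256
    return (n1,n2,n3,n4)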
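The same arithmetic can be written with `divmod`, which returns quotient and remainder in one call. The sketch below also adds an inverse to check the round trip. Both function names are hypothetical, and `int.from_bytes` requires Python 3.2 or later:

```python
def convert_divmod(i):
    """Split i into four bytes, most significant first, using divmod."""
    i %= 1 << 32  # wrap negative values into 0..2**32-1
    parts = []
    for _ in range(4):
        i, low = divmod(i, 256)  # peel off the lowest byte
        parts.append(low)
    return tuple(reversed(parts))

def bytes_to_uint32(parts):
    """Inverse of convert_divmod; returns the unsigned 32-bit value."""
    return int.from_bytes(bytes(parts), byteorder="big")

print(convert_divmod(-1306918380))
print(bytes_to_uint32(convert_divmod(-1306918380)))
```

The inverse returns the unsigned value, so a negative input comes back as its value plus 2**32.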

edited Aug 14, 2014 at 1:48

answered Aug 14, 2014 at 1:39

Andrew Johnson


1 Comment

mgear Over a year ago

Works great! Added an optional big/little-endian flag and a function to convert back from bytearray to int here: github.com/unitycoder/PythonPlayground/blob/master/Conversion/…
