Consider two naive ways of building what should be the same bytearray (using Python 2.7.11; I confirmed the same behavior in 3.4.3):
In [80]: from array import array
In [81]: import numpy as np
In [82]: a1 = array('L', [1, 3, 2, 5, 4])
In [83]: a2 = np.asarray([1,3,2,5,4], dtype=int)
In [84]: b1 = bytearray(a1)
In [85]: b2 = bytearray(a2)
Since both array.array and numpy.ndarray support the buffer protocol, I would expect both to export the same underlying data on conversion to bytearray.
But the two results differ:
In [86]: b1
Out[86]: bytearray(b'\x01\x03\x02\x05\x04')
In [87]: b2
Out[87]: bytearray(b'\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00')
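One way to see what the buffer protocol itself reports for the array.array is to wrap it in a memoryview. This is only a sketch, and it assumes Python 3, since memoryview does not accept array.array on 2.7. The names m and raw are mine, and the printed sizes depend on the platform's C unsigned long, so I have not listed expected values:

```python
from array import array

a1 = array('L', [1, 3, 2, 5, 4])

# memoryview goes through the buffer protocol rather than iterating
# over the elements, so it reports the exporter's raw layout.
m = memoryview(a1)

print("format  :", m.format)    # struct-style type code of one element
print("itemsize:", m.itemsize)  # bytes per element (platform dependent for 'L')
print("nbytes  :", m.nbytes)    # total bytes exported

# A raw copy of the exported bytes, one itemsize-wide chunk per element.
raw = m.tobytes()
print("len(raw):", len(raw))
```

If nbytes comes out as the element count times itemsize, then the exported buffer has the same padded shape as the NumPy output above rather than one byte per value.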
At first I thought maybe a naive call to bytearray on a NumPy array will inadvertently get some extra bytes due to data type, contiguity, or some other overhead data.
But even when looking at the NumPy buffer data handle directly, it still says size is 40 and gives the same data:
In [90]: a2.data
Out[90]: <read-write buffer for 0x7fb85d60fee0, size 40, offset 0 at 0x7fb85d668fb0>
In [91]: bytearray(a2.data)
Out[91]: bytearray(b'\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00')
The same thing happens with a2.view():
In [93]: bytearray(a2.view())
Out[93]: bytearray(b'\x01\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00')
I noticed that if I pass dtype=np.int32, the length of bytearray(a2) is 20 instead of 40, which suggests that the extra bytes are related to the element type, though it's not clear why or how:
In [20]: a2 = np.asarray([1,3,2,5,4], dtype=int)
In [21]: len(bytearray(a2.data))
Out[21]: 40
In [22]: a2 = np.asarray([1,3,2,5,4], dtype=np.int32)
In [23]: len(bytearray(a2.data))
Out[23]: 20
AFAICT, np.int32 ought to correspond to the array 'L' typecode, but any explanation of why it doesn't would be massively helpful.
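A quick way to test that assumption is to compare the per-element sizes directly and check whether the bytearray lengths are just element count times element size. This is a sketch, not a definitive diagnosis: I use np.int64 explicitly as a stand-in for dtype=int on my machine, and the size of 'L' is whatever the platform's C unsigned long happens to be, so it is printed rather than assumed:

```python
from array import array

import numpy as np

values = [1, 3, 2, 5, 4]

# Per-element sizes in bytes.
print("array 'L' itemsize:", array('L').itemsize)         # platform dependent
print("np.int32 itemsize :", np.dtype(np.int32).itemsize)
print("np.int64 itemsize :", np.dtype(np.int64).itemsize)

# Length of the exported bytes versus count * itemsize for each dtype.
for dtype in (np.int32, np.int64):
    a = np.asarray(values, dtype=dtype)
    n_exported = len(bytearray(a))
    print(dtype.__name__, n_exported, a.size * a.itemsize)
```

With five elements, 4-byte items account for the length of 20 and 8-byte items for the length of 40.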
How can one reliably extract only the part of the data that "should" be exported via the buffer protocol, that is, the same bytes that the plain array produces in this case?
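For reference, here is a sketch of the output I am after, obtained by forcing one byte per element. It assumes every value fits in the range 0..255, and the cast to np.uint8 is my own workaround rather than something the buffer protocol does by itself. The names packed and via_list are hypothetical:

```python
import numpy as np

a2 = np.asarray([1, 3, 2, 5, 4], dtype=np.int64)

# Guard the assumption: astype(np.uint8) would silently wrap
# values outside 0..255 instead of raising.
if a2.min() < 0 or a2.max() > 255:
    raise ValueError("values do not fit in a single byte")

# One byte per element, exported through the buffer protocol.
packed = bytearray(a2.astype(np.uint8))
print(packed)

# Alternative: hand bytearray a list of Python ints, so it builds
# the bytes by iterating over values instead of copying a buffer.
via_list = bytearray(a2.tolist())
print(via_list)
```

Both routes give five bytes here, but neither generalizes to values above 255, which is why I am asking whether there is a more reliable approach.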