2

I have the following list of tuples,

a=[('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
   ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
   ('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
   ('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837)]

What I want to do is to create a numpy nd.array with shape (4,8) That looks like this:

array([['1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
       ['1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107],
       ['1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883],
       ['1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837]])

I tried the following code:

import numpy as np
x = np.array(a, dtype=('a10,a10,f4,f4,f4,f4,f4,f4'))

But it gives this shape instead:

In [37]: x.shape
Out[37]: (4,)

What's the right way to do it?

1 Answer 1

1

What you have already done is by far the most logical way to do it. To achieve what you are asking for you need to create an object array:

z = np.array(a,dtype=np.object)
print z.shape
# (4, 8)

What is looks like you are asking for is an array with variable data type by column. This is exactly what you achieve with np.array(a, dtype=('a10,a10,f4,f4,f4,f4,f4,f4')). Internally you can think of this array like an array of structs in C, i.e. a 1-d array of dtype=('a10,a10,f4,f4,f4,f4,f4,f4') instances.

By using an object array you can request that numpy handles everything as a simple python object.

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks a lot. Is there any harm using object instead of ('a10,a10,f4,f4,f4,f4,f4,f4') ?
I would assume that using object would be inherently slower due to the its variable nature. The "harm" here would be somewhat unpredictable behaviour, as it will not always be obvious what bit width numbers have. e.g. half your ints could be 64-bit and the other half 32-bit.
Is there safer alternative than 'object' which produces the same results?
The safer alternative is your original example, which creates a recarray. What is the problem with using that? Is there really a requirement that you have a (4,8) shape? The recarray can be indexed just like a normal 2-d array.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.