
I am using Python to build an OpenGL rendering engine and am using NumPy arrays with a custom datatype to store my vertex data.

import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})
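To illustrate the layout this dtype describes, here is a small hedged sketch (the `0xFF00FF` color value and the two-element array are just illustrative): each record packs three 32-bit floats and one 32-bit unsigned integer into 16 contiguous bytes, which is the typical interleaved vertex-buffer layout.

```python
import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})

# Each record occupies exactly 16 bytes: 3 * float32 + 1 * uint32.
print(data_type_vertex.itemsize)   # 16

v = np.zeros(2, dtype=data_type_vertex)
v[0] = (1.0, 2.0, 3.0, 0xFF00FF)   # assign one full vertex as a tuple
print(v["x"])                      # field access by name
print(len(v.tobytes()))            # 32 -- raw bytes, ready for glBufferData
```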

When loading vertex data from a .obj file, it's useful to temporarily store the vertex data in a regular Python list before converting that data to a NumPy array with my custom datatype. However, simply trying to convert the list to a NumPy array gives unexpected results.

vertex_list = [ 
    [1.1, 2.2, 3.3, 5],
    [4.4, 5.5, 6.6, 7]
]

print(np.array(vertex_list, dtype=data_type_vertex))

# Result
# [[(1.1, 1.1, 1.1, 1) (2.2, 2.2, 2.2, 2) (3.3, 3.3, 3.3, 3)
#   (5. , 5. , 5. , 5)]
#  [(4.4, 4.4, 4.4, 4) (5.5, 5.5, 5.5, 5) (6.6, 6.6, 6.6, 6)
#   (7. , 7. , 7. , 7)]]

As can be seen, each scalar element of the list is converted to a full instance of the custom datatype by copying that element into all fields, instead of the intended behaviour of converting each sublist to one instance of the custom datatype. This can be solved by initializing a placeholder array and iteratively converting all list elements:

vertex_array = np.zeros(len(vertex_list), dtype=data_type_vertex)
for i, v in enumerate(vertex_list):
    vertex_array[i] = (v[0], v[1], v[2], v[3])
    
print(vertex_array)

# Result 
# [(1.1, 2.2, 3.3, 5) (4.4, 5.5, 6.6, 7)]

While this works, it feels somewhat clunky, and it might require a lot of hardcoded conversion functions if multiple custom datatypes are introduced.

Is there a better way to achieve the same result?

Comment: Can you describe the format of the .obj file? Maybe the list step could be skipped.

1 Answer


The input is broadcast, so each scalar item of the input is broadcast to all fields of the corresponding output item. It is not entirely clear to me why NumPy does that. That being said, NumPy expects a list of tuples in this case in order to do the conversion properly. As a result, a simple solution is to convert each item to a tuple before calling np.array:

result = np.array(list(map(tuple, vertex_list)), dtype=data_type_vertex)
# result = array([(1.1, 2.2, 3.3, 5), (4.4, 5.5, 6.6, 7)])

To reduce the memory footprint and possibly also improve performance, you can call np.fromiter with an iterator as argument:

result = np.fromiter(map(tuple, vertex_list), dtype=data_type_vertex)

Please note that you can pass count=len(vertex_list) as an additional argument to np.fromiter for possibly better performance (though it generally does not make a huge difference). This should be faster than an explicit pure-Python loop.
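For completeness, the count variant looks like this (reusing the dtype and list from the question); count lets np.fromiter allocate the output array once up front instead of growing it while the iterator is consumed:

```python
import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})

vertex_list = [
    [1.1, 2.2, 3.3, 5],
    [4.4, 5.5, 6.6, 7]
]

# With count known, np.fromiter pre-allocates the full structured array.
result = np.fromiter(map(tuple, vertex_list),
                     dtype=data_type_vertex,
                     count=len(vertex_list))
print(result)   # [(1.1, 2.2, 3.3, 5) (4.4, 5.5, 6.6, 7)]
```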


An alternative solution is to convert the whole list to a 32-bit float NumPy array and then copy and convert each column into the final array. This should be faster than the above solution here. However, it should not be used as a generic solution, since 32-bit floats can introduce a loss of precision for large integers: it is only fine here if the color field contains small integers (e.g. <=4096; 32-bit floats can represent integers exactly only up to 2**24). If you want to store 24-bit or even 32-bit integers, then you can convert the list to 64-bit floating-point numbers first and copy those into the final array so as not to lose any precision (at the expense of a slightly slower conversion -- still much faster than the initial code).
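A sketch of that column-wise approach, going through float64 so that integer color values survive the round trip exactly (the field order in the loop is an assumption matching the dtype from the question):

```python
import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})

vertex_list = [
    [1.1, 2.2, 3.3, 5],
    [4.4, 5.5, 6.6, 7]
]

# One vectorized conversion of the whole nested list; float64 represents
# integers exactly up to 2**53, so 32-bit color values are preserved.
tmp = np.asarray(vertex_list, dtype=np.float64)

result = np.zeros(len(vertex_list), dtype=data_type_vertex)
for i, name in enumerate(("x", "y", "z", "color")):
    result[name] = tmp[:, i]   # copy + cast each column into its field
```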

