
I am using Python to build an OpenGL rendering engine and am using NumPy arrays with a custom datatype to store my vertex data.

import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})
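To illustrate the layout this dtype describes, here is a small hedged sketch (the `0xFF00FF` color value and the two-element array are just illustrative): each record packs three 32-bit floats and one 32-bit unsigned integer into 16 contiguous bytes, which is the typical interleaved vertex-buffer layout.

```python
import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})

# Each record occupies exactly 16 bytes: 3 * float32 + 1 * uint32.
print(data_type_vertex.itemsize)   # 16

v = np.zeros(2, dtype=data_type_vertex)
v[0] = (1.0, 2.0, 3.0, 0xFF00FF)   # assign one full vertex as a tuple
print(v["x"])                      # field access by name
print(len(v.tobytes()))            # 32 -- raw bytes, ready for glBufferData
```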

When loading vertex data from a .obj file, it's useful to temporarily store the vertex data in a regular Python list before converting that data to a NumPy array with my custom datatype. However, simply trying to convert the list to a NumPy array gives unexpected results.

vertex_list = [ 
    [1.1, 2.2, 3.3, 5],
    [4.4, 5.5, 6.6, 7]
]

print(np.array(vertex_list, dtype=data_type_vertex))

# Result
# [[(1.1, 1.1, 1.1, 1) (2.2, 2.2, 2.2, 2) (3.3, 3.3, 3.3, 3)
#   (5. , 5. , 5. , 5)]
#  [(4.4, 4.4, 4.4, 4) (5.5, 5.5, 5.5, 5) (6.6, 6.6, 6.6, 6)
#   (7. , 7. , 7. , 7)]]

As can be seen, each scalar element of the list is converted to a full instance of the custom datatype by copying that element into all fields, instead of the intended behaviour of converting each sublist to one instance of the custom datatype. This can be solved by initializing a placeholder array and iteratively converting all list elements:

vertex_array = np.zeros(len(vertex_list), dtype=data_type_vertex)
for i, v in enumerate(vertex_list):
    vertex_array[i] = (v[0], v[1], v[2], v[3])
    
print(vertex_array)

# Result 
# [(1.1, 2.2, 3.3, 5) (4.4, 5.5, 6.6, 7)]

While this works, it feels somewhat clunky, and it might require a lot of hardcoded conversion functions if multiple custom datatypes are introduced.

Is there a better way to achieve the same result?

Comment: Can you describe the format of the .obj file? Maybe the list step could be skipped.

1 Answer


The input is broadcast, so each scalar item of the input is broadcast to all fields of the corresponding output item. It is not entirely clear to me why NumPy does that. That being said, NumPy expects a list of tuples in this case in order to do the conversion properly. As a result, a simple solution is to convert each item to a tuple before calling np.array:

result = np.array(list(map(tuple, vertex_list)), dtype=data_type_vertex)
# result = array([(1.1, 2.2, 3.3, 5), (4.4, 5.5, 6.6, 7)])

To reduce the memory footprint and possibly also improve performance, you can call np.fromiter with an iterator as argument:

result = np.fromiter(map(tuple, vertex_list), dtype=data_type_vertex)

Please note that you can pass count=len(vertex_list) as an additional argument to np.fromiter for possibly better performance (though it generally does not make a huge difference). This should be faster than an explicit pure-Python loop.
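For completeness, the count variant looks like this (reusing the dtype and list from the question); count lets np.fromiter allocate the output array once up front instead of growing it while the iterator is consumed:

```python
import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})

vertex_list = [
    [1.1, 2.2, 3.3, 5],
    [4.4, 5.5, 6.6, 7]
]

# With count known, np.fromiter pre-allocates the full structured array.
result = np.fromiter(map(tuple, vertex_list),
                     dtype=data_type_vertex,
                     count=len(vertex_list))
print(result)   # [(1.1, 2.2, 3.3, 5) (4.4, 5.5, 6.6, 7)]
```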


An alternative solution is to convert the whole list to a 32-bit float NumPy array and then copy and convert each column into the final array. This should be faster than the above solution here. However, it should not be used as a generic solution, since 32-bit floats can introduce a loss of precision for large integers: it is only fine here if the color field contains small integers (e.g. <=4096; 32-bit floats can represent integers exactly only up to 2**24). If you want to store 24-bit or even 32-bit integers, then you can convert the list to 64-bit floating-point numbers first and copy those into the final array so as not to lose any precision (at the expense of a slightly slower conversion -- still much faster than the initial code).
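A sketch of that column-wise approach, going through float64 so that integer color values survive the round trip exactly (the field order in the loop is an assumption matching the dtype from the question):

```python
import numpy as np

data_type_vertex = np.dtype({
    "names": ["x", "y", "z", "color"],
    "formats": [np.float32, np.float32, np.float32, np.uint32],
    "offsets": [0, 4, 8, 12],
    "itemsize": 16
})

vertex_list = [
    [1.1, 2.2, 3.3, 5],
    [4.4, 5.5, 6.6, 7]
]

# One vectorized conversion of the whole nested list; float64 represents
# integers exactly up to 2**53, so 32-bit color values are preserved.
tmp = np.asarray(vertex_list, dtype=np.float64)

result = np.zeros(len(vertex_list), dtype=data_type_vertex)
for i, name in enumerate(("x", "y", "z", "color")):
    result[name] = tmp[:, i]   # copy + cast each column into its field
```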

