
To add a field to a structured numpy array, it is quite simple to create a new array with a new dtype, copy over the old fields, and add the new field. However, I need to do this for an array that takes a lot of memory, and I would rather not duplicate all of it. Both my own implementation and the (slow) implementation in numpy.lib.recfunctions.append_fields duplicate memory.

Is there a way to add a field to a structured ndarray, without duplicating memory? That means, either a way that avoids creating a new ndarray, or a way to create a new ndarray that points to the same data as the old?

Solutions that do duplicate RAM:

There is a similar question where the challenge is to remove, not add, fields. The solution uses a view, which should work for a subset of the original data, but I'm not sure whether it can be adapted when I want to add fields instead.

  • If your array is a view on a buffer of which the last half is not used, you might be able to allocate the extra fields in the last half (rather than adjacent to their existing row). Commented Oct 11, 2016 at 0:09

1 Answer


A structured array is stored, like a regular one, as a contiguous buffer of bytes, one record following the previous. The records are, thus, a bit like the last dimension of a multidimensional array. You can't add a column to a 2d array without making a new array via concatenation.

Adding a field, say one of dtype i4, to a dtype that is, say, 20 bytes long means changing the record (element) length to 24, i.e. inserting 4 bytes into the buffer after every 20 bytes. numpy can't do that without making a new data buffer and copying the values from the old one (plus the values for the new field).
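To make the arithmetic concrete, here is a minimal sketch. The field names ("id", "pos", "flag") are made up for illustration; the old dtype is chosen so that it is exactly 20 bytes, and the new one is 4 bytes longer. The copy goes field by field into a freshly allocated buffer, which is essentially what any "add a field" helper has to do.

```python
import numpy as np

# Hypothetical 20-byte record: one int32 (4 bytes) + two float64 (16 bytes).
old_dtype = np.dtype([("id", "i4"), ("pos", "f8", (2,))])
# Same record with an extra int32 field appended: 24 bytes.
new_dtype = np.dtype([("id", "i4"), ("pos", "f8", (2,)), ("flag", "i4")])

old = np.zeros(5, dtype=old_dtype)

# A new buffer is needed because every record grows by 4 bytes.
new = np.empty(old.shape, dtype=new_dtype)
for name in old_dtype.names:
    new[name] = old[name]   # copy each existing field
new["flag"] = 0             # fill the added field

print(old_dtype.itemsize, new_dtype.itemsize)
print(np.shares_memory(old, new))
```

The two arrays do not share memory, so while both are alive the old data exists twice.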

Actually, even if we were talking about adding a new record to the array, i.e. concatenating another array onto it, that would still require creating a new data buffer. Arrays have a fixed size.

Fields in a structured array are not like objects in a list or a dictionary. You can't add a field by just adding a pointer to an object elsewhere in memory.

Maybe you should be using a dictionary, with each item being an array. Then you can freely add a key/item without copying the existing ones. But then access by 'rows' will be slow.
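A rough sketch of that dictionary layout, with made-up column names and a hypothetical get_row helper. Adding a "field" only allocates the new column; the existing arrays are left where they are. The cost shows up in row access, which has to index every column separately.

```python
import numpy as np

n = 1_000
# One independent array per "field" (names are illustrative only).
columns = {
    "id": np.arange(n, dtype="i4"),
    "value": np.zeros(n, dtype="f8"),
}

id_before = columns["id"]

# Adding a field allocates only the new column.
columns["flag"] = np.ones(n, dtype="i4")

# The existing column is the very same object, so nothing was copied.
print(columns["id"] is id_before)

def get_row(cols, i):
    """Hypothetical helper: gather row i by indexing each column in turn."""
    return {name: arr[i] for name, arr in cols.items()}

row = get_row(columns, 3)
print(row)
```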


3 Comments

Hmm, ok. I need a different approach then. Perhaps I could cut the big array in N pieces, adding fields to the smaller pieces one at a time, so that I still copy everything, but not all at once, thus limiting peak memory usage.
Still, it should be possible to do it in a memory friendly way without duplicating data. It could copy data to a new array and grow the new array size while simultaneously decreasing the old array size.
Each 'grow' and 'decrease' requires a data copy. Don't worry about 'memory friendly' unless it is really hurting execution times or you get 'memory error' problems. But don't conflate those two problems.
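The piecewise idea from the first comment could look roughly like the sketch below. It assumes the data is already held as a list of separate chunk arrays (for example, loaded from disk in pieces) and that nothing else references them; add_field_chunkwise is a hypothetical helper, not a numpy function. Everything still gets copied, but only one chunk is duplicated at any moment, and the result stays a list of chunks rather than one contiguous array.

```python
import numpy as np

def add_field_chunkwise(chunks, name, fmt, fill=0):
    """Hypothetical helper: replace each chunk by a copy with one extra field."""
    for i in range(len(chunks)):
        old = chunks[i]
        # Rebuild the dtype from the existing fields plus the new one.
        fields = [(n, old.dtype[n]) for n in old.dtype.names]
        new = np.empty(old.shape, dtype=np.dtype(fields + [(name, fmt)]))
        for n in old.dtype.names:
            new[n] = old[n]
        new[name] = fill
        chunks[i] = new   # the list now holds only the widened chunk
        del old           # drop the last reference so the old chunk can be freed
    return chunks

# Small demonstration; in practice the chunks would come from chunked loading,
# since splitting an in-memory array like this is itself a copy.
dtype = np.dtype([("id", "i4"), ("value", "f8")])
big = np.zeros(10, dtype=dtype)
big["id"] = np.arange(10)
chunks = [big[i:i + 4].copy() for i in range(0, 10, 4)]
del big

chunks = add_field_chunkwise(chunks, "flag", "i4")
print([c.dtype.names for c in chunks])
```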
