
Let's say that we have a numpy array storing large objects. My goal is to delete one of these objects from memory but retain the initial structure of the array. The cell in which the object was stored could then be filled with, for example, None.

Example simplified behaviour, where I replaced large objects with characters:

arr = numpy.asarray(['a', 'b', 'c']) # arr = ['a', 'b', 'c']
delete_in_place(arr, 0)              # arr = [None, 'b', 'c']

I can't do this by calling numpy.delete(), because it just returns a new array without the element, which takes additional space in memory. It also changes the shape (by removing the given index), which I want to avoid.

My other idea was to just set arr[0] = None and call the garbage collector, but I'm not sure what the exact behaviour of such a procedure would be.

Do you have any ideas on how to do it?

  • What's your end objective? "Premature optimization is the root of all evil." Commented Oct 29, 2022 at 15:32
  • What do you mean by large objects? In your example, the elements are strings, and the resulting dtype will be 'U1'. If you try that None assignment you'll get 'N' in that cell. If it is an object dtype array, then the None can replace the referenced objects. If those objects are no longer referenced, then yes, they will be garbage. Commented Oct 29, 2022 at 15:47
  • @tijko In general my array is a 2D one. I'm combining objects stored at different indices, and sometimes I want to store the result of such "combination" in place of one of the old objects, while the second one might be removed from the memory. I want to retain the structure of the array, so that I can properly handle the indices. Commented Oct 29, 2022 at 15:48
  • @hpaulj My objects can take several GB of memory and I just replaced them with simple strings for the purpose of the example. Commented Oct 29, 2022 at 15:49
  • 1
    As I pointed out, simple strings are stored in arrays differently. You don't need to give us GB objects, but you still need to capture the core of the issue in your example(s). Commented Oct 29, 2022 at 15:50
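The dtype distinction raised in the comments above can be checked directly. A minimal sketch: with the default 'U1' string dtype, assigning None stores the string 'None' truncated to one character, while an object-dtype array holds references and accepts None cleanly, which is what the question needs.

```python
import numpy as np

# Default dtype for short strings is 'U1': None is coerced to the
# string 'None' and truncated to the one-character field width.
arr_str = np.asarray(['a', 'b', 'c'])
arr_str[0] = None
print(arr_str[0])   # 'N'

# With dtype=object the cells hold Python references, so None
# replaces the old object while the array keeps its shape.
arr_obj = np.asarray(['a', 'b', 'c'], dtype=object)
arr_obj[0] = None
print(arr_obj[0])   # None
print(arr_obj.shape)  # (3,)
```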

2 Answers


A numpy array has a fixed size once created, so any attempt to delete an element ends up creating a new array.

The way you are trying to do it is not effective; consider a different data structure or library.
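To make the fixed-size point concrete, a short sketch: numpy.delete copies into a new, smaller array and leaves the original untouched, whereas overwriting a cell in an object-dtype array keeps the shape while dropping the reference.

```python
import numpy as np

arr = np.asarray(['a', 'b', 'c'], dtype=object)

# np.delete allocates a brand-new, smaller array ...
smaller = np.delete(arr, 0)
print(smaller.shape)  # (2,)
# ... while the original is untouched
print(arr.shape)      # (3,)

# In-place alternative: overwrite the cell, shape is preserved
arr[0] = None
print(arr)            # [None 'b' 'c']
```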


3 Comments

Can this be achieved for example with basic python's lists?
@brzepkowski absolutely, but as far as memory access goes you'd have to dive into the C-API
@tijko well said. +1
0 <!-- -->

You can do this with a multi-dimensional array and not even get pandas or numpy involved. You will need the assistance of the gc module and the built-in del statement, but that's the extent of things.

For example:

import gc

with open('large-dataset.txt') as fh:
    raw_data = fh.readlines()

# parse the raw data into large objects
large_objs_multidim = [obj_create(i) for i in raw_data]
...
# We no longer need a reference to the large object
temp_obj = large_objs_multidim[0][0]
large_objs_multidim[0][0] = None
del temp_obj
# Python makes no guarantees about when collection happens; read up on ref-counts.
gc.collect()

This gives the general idea of how to invoke the garbage collector yourself. There are some nuances to Python's reference counting and objects in memory. I don't know the intricacies of your project and code, but you might benefit from reading into __weakref__ too...
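Following up on the weak-reference pointer above, a minimal sketch (using a hypothetical Big class as a stand-in for a multi-GB object): a weakref does not keep its target alive, so it can be used as a probe to confirm the object was actually freed after the cell is overwritten.

```python
import gc
import weakref

class Big:
    pass  # stand-in for a large, multi-GB object

cells = [[Big(), Big()]]
probe = weakref.ref(cells[0][0])  # does not keep the object alive

cells[0][0] = None  # drop the only strong reference
gc.collect()        # usually redundant in CPython, where ref-counting frees it at once

# The weakref is now dead, proving the object was collected
print(probe() is None)  # True
```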

Also, these links for further reading:

https://stackoverflow.com/a/1316793/1230086

https://stackoverflow.com/a/9908216/1230086

https://docs.python.org/3/library/gc.html

