
While implementing a Gauss-Seidel solver in Python and NumPy, I came across an interesting side effect. Here is a minimal example I extracted:


    number = 1000
    startup = 'import numpy as np;N = 2048;X = np.arange(N * N).reshape((N, N));'
    startup2 = startup + 'np.empty_like(X)'


    example = ('X[1:: 2, : -1: 2] = ('
               'X[: -1: 2, 1:: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[: -1: 2, : -1: 2])/4')

Running print(timeit.timeit(example, setup=startup, number=number)) takes about 5 s on my machine,

while print(timeit.timeit(example, setup=startup2, number=number)) takes about 4 s.

So the second version is about 1 s faster, even though it performs the unnecessary array allocation np.empty_like(X). I observed this effect on various machines and with various array sizes and iteration counts.
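For completeness, the two measurements can be reproduced with a single self-contained script (same setup strings and statement as above, just wrapped in print calls):

    import timeit

    number = 1000
    startup = 'import numpy as np; N = 2048; X = np.arange(N * N).reshape((N, N));'
    startup2 = startup + 'np.empty_like(X)'  # identical, plus the unused allocation

    example = ('X[1:: 2, : -1: 2] = ('
               'X[: -1: 2, 1:: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[: -1: 2, : -1: 2])/4')

    # Time the same statement once without and once with the throwaway allocation.
    print('without empty_like:', timeit.timeit(example, setup=startup, number=number))
    print('with empty_like:   ', timeit.timeit(example, setup=startup2, number=number))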

I assume that evaluating the right-hand side of the assignment leads to a temporary array allocation. It seems that NumPy somehow reuses the unused array that was created with np.empty_like(X) to speed up the allocation of that temporary.

Am I right with this assumption, or is something entirely different responsible for the difference in timing?

If I remove the /4, so that the statement becomes

    example = ('X[1:: 2, : -1: 2] = ('
               'X[: -1: 2, 1:: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[: -1: 2, : -1: 2])')

then I cannot observe any difference in execution time between the two versions. So I assume that in this case the calculation can be done in place and no temporary allocation is needed.

Is there a more explicit way to exploit this effect? Just writing np.empty_like(X) looks somewhat "hacky" to me.
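One more explicit alternative (a sketch only, assuming a float array so the division can stay in place; the buffer name buf is my own choice): preallocate a scratch buffer with the shape of the updated block and build the right-hand side into it via the ufuncs' out= argument, so no temporaries are created inside the expression at all:

    import numpy as np

    N = 2048
    X = np.arange(N * N, dtype=np.float64).reshape((N, N))  # float so /4 can stay in place

    # Scratch buffer matching the shape of the updated block (1024 x 1024 here).
    buf = np.empty_like(X[1:: 2, : -1: 2])

    # Accumulate the right-hand side into buf without temporary arrays.
    np.add(X[: -1: 2, 1:: 2], X[1:: 2, : -1: 2], out=buf)
    np.add(buf, X[1:: 2, : -1: 2], out=buf)
    np.add(buf, X[: -1: 2, : -1: 2], out=buf)
    buf /= 4

    X[1:: 2, : -1: 2] = buf

Inside the Gauss-Seidel loop, buf would be allocated once before the loop and reused in every iteration.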

Thanks in advance!

Updates:

Thank you for your answers and comments so far.

Finally I found some more time to experiment with my observations, and I am not sure whether my explanations above are clear enough. My initial observation was that this

    number = 1000
    N = 1024
    X = np.arange(N * N).reshape((N, N))
    np.empty_like(X)
    for _ in range(number):
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                             X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4

was faster than this

    number = 1000
    N = 1024
    X = np.arange(N * N).reshape((N, N))
    for _ in range(number):
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                             X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4

This I found quite surprising, because the totally unused and unnecessary array allocation with np.empty_like(X) seems to accelerate the loop that follows it. It does not matter whether I use np.empty_like, np.zeros_like, np.ones_like, np.ones(X.shape) or X.copy(), as long as an unused array is allocated. The effect also occurs for different values of N, different numbers of iterations, and on different machines.
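A minimal way to reproduce the observation outside of timeit, e.g. with time.perf_counter (the run helper and its warm_up_alloc flag are just for illustration):

    import time
    import numpy as np

    def run(number=1000, N=1024, warm_up_alloc=False):
        X = np.arange(N * N).reshape((N, N))
        if warm_up_alloc:
            np.empty_like(X)  # the "useless" allocation under discussion
        t0 = time.perf_counter()
        for _ in range(number):
            X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                                 X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
        return time.perf_counter() - t0

    print('without empty_like:', run(warm_up_alloc=False))
    print('with empty_like:   ', run(warm_up_alloc=True))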

Furthermore, I tried to investigate the issue with the tracemalloc module:

    import tracemalloc
    import numpy as np

    number = 1000
    N = 1024
    X = np.arange(N * N).reshape((N, N))
    tracemalloc.start()
    np.empty_like(X)
    for _ in range(number):
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                             X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
    s2 = tracemalloc.take_snapshot()

    display_top(s2)

Here display_top is the helper from the tracemalloc documentation, except that it prints sizes in bytes instead of KiB.
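For reference, this is essentially that helper, the "Pretty top" example from the tracemalloc documentation, only with the sizes printed in bytes:

    import linecache
    import tracemalloc

    def display_top(snapshot, key_type='lineno', limit=10):
        # Ignore allocations made by the import machinery and unknown frames.
        snapshot = snapshot.filter_traces((
            tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
            tracemalloc.Filter(False, "<unknown>"),
        ))
        top_stats = snapshot.statistics(key_type)

        print("Top %s lines" % limit)
        for index, stat in enumerate(top_stats[:limit], 1):
            frame = stat.traceback[0]
            print("#%s: %s:%s: %.1f B" % (index, frame.filename, frame.lineno, stat.size))
            line = linecache.getline(frame.filename, frame.lineno).strip()
            if line:
                print('    %s' % line)

        other = top_stats[limit:]
        if other:
            size = sum(stat.size for stat in other)
            print("%s other: %.1f B" % (len(other), size))
        total = sum(stat.size for stat in top_stats)
        print("Total allocated size: %.1f B" % total)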

When I run it without the extra array allocation from np.empty_like(X) I get some output like this:

    Top 10 lines
    #1: ./minexample.py:40: 160.0 B
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] + X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
    #2: ./minexample.py:39: 28.0 B
        for _ in range(1000):
    Total allocated size: 188.0 B

With the extra allocation I get this:

    Top 10 lines
    #1: ./minexample.py:40: 128.0 B
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] + X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
    #2: ./minexample.py:38: 32.0 B
        np.empty_like(X)
    #3: ./minexample.py:39: 28.0 B
        for _ in range(1000):
    Total allocated size: 188.0 B

So the size allocated by the line in the loop is smaller than when no unused array is preallocated. It therefore looks to me as if this unused array is being reused.

Maybe this explains this effect?
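One way to probe the reuse hypothesis directly is to remember the data pointer of the throwaway array and compare it with the data pointer of the first temporary created afterwards. Whether this prints True depends entirely on the memory allocator (the temporaries are smaller than the freed block), so it is only a rough sanity check, not a proof:

    import numpy as np

    N = 1024
    X = np.arange(N * N).reshape((N, N))

    scratch = np.empty_like(X)                     # the "useless" allocation
    addr = scratch.__array_interface__['data'][0]  # remember where its buffer lived
    del scratch                                    # release it again, like the bare np.empty_like(X)

    # The first temporary on the right-hand side is only (N/2) x (N/2), so the
    # allocator may, or may not, place it at the start of the block just freed.
    tmp = X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2]
    print(tmp.__array_interface__['data'][0] == addr)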

  • Very interesting question! I gave it some thoughts and I'm really interested in an explanation why this happens. Can you lift out the import from the first startup statement and see if that has any effect? Commented Oct 23, 2020 at 18:37
  • Went along expected lines for me - %timeit N = 2048;X = np.arange(N * N).reshape((N, N)); 6.95 ms ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)... %timeit N = 2048;X = np.arange(N * N).reshape((N, N)); np.empty_like(X); 6.97 ms ± 19.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each). OP's case with import numpy as np seemed to be the unwanted bad apple with timings. @MarcSances Commented Oct 24, 2020 at 13:45
  • I can't reproduce the result consistently. Meanwhile, in most runs the difference between the two measurements is less than 5%. The order of two timeit statements also makes a difference. Commented Oct 26, 2020 at 9:49
  • %timeit N=2048; X = np.arange(N * N).reshape((N, N)); and %timeit X = np.arange(N * N).reshape((N, N)); X=None; is also different in time (17.4 ms ± 226 µs vs. 12.2 ms ± 114 µs)… Commented Oct 31, 2020 at 4:38

1 Answer

empty_like(X) initializes an array (or matrix) with the same dimensions as X and fills it with random values, which is an incredibly fast process. But keep in mind that adding empty_like(X) does not give you the desired result (you would need zeros_like(X) for that)!

Why adding two arrays is fast in NumPy has to do with densely packed arrays and is described here: https://stackoverflow.com/a/8385658/14344821; this can potentially be faster than creating the matrix X with explicitly specified entries. Tips on how to create NumPy arrays efficiently can be found here.
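A tiny toy example of the zeros_like/empty_like difference mentioned above (not from the original post):

    import numpy as np

    X = np.arange(4).reshape((2, 2))

    print(X + np.zeros_like(X))  # well defined: identical to X
    print(np.empty_like(X))      # contents are whatever is in the returned buffer,
                                 # so adding this to X gives no meaningful result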


1 Comment

"fills it with random values" No, it doesn't fill it with anything at all. Generating random values would be slow. It leaves the memory unchanged from when it was last used. "This function does not initialize the returned array"
