
While implementing a Gauss-Seidel solver in Python and NumPy, I came across an interesting side effect. Here is a minimal example I extracted:


    number = 1000
    startup = 'import numpy as np;N = 2048;X = np.arange(N * N).reshape((N, N));'
    startup2 = startup + 'np.empty_like(X)'


    example = ('X[1:: 2, : -1: 2] = ('
               'X[: -1: 2, 1:: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[: -1: 2, : -1: 2])/4')

Running print(timeit.timeit(example, setup=startup, number=number)) takes about 5 s on my machine,

while print(timeit.timeit(example, setup=startup2, number=number)) takes about 4 s.

So the second version is about 1 s faster, even though it performs the unnecessary array allocation np.empty_like(X). I observed this effect on various machines and with various array sizes and iteration counts.
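For completeness, the two measurements can be reproduced with a single self-contained script (same setup strings and statement as above, just wrapped in print calls):

    import timeit

    number = 1000
    startup = 'import numpy as np; N = 2048; X = np.arange(N * N).reshape((N, N));'
    startup2 = startup + 'np.empty_like(X)'  # identical, plus the unused allocation

    example = ('X[1:: 2, : -1: 2] = ('
               'X[: -1: 2, 1:: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[: -1: 2, : -1: 2])/4')

    # Time the same statement once without and once with the throwaway allocation.
    print('without empty_like:', timeit.timeit(example, setup=startup, number=number))
    print('with empty_like:   ', timeit.timeit(example, setup=startup2, number=number))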

I assume that evaluating the right-hand side of the assignment leads to a temporary array allocation. It seems that NumPy somehow reuses the unused array that was created with np.empty_like(X) to speed up the allocation of that temporary.

Am I right with this assumption, or is something entirely different responsible for the difference in timing?

If I remove the /4, so that the statement becomes

    example = ('X[1:: 2, : -1: 2] = ('
               'X[: -1: 2, 1:: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[1:: 2, : -1: 2] +'
               'X[: -1: 2, : -1: 2])')

then I cannot observe any difference in execution time between the two versions. So I assume that in this case the calculation can be done in place and no temporary allocation is needed.

Is there a more explicit way to exploit this effect? Just writing np.empty_like(X) looks somewhat "hacky" to me.
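One more explicit alternative (a sketch only, assuming a float array so the division can stay in place; the buffer name buf is my own choice): preallocate a scratch buffer with the shape of the updated block and build the right-hand side into it via the ufuncs' out= argument, so no temporaries are created inside the expression at all:

    import numpy as np

    N = 2048
    X = np.arange(N * N, dtype=np.float64).reshape((N, N))  # float so /4 can stay in place

    # Scratch buffer matching the shape of the updated block (1024 x 1024 here).
    buf = np.empty_like(X[1:: 2, : -1: 2])

    # Accumulate the right-hand side into buf without temporary arrays.
    np.add(X[: -1: 2, 1:: 2], X[1:: 2, : -1: 2], out=buf)
    np.add(buf, X[1:: 2, : -1: 2], out=buf)
    np.add(buf, X[: -1: 2, : -1: 2], out=buf)
    buf /= 4

    X[1:: 2, : -1: 2] = buf

Inside the Gauss-Seidel loop, buf would be allocated once before the loop and reused in every iteration.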

Thanks in advance!

Updates:

Thank you for your answers and comments so far.

Finally I found some more time to experiment with my observations, and I am not sure whether my explanations above are clear enough. My initial observation was that this

    number = 1000
    N = 1024
    X = np.arange(N * N).reshape((N, N))
    np.empty_like(X)
    for _ in range(number):
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                             X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4

was faster than this

    number = 1000
    N = 1024
    X = np.arange(N * N).reshape((N, N))
    for _ in range(number):
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                             X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4

This I found quite surprising, because the totally unused and unnecessary array allocation with np.empty_like(X) seems to accelerate the loop that follows it. It does not matter whether I use np.empty_like, np.zeros_like, np.ones_like, np.ones(X.shape) or X.copy(), as long as an unused array is allocated. The effect also occurs for different values of N, different numbers of iterations, and on different machines.
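A minimal way to reproduce the observation outside of timeit, e.g. with time.perf_counter (the run helper and its warm_up_alloc flag are just for illustration):

    import time
    import numpy as np

    def run(number=1000, N=1024, warm_up_alloc=False):
        X = np.arange(N * N).reshape((N, N))
        if warm_up_alloc:
            np.empty_like(X)  # the "useless" allocation under discussion
        t0 = time.perf_counter()
        for _ in range(number):
            X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                                 X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
        return time.perf_counter() - t0

    print('without empty_like:', run(warm_up_alloc=False))
    print('with empty_like:   ', run(warm_up_alloc=True))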

Furthermore, I tried to investigate the issue with the tracemalloc module:

    import tracemalloc
    import numpy as np

    number = 1000
    N = 1024
    X = np.arange(N * N).reshape((N, N))
    tracemalloc.start()
    np.empty_like(X)
    for _ in range(number):
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] +
                             X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
    s2 = tracemalloc.take_snapshot()

    display_top(s2)

Here display_top is the helper from the tracemalloc documentation, except that it prints sizes in bytes instead of KiB.
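For reference, this is essentially that helper, the "Pretty top" example from the tracemalloc documentation, only with the sizes printed in bytes:

    import linecache
    import tracemalloc

    def display_top(snapshot, key_type='lineno', limit=10):
        # Ignore allocations made by the import machinery and unknown frames.
        snapshot = snapshot.filter_traces((
            tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
            tracemalloc.Filter(False, "<unknown>"),
        ))
        top_stats = snapshot.statistics(key_type)

        print("Top %s lines" % limit)
        for index, stat in enumerate(top_stats[:limit], 1):
            frame = stat.traceback[0]
            print("#%s: %s:%s: %.1f B" % (index, frame.filename, frame.lineno, stat.size))
            line = linecache.getline(frame.filename, frame.lineno).strip()
            if line:
                print('    %s' % line)

        other = top_stats[limit:]
        if other:
            size = sum(stat.size for stat in other)
            print("%s other: %.1f B" % (len(other), size))
        total = sum(stat.size for stat in top_stats)
        print("Total allocated size: %.1f B" % total)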

When I run it without the extra array allocation from np.empty_like(X) I get some output like this:

    Top 10 lines
    #1: ./minexample.py:40: 160.0 B
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] + X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
    #2: ./minexample.py:39: 28.0 B
        for _ in range(1000):
    Total allocated size: 188.0 B

With the extra allocation I get this:

    Top 10 lines
    #1: ./minexample.py:40: 128.0 B
        X[1:: 2, : -1: 2] = (X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2] + X[1:: 2, : -1: 2] + X[: -1: 2, : -1: 2]) / 4
    #2: ./minexample.py:38: 32.0 B
        np.empty_like(X)
    #3: ./minexample.py:39: 28.0 B
        for _ in range(1000):
    Total allocated size: 188.0 B

So the size allocated by the line in the loop is smaller than when no unused array is preallocated. It therefore looks to me as if this unused array is being reused.

Maybe this explains this effect?
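One way to probe the reuse hypothesis directly is to remember the data pointer of the throwaway array and compare it with the data pointer of the first temporary created afterwards. Whether this prints True depends entirely on the memory allocator (the temporaries are smaller than the freed block), so it is only a rough sanity check, not a proof:

    import numpy as np

    N = 1024
    X = np.arange(N * N).reshape((N, N))

    scratch = np.empty_like(X)                     # the "useless" allocation
    addr = scratch.__array_interface__['data'][0]  # remember where its buffer lived
    del scratch                                    # release it again, like the bare np.empty_like(X)

    # The first temporary on the right-hand side is only (N/2) x (N/2), so the
    # allocator may, or may not, place it at the start of the block just freed.
    tmp = X[: -1: 2, 1:: 2] + X[1:: 2, : -1: 2]
    print(tmp.__array_interface__['data'][0] == addr)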

  • Very interesting question! I gave it some thoughts and I'm really interested in an explanation why this happens. Can you lift out the import from the first startup statement and see if that has any effect? Commented Oct 23, 2020 at 18:37
  • Went along expected lines for me - %timeit N = 2048;X = np.arange(N * N).reshape((N, N)); 6.95 ms ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)... %timeit N = 2048;X = np.arange(N * N).reshape((N, N)); np.empty_like(X); 6.97 ms ± 19.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each). OP's case with import numpy as np seemed to be the unwanted bad apple with timings. @MarcSances Commented Oct 24, 2020 at 13:45
  • I can't reproduce the result consistently. Meanwhile, in most runs the difference between the two measurements is less than 5%. The order of two timeit statements also makes a difference. Commented Oct 26, 2020 at 9:49
  • %timeit N=2048; X = np.arange(N * N).reshape((N, N)); and %timeit X = np.arange(N * N).reshape((N, N)); X=None; is also different in time (17.4 ms ± 226 µs vs. 12.2 ms ± 114 µs)… Commented Oct 31, 2020 at 4:38

1 Answer

empty_like(X) initializes an array (or matrix) with the same dimensions as X and fills it with random values, which is an incredibly fast process. But keep in mind that adding empty_like(X) does not give you the desired result (you would need zeros_like(X) for that)!

Why adding two arrays is fast in NumPy has to do with densely packed arrays and is described here: https://stackoverflow.com/a/8385658/14344821; this can potentially be faster than creating the matrix X with explicitly specified entries. Tips on how to create NumPy arrays efficiently can be found here.
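A tiny toy example of the zeros_like/empty_like difference mentioned above (not from the original post):

    import numpy as np

    X = np.arange(4).reshape((2, 2))

    print(X + np.zeros_like(X))  # well defined: identical to X
    print(np.empty_like(X))      # contents are whatever is in the returned buffer,
                                 # so adding this to X gives no meaningful result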


1 Comment

"fills it with random values" No, it doesn't fill it with anything at all. Generating random values would be slow. It leaves the memory unchanged from when it was last used. "This function does not initialize the returned array"
