I am reading through a NumPy tutorial, and it says that sample code like this:

>>> X = np.ones(10, dtype=int)
>>> Y = np.ones(10, dtype=int)
>>> A = 2*X + 2*Y

is slow because it allocates three separate arrays: one each for the intermediate results 2*X and 2*Y, and one for A.

Instead, it is suggested that if speed is an issue, the same calculation should be performed like this:

>>> X = np.ones(10, dtype=int)
>>> Y = np.ones(10, dtype=int)
>>> np.multiply(X, 2, out=X)
>>> np.multiply(Y, 2, out=Y)
>>> np.add(X, Y, out=X)

Yet I don't see where the speed difference would come from. In the second snippet, X and Y still appear to be created as intermediate arrays. Or is the difference in the speed of np.multiply compared to 2*X?

  • "In the second code, X and Y still appear to be created as intermediate arrays" - what? No, they're the inputs. The code reuses them to hold intermediate results (trashing the original data) rather than allocating new arrays.
  • See stackoverflow.com/questions/27293830/… about using out
  • @ThierryLathuille Gotcha, it makes sense now why the second code snippet can be useful.
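
A quick check (mine, not from the thread) confirms the reuse the comment describes: with out=, the ufunc writes into the existing array and returns that same object rather than allocating a new one:

    >>> import numpy as np
    >>> X = np.ones(10)
    >>> np.multiply(X, 2, out=X) is X
    True
    >>> X
    array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])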

2 Answers


I wrapped the two examples in functions, and tried some timings:

In [136]: timeit foo1(1000)
10000 loops, best of 3: 26.4 µs per loop
In [137]: timeit foo2(1000)
10000 loops, best of 3: 27.4 µs per loop

In [138]: timeit foo1(100000)
100 loops, best of 3: 2.39 ms per loop
In [139]: timeit foo2(100000)
1000 loops, best of 3: 1.24 ms per loop
In [140]: timeit foo1(10000000)
1 loop, best of 3: 571 ms per loop
In [141]: timeit foo2(10000000)
10 loops, best of 3: 175 ms per loop
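
The wrapper functions aren't shown above; a plausible reconstruction (a sketch only, assuming each builds fresh length-n arrays and runs one version of the calculation) would be:

    import numpy as np

    def foo1(n):
        # original version: temporaries are allocated for 2*X, 2*Y, and the sum
        X = np.ones(n)
        Y = np.ones(n)
        return 2*X + 2*Y

    def foo2(n):
        # out= version: X and Y are reused, so no extra arrays are allocated
        X = np.ones(n)
        Y = np.ones(n)
        np.multiply(X, 2, out=X)
        np.multiply(Y, 2, out=Y)
        np.add(X, Y, out=X)
        return X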

For the smaller size, the use of out doesn't make much difference. It's when the arrays get into the 10,000-and-up element range that we see benefits from array reuse. I suspect that with larger arrays the relative cost of allocating new ones is greater: it's harder to find reusable memory blocks, which requires more calls to the OS, etc.
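
One way to isolate the allocation cost (my own probe, not from the answer) is to time the same multiply with and without a fresh output array:

    import numpy as np
    from timeit import timeit

    n = 10_000_000
    buf = np.ones(n)

    # identical arithmetic; the only difference is whether a result array is allocated
    with_alloc = timeit(lambda: 2 * buf, number=50)                     # new array each call
    in_place = timeit(lambda: np.multiply(buf, 2, out=buf), number=50)  # writes into buf
    print(with_alloc, in_place)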

And the time savings are lost if I have to make copies of the two initial arrays first (to allow for their reuse):

 X = np.ones(N).copy()
 Y = np.ones(N).copy()
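
Spelled out, a hypothetical foo3 that protects the originals by copying them first pays for two extra allocations up front, which is what erases foo2's advantage:

    import numpy as np

    def foo3(n):
        X = np.ones(n)
        Y = np.ones(n)
        Xc = X.copy()              # extra allocation to preserve X...
        Yc = Y.copy()              # ...and another to preserve Y
        np.multiply(Xc, 2, out=Xc)
        np.multiply(Yc, 2, out=Yc)
        np.add(Xc, Yc, out=Xc)
        return Xc                  # same result as foo2, minus the savings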

This is the kind of change to consider once you've gotten rid of explicit iteration. Even then, SO answers are more likely to suggest numba or cython. I see this pattern occasionally in numpy functions, but it doesn't stand out. The exception that comes to mind is np.cross (see np.source(np.cross)), which uses blocks like this:

        # cp0 = a1 * b2 - 0  (a2 = 0)
        # cp1 = 0 - a0 * b2  (a2 = 0)
        # cp2 = a0 * b1 - a1 * b0
        multiply(a1, b2, out=cp0)
        multiply(a0, b2, out=cp1)
        negative(cp1, out=cp1)
        multiply(a0, b1, out=cp2)
        cp2 -= a1 * b0


Those two code examples aren't equivalent in what they do. In the first, a new array has to be allocated when you compute 2*X. The second example, while faster, is destructive: you modify the arrays in place instead of making copies for the calculation.

If you plan on reusing X and Y for several operations that don't depend on each other (that is, you multiply X and Y for this operation but not a future one), you may want to use your initial approach so you don't have to undo operations.
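
To make the destructiveness concrete (my own illustration, not part of the original answer):

    >>> import numpy as np
    >>> X = np.ones(5)
    >>> A = 2 * X                    # non-destructive: the result lands in a new array
    >>> X
    array([1., 1., 1., 1., 1.])
    >>> np.multiply(X, 2, out=X)     # destructive: X's original values are overwritten
    array([2., 2., 2., 2., 2.])
    >>> X
    array([2., 2., 2., 2., 2.])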

1 Comment

Yes, I see this now. So is the second way faster because it eliminates the time spent on creating a new array?
