
I have the following:

a = array([[ 1,  1,  1,  1,  1],
           [ 1,  1,  1,  1,  2],
           [ 1,  1,  1,  1,  3],
           [ 1,  1,  1, 18, 16],
           [ 1,  1,  1, 18, 17]], dtype=int16)
b = np.arange(0,50).reshape(5,10)
change_cols = [1,2,5,7,9]

I would like to replace, in every row of b, the columns given by change_cols with the columns of a in reverse order (i.e. a[:, ::-1]) to get:

b = array([[ 0,  1,  1,  3,  4,  1,  6,  1,  8,  1],
           [10,  2,  1, 13, 14,  1, 16,  1, 18,  1],
           [20,  3,  1, 23, 24,  1, 26,  1, 28,  1],
           [30, 16, 18, 33, 34,  1, 36,  1, 38,  1],
           [40, 17, 18, 43, 44,  1, 46,  1, 48,  1]])

Presently I am doing this:

for n, i in enumerate(change_cols):
    b[:,i] = a[:,-(n+1)]

How do I do this efficiently in NumPy, without using a Python for-loop?

Update:

I compared the time taken by the answers from @ogdencave and @U12-forward against the Python for-loop + enumerate approach. Surprisingly, the Python 3 for-loop + enumerate is the fastest. Why is this?

>>> def splice():
    b[:, change_cols] = a[:, ::-1]

>>> def splice_arange():
    b[:, change_cols] = a[:, -(np.arange(len(change_cols)) + 1)]

>>> def enumerate_for_loop():
    for n, i in enumerate(change_cols):
        b[:, i] = a[:, -(n+1)]

>>> timeit.timeit('splice()', number=10000, globals=globals())
0.042480306001380086
>>> timeit.timeit('splice_arange()', number=10000, globals=globals())
0.05964866199065
>>> timeit.timeit('enumerate_for_loop()', number=10000, globals=globals())
0.03969518095254898
>>>

I also tried array sizes close to my real scenario. I am surprised that the Python for-loop + enumerate approach is still the fastest.

>>> a = np.array([ 1,  1,  1, 18, 17]*300).reshape(300,5)
>>> b = np.arange(0,300*200).reshape(300,200)
>>> for i in range(5):
    timeit.timeit('splice()', number=10000, globals=globals())
0.04873670096276328
0.04880331002641469
0.055170061998069286
0.04291973798535764
0.031961234053596854

>>> for i in range(5):
    timeit.timeit('splice_arange()', number=10000, globals=globals())
0.07321989600313827
0.07536661700578406
0.06798515404807404
0.07559602102264762
0.07348818198079243

>>> for i in range(5):
    timeit.timeit('enumerate_for_loop()', number=10000, globals=globals())
0.054252722999081016
0.03883319004671648
0.036229485005605966
0.036062364000827074
0.03962253499776125
  • Testing with small inputs is not a very good metric for measuring performance unless your inputs are always small. Increase the array size (maybe 5000x5000?); NumPy would outperform the for-loop. Commented Sep 27, 2021 at 3:47
  • @Ch3steR Thanks. I am still surprised by my findings. I did recognise that there is a cost to converting a Python list to a NumPy array. In my case, the conversion is done before the timing, hence my surprise to still find the Python for-loop + enumerate faster than the NumPy slicing methods. Commented Sep 27, 2021 at 4:14
  • Have you changed change_cols too? I suspect increasing its size would show some interesting results (see the sketch after these comments). Commented Sep 27, 2021 at 4:19
  • @Ch3steR No, as my real scenario presently uses 5 cols and 300-plus rows. I wonder if a newer Python version has become more efficient. Commented Sep 27, 2021 at 4:26
  • With your 300x5 sample, I get splice=4.46 µs < enumerate_for_loop=5.16 µs < splice_arange=9.7 µs (python 3.9.2, numpy 1.20.2). Commented Sep 27, 2021 at 4:35
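
Following up on the comment above about also scaling change_cols, here is a minimal sketch of such a test. The sizes are arbitrary assumptions (not the real scenario), and the three functions are the same ones defined in the update:

import numpy as np
import timeit

# Same three approaches as in the update above.
def splice():
    b[:, change_cols] = a[:, ::-1]

def splice_arange():
    b[:, change_cols] = a[:, -(np.arange(len(change_cols)) + 1)]

def enumerate_for_loop():
    for n, i in enumerate(change_cols):
        b[:, i] = a[:, -(n + 1)]

# Arbitrary larger test case; a must have exactly len(change_cols) columns.
rng = np.random.default_rng(0)
a = rng.integers(0, 20, size=(2000, 500), dtype=np.int16)
b = np.arange(2000 * 2000).reshape(2000, 2000)
change_cols = sorted(rng.choice(2000, size=500, replace=False))

for fn in (splice, splice_arange, enumerate_for_loop):
    print(fn.__name__, timeit.timeit(fn, number=100))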

2 Answers


Here is one way to avoid the for loop.

import numpy as np

a = np.array([[ 1,  1,  1,  1,  1],
              [ 1,  1,  1,  1,  2],
              [ 1,  1,  1,  1,  3],
              [ 1,  1,  1, 18, 16],
              [ 1,  1,  1, 18, 17]], dtype=np.int16)
b = np.arange(0,50).reshape(5,10)
change_cols = [1,2,5,7,9]

b[:, change_cols] = a[:, ::-1]
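
As a quick sanity check (a minimal sketch using the arrays defined above), the assigned columns of b now hold a's columns in reverse order, which reproduces the expected output from the question:

# The selected columns of b should now equal a with its columns reversed.
assert np.array_equal(b[:, change_cols], a[:, ::-1])
print(b)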

Update with timing info

With some larger array sizes, it looks like this "splicing" approach may be faster than Sun Bear's for loop or U12-Forward's arange.

import numpy as np
import timeit

rng = np.random.default_rng(12345)
a = rng.random(size=(500, 100))
b = rng.integers(100, size=(500, 500))
change_cols = rng.choice(500, size=100)


def splice():
    b[:, change_cols] = a[:, ::-1]


def splice_arange():
    b[:, change_cols] = a[:,-(np.arange(len(change_cols)) + 1)]


def enumerate_for_loop():
    for n, i in enumerate(change_cols):
        b[:,i] = a[:,-(n+1)]

print("Splice")
print(timeit.timeit('splice()', number=10000, globals=globals()))
print("Splice arange")
print(timeit.timeit('splice_arange()', number=10000, globals=globals()))
print("Enumerate for loop")
print(timeit.timeit('enumerate_for_loop()', number=10000, globals=globals()))

Result:

Splice
1.2409849390387535
Splice arange
1.4882377870380878
Enumerate for loop
2.198731765151024
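
For comparison, the same three functions can be re-timed on the question's original 5x10 arrays (a minimal sketch reusing the definitions above). On inputs that small, the per-call overhead of fancy indexing can outweigh the cost of five direct slice assignments, which would be consistent with the timings reported in the question:

# Re-time on the original small arrays from the question.
a = np.array([[1, 1, 1,  1,  1],
              [1, 1, 1,  1,  2],
              [1, 1, 1,  1,  3],
              [1, 1, 1, 18, 16],
              [1, 1, 1, 18, 17]], dtype=np.int16)
b = np.arange(0, 50).reshape(5, 10)
change_cols = [1, 2, 5, 7, 9]

print("Splice (small):", timeit.timeit(splice, number=10000))
print("Splice arange (small):", timeit.timeit(splice_arange, number=10000))
print("Enumerate for loop (small):", timeit.timeit(enumerate_for_loop, number=10000))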

3 Comments

Why is your solution (NumPy slicing) slower than the Python 3 for-loop + enumerate method? Did I compare correctly?
Thanks for showing the result on your system. I don't understand why my system consistently shows the Python for-loop as the fastest. Confused. I am using Python 3.8.10. Your solution is faster than @u12-forward's solution. Hardware?
I was running an online Python interpreter which has Python 3.8.2.

You could do it in one line, using np.arange in place of the index from enumerate and passing the whole list of columns in the assignment:

b[:, change_cols] = a[:,-(np.arange(len(change_cols)) + 1)]

And now:

print(b)

Gives:

[[ 0  1  1  3  4  1  6  1  8  1]
 [10  2  1 13 14  1 16  1 18  1]
 [20  3  1 23 24  1 26  1 28  1]
 [30 16 18 33 34  1 36  1 38  1]
 [40 17 18 43 44  1 46  1 48  1]]
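
For what it's worth, the index array built here is just [-1, -2, ..., -len(change_cols)], and since a has exactly len(change_cols) columns in the question, it selects the same columns as a[:, ::-1] in the other answer. A quick check, assuming the a and change_cols from the question:

idx = -(np.arange(len(change_cols)) + 1)
print(idx)                                    # [-1 -2 -3 -4 -5]
print(np.array_equal(a[:, idx], a[:, ::-1]))  # True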

3 Comments

Why is your solution (NumPy slicing + arange) slower than the Python 3 for-loop + enumerate method? Did I compare correctly?
@SunBear What is the result?
I have updated my question with the results.
