
I have the following:

a = array([[ 1,  1,  1,  1,  1],
           [ 1,  1,  1,  1,  2],
           [ 1,  1,  1,  1,  3],
           [ 1,  1,  1, 18, 16],
           [ 1,  1,  1, 18, 17]], dtype=int16)
b = np.arange(0,50).reshape(5,10)
change_cols = [1,2,5,7,9]

I would like to replace, in every row of b, the columns given by change_cols with the columns of a in reverse order (i.e. a[:, ::-1]) to get:

b = array([[ 0,  1,  1,  3,  4,  1,  6,  1,  8,  1],
           [10,  2,  1, 13, 14,  1, 16,  1, 18,  1],
           [20,  3,  1, 23, 24,  1, 26,  1, 28,  1],
           [30, 16, 18, 33, 34,  1, 36,  1, 38,  1],
           [40, 17, 18, 43, 44,  1, 46,  1, 48,  1]])

Presently I am doing this:

for n, i in enumerate(change_cols):
    b[:,i] = a[:,-(n+1)]

How do I do this efficiently in NumPy, without using a Python for-loop?

Update:

I compared the time taken by the answers from @ogdencave and @U12-forward against the Python for-loop + enumerate approach. Surprisingly, the Python 3 for-loop + enumerate is the fastest. Why is this?

>>> def splice():
    b[:, change_cols] = a[:, ::-1]

>>> def splice_arange():
    b[:, change_cols] = a[:, -(np.arange(len(change_cols)) + 1)]

>>> def enumerate_for_loop():
    for n, i in enumerate(change_cols):
        b[:, i] = a[:, -(n+1)]

>>> timeit.timeit('splice()', number=10000, globals=globals())
0.042480306001380086
>>> timeit.timeit('splice_arange()', number=10000, globals=globals())
0.05964866199065
>>> timeit.timeit('enumerate_for_loop()', number=10000, globals=globals())
0.03969518095254898
>>>

I also tried array sizes close to my real scenario. I am surprised that the Python for-loop + enumerate approach is still the fastest.

>>> a = np.array([ 1,  1,  1, 18, 17]*300).reshape(300,5)
>>> b = np.arange(0,300*200).reshape(300,200)
>>> for i in range(5):
    timeit.timeit('splice()', number=10000, globals=globals())
0.04873670096276328
0.04880331002641469
0.055170061998069286
0.04291973798535764
0.031961234053596854

>>> for i in range(5):
    timeit.timeit('splice_arange()', number=10000, globals=globals())
0.07321989600313827
0.07536661700578406
0.06798515404807404
0.07559602102264762
0.07348818198079243

>>> for i in range(5):
    timeit.timeit('enumerate_for_loop()', number=10000, globals=globals())
0.054252722999081016
0.03883319004671648
0.036229485005605966
0.036062364000827074
0.03962253499776125
  • Testing with small inputs is not a very good metric for measuring performance unless your inputs are always small. Increase the array size (maybe 5000x5000?); NumPy would outperform the for-loop. Commented Sep 27, 2021 at 3:47
  • @Ch3steR Thanks. I am still surprised by my findings. I did recognise that there is a cost to converting a Python list to a NumPy array. In my case, the conversion is done before the timing, hence my surprise to still find the Python for-loop + enumerate faster than the NumPy slicing methods. Commented Sep 27, 2021 at 4:14
  • Have you changed change_cols too? I suspect increasing its size would show some interesting results (see the sketch after these comments). Commented Sep 27, 2021 at 4:19
  • @Ch3steR No, as my real scenario presently uses 5 cols and 300-plus rows. I wonder if a newer Python version has become more efficient. Commented Sep 27, 2021 at 4:26
  • With your 300x5 sample, I get splice=4.46 µs < enumerate_for_loop=5.16 µs < splice_arange=9.7 µs (python 3.9.2, numpy 1.20.2). Commented Sep 27, 2021 at 4:35
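
Following up on the comment above about also scaling change_cols, here is a minimal sketch of such a test. The sizes are arbitrary assumptions (not the real scenario), and the three functions are the same ones defined in the update:

import numpy as np
import timeit

# Same three approaches as in the update above.
def splice():
    b[:, change_cols] = a[:, ::-1]

def splice_arange():
    b[:, change_cols] = a[:, -(np.arange(len(change_cols)) + 1)]

def enumerate_for_loop():
    for n, i in enumerate(change_cols):
        b[:, i] = a[:, -(n + 1)]

# Arbitrary larger test case; a must have exactly len(change_cols) columns.
rng = np.random.default_rng(0)
a = rng.integers(0, 20, size=(2000, 500), dtype=np.int16)
b = np.arange(2000 * 2000).reshape(2000, 2000)
change_cols = sorted(rng.choice(2000, size=500, replace=False))

for fn in (splice, splice_arange, enumerate_for_loop):
    print(fn.__name__, timeit.timeit(fn, number=100))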

2 Answers


Here is one way to avoid the for loop.

import numpy as np

a = np.array([[ 1,  1,  1,  1,  1],
              [ 1,  1,  1,  1,  2],
              [ 1,  1,  1,  1,  3],
              [ 1,  1,  1, 18, 16],
              [ 1,  1,  1, 18, 17]], dtype=np.int16)
b = np.arange(0,50).reshape(5,10)
change_cols = [1,2,5,7,9]

b[:, change_cols] = a[:, ::-1]
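
As a quick sanity check (a minimal sketch using the arrays defined above), the assigned columns of b now hold a's columns in reverse order, which reproduces the expected output from the question:

# The selected columns of b should now equal a with its columns reversed.
assert np.array_equal(b[:, change_cols], a[:, ::-1])
print(b)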

Update with timing info

With some larger array sizes, it looks like this "splicing" approach may be faster than Sun Bear's for loop or U12-Forward's arange.

import numpy as np
import timeit

rng = np.random.default_rng(12345)
a = rng.random(size=(500, 100))
b = rng.integers(100, size=(500, 500))
change_cols = rng.choice(500, size=100)


def splice():
    b[:, change_cols] = a[:, ::-1]


def splice_arange():
    b[:, change_cols] = a[:,-(np.arange(len(change_cols)) + 1)]


def enumerate_for_loop():
    for n, i in enumerate(change_cols):
        b[:,i] = a[:,-(n+1)]

print("Splice")
print(timeit.timeit('splice()', number=10000, globals=globals()))
print("Splice arange")
print(timeit.timeit('splice_arange()', number=10000, globals=globals()))
print("Enumerate for loop")
print(timeit.timeit('enumerate_for_loop()', number=10000, globals=globals()))

Result:

Splice
1.2409849390387535
Splice arange
1.4882377870380878
Enumerate for loop
2.198731765151024
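
For comparison, the same three functions can be re-timed on the question's original 5x10 arrays (a minimal sketch reusing the definitions above). On inputs that small, the per-call overhead of fancy indexing can outweigh the cost of five direct slice assignments, which would be consistent with the timings reported in the question:

# Re-time on the original small arrays from the question.
a = np.array([[1, 1, 1,  1,  1],
              [1, 1, 1,  1,  2],
              [1, 1, 1,  1,  3],
              [1, 1, 1, 18, 16],
              [1, 1, 1, 18, 17]], dtype=np.int16)
b = np.arange(0, 50).reshape(5, 10)
change_cols = [1, 2, 5, 7, 9]

print("Splice (small):", timeit.timeit(splice, number=10000))
print("Splice arange (small):", timeit.timeit(splice_arange, number=10000))
print("Enumerate for loop (small):", timeit.timeit(enumerate_for_loop, number=10000))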

3 Comments

Why is your solution (NumPy slicing) slower than the Python 3 for-loop + enumerate method? Did I compare correctly?
Thanks for showing the result on your system. I don't understand why my system consistently shows the Python for-loop as the fastest. Confused. I am using Python 3.8.10. Your solution is faster than @u12-forward's solution. Hardware?
I was running an online Python interpreter which has Python 3.8.2.

You could do it in one line, using np.arange in place of the index from enumerate and passing the whole list of columns in the assignment:

b[:, change_cols] = a[:,-(np.arange(len(change_cols)) + 1)]

And now:

print(b)

Gives:

[[ 0  1  1  3  4  1  6  1  8  1]
 [10  2  1 13 14  1 16  1 18  1]
 [20  3  1 23 24  1 26  1 28  1]
 [30 16 18 33 34  1 36  1 38  1]
 [40 17 18 43 44  1 46  1 48  1]]
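
For what it's worth, the index array built here is just [-1, -2, ..., -len(change_cols)], and since a has exactly len(change_cols) columns in the question, it selects the same columns as a[:, ::-1] in the other answer. A quick check, assuming the a and change_cols from the question:

idx = -(np.arange(len(change_cols)) + 1)
print(idx)                                    # [-1 -2 -3 -4 -5]
print(np.array_equal(a[:, idx], a[:, ::-1]))  # True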

3 Comments

Why is your solution (NumPy slicing + arange) slower than the Python 3 for-loop + enumerate method? Did I compare correctly?
@SunBear What is the result?
I have updated my question with the results.
