How to broadcast operation to Numpy array of objects?

Question

Say I have a Numpy array of 500 lists with random sizes ranging from 0 to 9:

import numpy as np
a = np.array([[i for i in range(np.random.randint(10))] for _ in range(500)], dtype=object)

Now I want to append the value 100 to the lists at indices [0,10,20,30,40,50]. I tried applying a function to each list in the array:

func = np.vectorize(lambda x: x + [100])
a[[0,10,20,30,40,50]] = func(a[[0,10,20,30,40,50]])

but I get ValueError: setting an array element with a sequence.

Is there any way to broadcast operations over all objects (of different sizes) in a Numpy array? In my case I usually have up to ~50,000 indices, and a plain for loop would be too slow. Would converting the array to a sparse matrix with equal-length rows be more efficient?

Numpy cannot operate efficiently on arrays of lists. Lists are stored internally as pure-Python objects, and such objects are inherently slow compared to native types. A list of lists should be faster. Numpy arrays of objects exist for the sake of convenience, not performance.
– Jérôme Richard, Commented May 27 at 20:04
Also, please note that np.vectorize "is essentially a for loop" (as stated in the docs). It does not vectorise anything, despite its confusing name.
– Jérôme Richard, Commented May 27 at 20:06
Am I getting it right that you are trying to do for i in indices: a[i] = a[i] + [100] with indices=[0, 10, 20, 30, 40, 50], but without a loop?
– Nevpzo, Commented May 27 at 20:08
Awkward is probably better suited to this use case. Numpy is for ND arrays, not jagged arrays. Alternatively, you can encode the list of lists as a flattened list with an array of start/end indices (fast but tedious to use). Another alternative is Scipy's sparse matrices, depending on your actual needs (not very efficient, but certainly faster than Numpy arrays of lists).
– Jérôme Richard, Commented May 27 at 20:15
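
The flattened encoding suggested in the comment above could look roughly like the sketch below. The helper names `to_flat` and `append_to_rows` are hypothetical, and the sketch assumes integer items and unique row indices. Row i of the jagged data is stored as flat[offsets[i]:offsets[i + 1]], so appending one value to many rows becomes a single np.insert call:

```python
import numpy as np

def to_flat(lists):
    """Encode a list of integer lists as (flat values, row offsets)."""
    lengths = np.array([len(x) for x in lists], dtype=np.int64)
    # offsets has len(lists) + 1 entries; row i spans offsets[i]:offsets[i + 1]
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    flat = np.array([v for x in lists for v in x], dtype=np.int64)
    return flat, offsets

def append_to_rows(flat, offsets, rows, value):
    """Append `value` to the end of each row in `rows` (assumed unique)."""
    rows = np.asarray(rows)
    # The end of row i is position offsets[i + 1] in the flat array.
    new_flat = np.insert(flat, offsets[rows + 1], value)
    lengths = np.diff(offsets)
    lengths[rows] += 1
    new_offsets = np.concatenate(([0], np.cumsum(lengths)))
    return new_flat, new_offsets

rng = np.random.default_rng(0)
lists = [list(range(rng.integers(10))) for _ in range(500)]
rows = [0, 10, 20, 30, 40, 50]

flat, offsets = to_flat(lists)
new_flat, new_offsets = append_to_rows(flat, offsets, rows, 100)

# Read one row back as a plain list
row_10 = new_flat[new_offsets[10]:new_offsets[11]].tolist()
```

The trade-off is the one the comment mentions: the update is a handful of array operations regardless of how many rows are touched, but every access has to go through the offsets array.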

hpaulj · Accepted Answer · 2025-05-27 20:58:30Z

Setting up your array (slightly smaller)

In [32]: a = np.array([[i for i in range(np.random.randint(10))] for _ in range(100)], dtype=object)

In [33]: idx = [0,10,30,50]

By specifying the otypes, I can run your vectorized function:

In [34]: func = lambda x: x + [100]; vfunc = np.vectorize(func, otypes=[object])

In [36]: vfunc(a[idx])
Out[36]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100]), list([0, 1, 100]),
       list([0, 1, 2, 3, 4, 100])], dtype=object)

In [37]: a[idx] = vfunc(a[idx])

In [38]: a[idx]
Out[38]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100]), list([0, 1, 100]),
       list([0, 1, 2, 3, 4, 100])], dtype=object)
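
For reference, here is the same fix as a standalone script rather than an IPython session. My understanding is that without otypes, np.vectorize infers the output dtype from a trial call on the first element; a list of ints looks like numeric data, so the later conversion of the list-valued results fails with the ValueError from the question. Passing otypes=[object] skips that guess. The names `append_100` and `rng` are only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([list(range(rng.integers(10))) for _ in range(100)], dtype=object)
idx = [0, 10, 30, 50]

def append_100(x):
    # Builds a new list; the list stored in the array is left untouched.
    return x + [100]

# Without otypes: expected to fail as described in the question.
try:
    np.vectorize(append_100)(a[idx])
    raised = False
except ValueError:
    raised = True

# With otypes=[object]: each result is stored as a Python object.
vfunc = np.vectorize(append_100, otypes=[object])
result = vfunc(a[idx])
```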

The equivalent with iteration (the lists now end in two 100s because In [37] already appended one):

In [39]: for i in idx: a[i] = func(a[i])

In [40]: a[idx]
Out[40]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100, 100]), list([0, 1, 100, 100]),
       list([0, 1, 2, 3, 4, 100, 100])], dtype=object)
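
Because func builds a new list on every call, the loop above also has to reassign a[i]. If mutating the stored lists is acceptable, list.append does the same job in place and needs no assignment. A minimal sketch on a fresh array (so each list gets one 100, not the two shown above):

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([list(range(rng.integers(10))) for _ in range(100)], dtype=object)
idx = [0, 10, 30, 50]
lengths_before = [len(a[i]) for i in idx]

for i in idx:
    # a[i] returns the list object held by the array, so append mutates it
    # directly; no new list is created and nothing is written back.
    a[i].append(100)
```

Note that any other reference to the same list objects sees the change too.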

I can't time the assignment without playing games with deep copies (I don't want to grow each list many times). But timing just the append step:

In [41]: %%timeit
    ...: vfunc(a[idx])
    ...: 
    ...: 
19.4 μs ± 459 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [42]: %%timeit
    ...: for i in idx: func(a[i])
    ...: 
    ...: 
2 μs ± 57.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

The loop is roughly ten times faster.

Since I specify object otypes, I could just as well use frompyfunc, and run faster:

In [43]: vofunc = np.frompyfunc(func,1,1)

In [44]: vofunc(a[idx])
Out[44]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100, 100, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100, 100, 100]),
       list([0, 1, 100, 100, 100]), list([0, 1, 2, 3, 4, 100, 100, 100])],
      dtype=object)

In [45]: %%timeit
    ...: vofunc(a[idx])
    ...: 
    ...: 
9.34 μs ± 183 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iteration is still faster (about 2 μs versus 9.34 μs).

In some other cases vectorize/frompyfunc is closer in speed to iteration, and even a bit faster for large samples. But it is never an order of magnitude faster.
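
To check this at the scale mentioned in the question (~50,000 indices), something like the following script could be used outside IPython. It is only a sketch: the array size, the index pattern, and the helper names are my own choices, and the timings will vary by machine and Numpy version, so no results are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = np.array([list(range(rng.integers(10))) for _ in range(n)], dtype=object)
idx = np.arange(0, n, 2)  # 50,000 indices

func = lambda x: x + [100]
vfunc = np.vectorize(func, otypes=[object])
vofunc = np.frompyfunc(func, 1, 1)

def with_loop():
    return [func(a[i]) for i in idx]

def with_vectorize():
    return vfunc(a[idx])

def with_frompyfunc():
    return vofunc(a[idx])

# func returns new lists, so repeated runs do not grow the lists in `a`.
for name, f in [("loop", with_loop),
                ("vectorize", with_vectorize),
                ("frompyfunc", with_frompyfunc)]:
    seconds = min(timeit.repeat(f, number=3, repeat=3)) / 3
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```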
