Say I have a NumPy object array of 500 lists with random lengths ranging from 0 to 9:

import numpy as np
a = np.array([[i for i in range(np.random.randint(10))] for _ in range(500)], dtype=object)

Now I want to append the value 100 to the lists at indices [0,10,20,30,40,50]. I tried to apply a function to each list in the array:

func = np.vectorize(lambda x: x + [100])
a[[0,10,20,30,40,50]] = func(a[[0,10,20,30,40,50]])

but I get ValueError: setting an array element with a sequence.

Is there any way I can broadcast operations to all objects (with different sizes) in a NumPy array? In my case I usually have up to ~50,000 indices, and a normal for loop would be too slow. Would converting the array to a sparse matrix with equal-sized rows be more efficient?

  • Numpy cannot efficiently compute on arrays of lists. Lists are internally stored as pure-Python objects and such objects are inherently slow (compared to native ones). A list of lists should be faster. Numpy arrays of objects exist only for the sake of convenience, not performance. Commented May 27 at 20:04
  • Also please note that np.vectorize "is essentially a for loop" (as stated in the doc). It does not vectorise anything (despite its confusing name). Commented May 27 at 20:06
  • Am I getting it right that you are trying to do for i in indices: a[i] = a[i] + [100] with indices=[0, 10, 20, 30, 40, 50] without a loop? Commented May 27 at 20:08
  • @Nevpzo yes, or for i in indices: a[i].append(100) Commented May 27 at 20:09
  • Awkward is probably better suited for such a use case. Numpy is for ND arrays, not jagged arrays. Alternatively, you can encode the list of lists as a flattened list with a start-end index array (fast but tedious to use). Another alternative is to use Scipy's sparse matrices, depending on your actual needs (not very efficient but certainly faster than Numpy arrays of lists). Commented May 27 at 20:15
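
The flattened encoding mentioned in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names to_flat and append_to_rows are hypothetical (not from the thread or any library), the values are assumed to be integers, and the row indices are assumed to be unique. Row i is stored as values[offsets[i]:offsets[i + 1]], and appending to many rows rebuilds both arrays in one pass.

```python
import numpy as np

def to_flat(lists):
    """Encode a list of lists as (values, offsets).

    Row i is values[offsets[i]:offsets[i + 1]].
    """
    lengths = np.fromiter((len(row) for row in lists), dtype=np.int64,
                          count=len(lists))
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    values = np.fromiter((x for row in lists for x in row), dtype=np.int64,
                         count=int(offsets[-1]))
    return values, offsets

def append_to_rows(values, offsets, rows, new_value):
    """Return new (values, offsets) with new_value appended to each row in rows.

    Assumption: rows contains no duplicate indices.
    """
    rows = np.asarray(rows)
    new_lengths = np.diff(offsets)
    new_lengths[rows] += 1
    new_offsets = np.concatenate(([0], np.cumsum(new_lengths)))
    new_values = np.empty(int(new_offsets[-1]), dtype=values.dtype)
    # The appended element occupies the last slot of each selected row.
    slots = new_offsets[rows + 1] - 1
    keep = np.ones(new_values.size, dtype=bool)
    keep[slots] = False
    new_values[keep] = values      # old elements keep their relative order
    new_values[slots] = new_value
    return new_values, new_offsets

# Small demonstration with four rows, two of which receive the value 100.
lists = [[0, 1, 2], [], [0], [0, 1]]
values, offsets = to_flat(lists)
new_values, new_offsets = append_to_rows(values, offsets, [1, 3], 100)
decoded = [new_values[new_offsets[i]:new_offsets[i + 1]].tolist()
           for i in range(len(lists))]
```

The trade-off is the one the comment names: every batch of appends copies the whole values array, so this pays off when many rows are updated per call, and it is more tedious to work with than a list of lists.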

1 Answer

Setting up your array (slightly smaller)

In [32]: a = np.array([[i for i in range(np.random.randint(10))] for _ in range(100)], dtype=object)

In [33]: idx = [0,10,30,50]

By specifying the otypes, I can run your vectorized function:

In [34]: func = lambda x: x + [100]; vfunc = np.vectorize(func, otypes=[object])

In [36]: vfunc(a[idx])
Out[36]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100]), list([0, 1, 100]),
       list([0, 1, 2, 3, 4, 100])], dtype=object)

In [37]: a[idx] = vfunc(a[idx])

In [38]: a[idx]
Out[38]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100]), list([0, 1, 100]),
       list([0, 1, 2, 3, 4, 100])], dtype=object)

The equivalent with iteration:

In [39]: for i in idx: a[i] = func(a[i])

In [40]: a[idx]
Out[40]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100, 100]), list([0, 1, 100, 100]),
       list([0, 1, 2, 3, 4, 100, 100])], dtype=object)

I can't time the assignment without playing games with deep copies (I don't want to grow each list many times). But timing just the append step:

In [41]: %%timeit
    ...: vfunc(a[idx])
    ...: 
    ...: 
19.4 μs ± 459 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [42]: %%timeit
    ...: for i in idx: func(a[i])
    ...: 
    ...: 
2 μs ± 57.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

The loop is roughly ten times faster here (2 μs versus 19.4 μs).

Since I specify object otypes, I could just as well use np.frompyfunc, which always returns an object-dtype array and runs faster than np.vectorize:

In [43]: vofunc = np.frompyfunc(func,1,1)

In [44]: vofunc(a[idx])
Out[44]: 
array([list([0, 1, 2, 3, 4, 5, 6, 100, 100, 100]),
       list([0, 1, 2, 3, 4, 5, 6, 7, 100, 100, 100]),
       list([0, 1, 100, 100, 100]), list([0, 1, 2, 3, 4, 100, 100, 100])],
      dtype=object)

In [45]: %%timeit
    ...: vofunc(a[idx])
    ...: 
    ...: 
9.34 μs ± 183 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Iteration is still faster (2 μs versus 9.34 μs).

In some other cases vectorize/frompyfunc is closer in speed to iteration, even a bit faster for large samples. But it is never an order of magnitude faster.
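
Given that plain iteration wins here, and following the comment that a list of lists should be faster than an object array, a minimal sketch of the loop on a plain Python list might look like this. The names rows and indices are illustrative, and the seed is only there to make the run repeatable. It uses list.append, which mutates each list in place, rather than x + [100], which builds a new list on every call.

```python
import random

random.seed(0)  # assumption: fixed seed, only for repeatability

# Plain list of lists instead of an object-dtype array.
rows = [list(range(random.randint(0, 9))) for _ in range(500)]
indices = [0, 10, 20, 30, 40, 50]

lengths_before = [len(rows[i]) for i in indices]

for i in indices:
    rows[i].append(100)  # in-place append, no copy of the existing list

lengths_after = [len(rows[i]) for i in indices]
```

No timings are claimed for this version; it would need to be measured on the real data with ~50,000 indices.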
