numba error with tuple sorting containing numpy arrays

Question

I have a (working) function that uses the heapq module to build a priority queue of tuples and I would like to compile it with numba, however I get a very long and unclear error. It seems to boil down to a problem with tuple order comparison needed for the queue. The tuples have a fixed format, where the first item is a a floating point number (whose order I care about) and then a numpy array, which I need for computation but never get compared when running normally. This is intended because comparison on numpy arrays yields an array which cannot be used in conditionals and raises an exception. However, I guess numba needs a scalar yielding comparison to be at least defined for all items in the tuple, and hence the numba error.

I have a very minimal example:

@numba.njit
def f():
    return 1 if (1, numpy.arange(3)) < (2, numpy.arange(3)) else 2
f()

where the numba compilation fails (without numba it works since it never needs to actually compare the arrays, as in the original code).

Here is a slightly less minimal but maybe clearer example, which shows what I am actually doing:

from heapq import heappush
import numpy
import numba
@numba.njit
def f(n):
  heap = [(1, 0, numpy.random.rand(2, 3))]
  for unique_id in range(n):
    order = numpy.random.rand()
    data = numpy.random.rand(2, 3)
    heappush(heap, (order, unique_id, data))
  return heap[0]
f(100)

Here order is the variable whose order I care about in the queue, unique_id is a trick to avoid that when order is the same the comparison goes on to data and throws an exception.

I tried to bypass the problem converting the numpy array to a list when in the tuple and back to array for computation, but while this compiles, the numba version is slower than the interpreted one, even thought the array is quite small (usually 2x3). Without converting I would need to rewrite the code as loops which I would prefer to avoid (but is doable).

Is there a better alternative to get this working with numba, hopefully running faster than the python interpreter?

The point is that Numba needs to generate the code comparing the whole tuple. This means all case should be considered while the Python interpreter can try to execute the comparison and crash if it does not make sense. The code actually fail with the interpreter if the first item of each tuples are the same (The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(), ie. it does not make sense to compare arrays the way you do). Using this as a feature is clearly not a good idea. — Jérôme Richard
– Jérôme Richard, Commented Mar 29, 2022 at 11:28
Yes, you are right, I actually had to put a unique integer before the array to avoid ever comparing the arrays, which is a bit of a hack but at least everything works correctly in normal python. — Giovanni Birolo
– Giovanni Birolo, Commented Mar 29, 2022 at 14:05

rveronese · Accepted Answer · 2022-04-05 09:07:54Z

1

I'll try to respond based on the minimal example you provided.

I think that the problem here is not related to the ability of numba to perform comparison between all the elements of the tuple, but rather on where to store the result of such a comparison. This is stated in the error log returned when trying to execute your example:

cannot store {i8*, i8*, i64, i64, i8*, [1 x i64], [1 x i64]} to i1*: mismatching types

Basically, you are trying to store the result of a comparison between a pair of floats and a pair of arrays into a single boolean, and numba doesn't know how to do that.

If you are only interested in comparing the first elements of the tuples, the quickest workaround I can think of is forcing the comparison to happen only on the first elements, e.g.

@numba.njit
def f():
    return 1 if (1, numpy.arange(3))[0] < (2, numpy.arange(3))[0] else 2
f()

If this is not applicable to your use case, please provide more details about it.

EDIT

According to the further information you provided, I think the best way to solve this is avoiding pushing the numpy arrays to the heap. Since you're only interested in the ordering properties of the heap, you can just push the keys to the heap and store the corresponding numpy arrays in a separate dictionary, using as keys the same values you push in the heap.

As a sidenote, when you use standard library functions in nopython-jitted functions, you are resorting on specific numba's re-implementation of those functions rather then the "original" python's ones. A comprehensive list of the available python's features in numba can be found here.

edited Apr 5, 2022 at 9:07

answered Mar 29, 2022 at 10:13

rveronese

1314 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Giovanni Birolo Over a year ago

Yes, I agree, the problem is that array comparison returns a boolean array and not a single value as needed here. Your solution works for the minimal example (which is maybe too minimal), but not when using the tuple in heapq since the comparison in inside the library code and I cannot slice the tuple like you did. Sorry for not being more clear, I will update the orginal post.

rveronese Over a year ago

@GiovanniBirolo I've seen the edit, I will extend my answer accordingly but I can anticipate you that things are not as straightforward as in your example. Just a question to better understand your use case (and possibly provide you with a solution): is there any particular reason why you want to push the data to the heap? I guess the main reason why you are using a heap is its ordering and you just want to associate heap keys to 2d numpy array, so that you can access to the data in the order defined by the keys. Am I right?

Giovanni Birolo Over a year ago

Yes, it is as you said. I use the heap to explore a tree of partial probability computations, the ordering is needed to compute the highest probability branches first. At each stage I extend the best partial computation using the data in the array up to a fixed branch length when I get a full solution. Actually, reading your comment, I found a solution (from when you wrote "associate keys to 2d numpy array", I posted it in a separate answer. If you have a better idea please let me know. If this was your idea and you want to post it I will accept yours!

rveronese Over a year ago

@GiovanniBirolo this was exactly what I wanted to suggest you. I'll update my answer adding also some further information that can be useful for others coming here. Bests.

Giovanni Birolo · Accepted Answer · 2022-04-04 14:58:43Z

Ok, I found a solution to the problem: since storing the array in the heap tuple is the cause of the numba error, it is enough to store it in a separate dictionary with an unique key and store only the key in the heap tuple. For instance, using an integer as the key:

from heapq import heappush
import numpy
import numba
@numba.njit
def f(n):
  key = 0
  array_storage = {key: numpy.random.rand(2, 3)}
  heap = [(1.0, key)]
  for _ in range(n):
    order = numpy.random.rand()
    data = numpy.random.rand(2, 3)
    key += 1
    heappush(heap, (order, key))
    array_storage[key] = data
  return heap[0]
f(100)

Now the tuples in the heap can be compared yielding a boolean value and I still get to associate the data with its tuple. I am not completely satisfied since it seems a workaround, but it works pretty well and it is not overly complicated. If anyone has a better one please let me know!

Collectives™ on Stack Overflow

numba error with tuple sorting containing numpy arrays

2 Answers 2

4 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related