
I am performing a large number of these calculations:

A == A[np.newaxis].T

where A is a dense numpy array which frequently has common values.

For benchmarking purposes we can use:

n = 30000
A = np.random.randint(0, 1000, n)
A == A[np.newaxis].T

When I perform this calculation, I run into memory issues (at n = 30000 the Boolean output alone takes roughly 900 MB at one byte per element). I believe this is because the output isn't stored in a more efficient format such as bitarray or np.packbits. A secondary concern is that we perform twice as many comparisons as necessary, since the resulting Boolean array is symmetric.

The questions I have are:

  1. Is it possible to produce the Boolean numpy array output in a more memory-efficient fashion without sacrificing speed? The options I know about are bitarray and np.packbits, but I only know how to apply these after the large Boolean array is created.
  2. Can we utilise the symmetry of our calculation to halve the number of comparisons processed, again without sacrificing speed?

I will need to be able to perform & and | operations on the output Boolean arrays. I have tried bitarray, which is super-fast for these bitwise operations, but packing np.ndarray -> bitarray and then unpacking bitarray -> np.ndarray is slow.
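For reference on the packed format mentioned above: np.packbits stores 8 Booleans per byte, and & / | can be applied directly to the packed uint8 arrays without unpacking first. A minimal sketch (the arrays here are made up for illustration):

```python
import numpy as np

# Sketch: packed Booleans take 1 bit each (8x smaller than bool's 1 byte),
# and bitwise & / | work directly on the packed uint8 representation.
np.random.seed(0)
a = np.random.randint(0, 2, 64).astype(bool)
b = np.random.randint(0, 2, 64).astype(bool)

pa, pb = np.packbits(a), np.packbits(b)       # 8 bytes each instead of 64

both = np.unpackbits(pa & pb).astype(bool)    # same as a & b
either = np.unpackbits(pa | pb).astype(bool)  # same as a | b

assert np.array_equal(both, a & b)
assert np.array_equal(either, a | b)
```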

[Edited to provide clarification.]

  • x == x[:, None] also seems to do what you want. Commented Jan 16, 2018 at 10:39
  • your output should be ridiculously sparse, so I'd say you want to save the indices of the True values rather than use some homebrewed packbits implementation Commented Jan 16, 2018 at 10:50
  • You can argsort and compare consecutive elements. That is O(n log n) time and O(n) space. compared to O(n^2) and O(n^2) for the direct approach. Commented Jan 16, 2018 at 10:55
  • @DanielF, is it possible / efficient to perform Boolean operations on sparse matrices? I've tried using & and | operations, but these don't seem to be implemented. Commented Jan 16, 2018 at 11:00
  • You want to be able to do and and or operations on two 30k by 30k boolean matrices? Or do you want to broadcast them somehow? Commented Jan 16, 2018 at 11:06

4 Answers


Here's one with numba to give us a NumPy boolean array as output -

import numpy as np
from numba import njit

@njit
def numba_app1(idx, n, s, out):
    # idx: boundaries of equal-value runs in the sorted order; s: argsort of A
    for i, j in zip(idx[:-1], idx[1:]):
        s0 = s[i:j]                    # original indices of one group of equal values
        for c in range(len(s0)):
            p1 = s0[c]
            for p2 in s0[c + 1:]:      # fill both symmetric halves at once
                out[p1, p2] = 1
                out[p2, p1] = 1
    return out

def app1(A):
    s = A.argsort()
    b = A[s]
    n = len(A)
    idx = np.flatnonzero(np.r_[True,b[1:] != b[:-1],True])
    out = np.zeros((n,n),dtype=bool)
    numba_app1(idx, n, s, out)
    out.ravel()[::out.shape[1]+1] = 1  # set the diagonal: A[i] == A[i]
    return out
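For cross-checking, the same grouped-fill idea can be sketched in pure NumPy (no numba; the name app1_numpy is mine, and this is slower since it loops over groups in Python):

```python
import numpy as np

def app1_numpy(A):
    # group indices by value via argsort, then fill each group's block at once
    s = np.argsort(A)
    b = A[s]
    idx = np.flatnonzero(np.r_[True, b[1:] != b[:-1], True])  # group boundaries
    n = len(A)
    out = np.zeros((n, n), dtype=bool)
    for i, j in zip(idx[:-1], idx[1:]):
        g = s[i:j]                   # original indices of one equal-value group
        out[np.ix_(g, g)] = True     # block assignment covers the diagonal too
    return out

A = np.random.randint(0, 5, 20)
assert np.array_equal(app1_numpy(A), A == A[:, None])
```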

Timings -

In [287]: np.random.seed(0)
     ...: n = 30000
     ...: A = np.random.randint(0, 1000, n)

# Original soln
In [288]: %timeit A == A[np.newaxis].T
1 loop, best of 3: 317 ms per loop

# @Daniel F's soln-1 that skips assigning lower diagonal in output
In [289]: %timeit sparse_outer_eq(A)
1 loop, best of 3: 450 ms per loop

# @Daniel F's soln-2 (complete one)
In [291]: %timeit sparse_outer_eq(A)
1 loop, best of 3: 634 ms per loop

# Solution from this post
In [292]: %timeit app1(A)
10 loops, best of 3: 66.9 ms per loop

11 Comments

Huh, not sure why your timings are so different than mine. I do the same thing and get ~800ms for the broadcasted method and ~500ms for my sparse app.
@DanielF I am using the code from your "Alternatively, you can define sparse_outer_eq as:" section. Is that correct?
No, I'm using the first one.
@DanielF Well that first one sparse_outer_eq doesn't look complete since that skips the lower diag ones. Do you have a complete solution?
In the end, I liked this solution because the resulting array can utilise all of numpy's "nice" features, e.g. ~A. It would be great if sparse matrices were implemented generically enough so that default value can be non-zero.

This isn't even a numpy answer, but it should keep your data requirements down by using a bit of homebrewed sparse notation:

from numba import jit

@jit   # because this is gonna be loopy
def sparse_outer_eq(A):
    n = A.size
    c = []
    for i in range(n):
        for j in range(i + 1, n):   # upper triangle only
            if A[i] == A[j]:
                c.append((i, j))
    return c

Now c is a list of coordinate tuples (i, j), i < j, that correspond to the "True" coordinates in your Boolean array. You can easily do "and" and "or" operations on these setwise:

list(set(c1) & set(c2))
list(set(c1) | set(c2))
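With toy coordinate lists (the values below are made up purely for illustration):

```python
# Hypothetical coordinate lists from two comparisons
c1 = [(0, 1), (0, 2), (1, 2)]
c2 = [(0, 2), (2, 3)]

both = sorted(set(c1) & set(c2))     # pairs True in both masks
either = sorted(set(c1) | set(c2))   # pairs True in either mask
print(both)    # [(0, 2)]
print(either)  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```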

Later, when you want to apply this mask to an array, you can back out the coordinates and use them for fancy indexing instead:

i_, j_ = list(np.array(c).T)
i = np.r_[i_, j_, np.arange(n)]
j = np.r_[j_, i_, np.arange(n)]

You can then np.lexsort i and j if you care about order.
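A toy run of that back-out step, assuming a small made-up coordinate list and n = 4:

```python
import numpy as np

n = 4
c = [(0, 2), (1, 3)]              # upper-triangle True coordinates, i < j

i_, j_ = list(np.array(c).T)
i = np.r_[i_, j_, np.arange(n)]   # mirror across the diagonal ...
j = np.r_[j_, i_, np.arange(n)]   # ... and append the diagonal itself

mask = np.zeros((n, n), dtype=bool)
mask[i, j] = True                 # fancy indexing with the recovered coordinates
assert mask[0, 2] and mask[2, 0] and mask[3, 1]
assert mask.trace() == n          # full diagonal
```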

Alternatively, you can define sparse_outer_eq as:

@jit
def sparse_outer_eq(A):
    n = A.size
    c = []
    for i in range(n):
        for j in range(n):          # full matrix, including the diagonal
            if A[i] == A[j]:
                c.append((i, j))
    return c

This keeps >2x the data, but then the coordinates come out simply:

i, j = list(np.array(c).T)

If you've done any set operations, this will still need to be lexsorted if you want a rational order.

If your coordinates are each n-bit integers, this should be more space-efficient than boolean format as long as your sparsity is less than 1/n -> 3% or so for 32-bit.

As for time, thanks to numba it's even faster than broadcasting:

n = 3000
A = np.random.randint(0, 1000, n)

%timeit sparse_outer_eq(A)
100 loops, best of 3: 4.86 ms per loop

%timeit A == A[:, None]
100 loops, best of 3: 11.8 ms per loop

and comparisons:

a = A == A[:, None]

b = B == B[:, None]

a_ = sparse_outer_eq(A)

b_ = sparse_outer_eq(B)

%timeit a & b
100 loops, best of 3: 5.9 ms per loop

%timeit list(set(a_) & set(b_))
1000 loops, best of 3: 641 µs per loop

%timeit a | b
100 loops, best of 3: 5.52 ms per loop

%timeit list(set(a_) | set(b_))
1000 loops, best of 3: 955 µs per loop

EDIT: if you want to do &~ (as per your comment) use the second sparse_outer_eq method (so you don't have to keep track of the diagonal) and just do:

list(set(a_) - set(b_))

3 Comments

this is useful but a couple of issues: (1) Performance very sensitive to n and # of unique values, e.g. I see your method 3x slower for n = 15,000 and 200 unique values. (2) Operations such as ~A are not as easy. I would probably have to store a full set of all coordinates [or cycle through them all] and exclude items in set(a_). But cf github.com/scipy/scipy/issues/1166.
Yeah, this method leans heavily on the fact that the original problem as stated is very sparse (1k unique values). If your input and output matrices are relatively dense, a boolean matrix will be more performant.
However I will point out that even at 1 bit/Boolean, this method is more memory efficient than a Boolean matrix as long as the sparsity is less than 1/log2(n), i.e. 32 unique values for n < 4e9 and 16 unique values for n < 65k. Of course, a non-operated ~ will blow any sort of sparsity up and make Boolean more efficient.

Here is the more or less canonical argsort solution:

import numpy as np

def f_argsort(A):
    idx = np.argsort(A)
    As = A[idx]
    ne_ = np.r_[True, As[:-1] != As[1:], True]
    bnds = np.flatnonzero(ne_)
    valid = np.diff(bnds) != 1
    return [idx[bnds[i]:bnds[i+1]] for i in np.flatnonzero(valid)]

n = 30000
A = np.random.randint(0, 1000, n)
groups = f_argsort(A)

for grp in groups:
    print(len(grp), set(A[grp]), end=' ')
print()

2 Comments

This is good, but like DanielF's solution, it doesn't easily allow computation / storage of ~A. To be fair, I should have made this clear in my original query. However, your canonical argsort function has been used also in Divakar's solution, which is the one I'm going for here.
@jp_data_analysis No worries, go for what works best for you.

I'm adding a solution to my question because it satisfies these 3 properties:

  • Low, fixed, memory requirement
  • Fast bitwise operations (&, |, ~, etc)
  • Low storage, 1-bit per Boolean via packing integers

The downside is that the result is stored in np.packbits format. It is substantially slower than the other methods (especially argsort), but if speed is not an issue the algorithm should work well. If anyone figures out a way to optimise it further, that would be very helpful.

Update: A more efficient version of the below algorithm can be found here: Improving performance on comparison algorithm np.packbits(A==A[:, None], axis=1).

import numpy as np
from numba import jit

@jit(nopython=True)
def bool2int(x):
    # pack up to 8 Booleans into one byte, most significant bit first
    y = 0
    for i, j in enumerate(x):
        if j:
            y += 1 << (7 - i)
    return y

@jit(nopython=True)
def compare_elementwise(arr, result, section):
    n = len(arr)

    for row in range(n):
        for col in range(n):

            section[col % 8] = arr[row] == arr[col]

            # flush the 8-bit buffer at each byte boundary (or at row end)
            if ((col + 1) % 8 == 0) or (col == (n - 1)):
                result[row, col // 8] = bool2int(section)
                section[:] = 0

    return result

A = np.random.randint(0, 10, 100)
n = len(A)
result_arr = np.zeros((n, (n + 7) // 8), dtype=np.uint8)  # ceil(n / 8) bytes per row
selection_arr = np.zeros(8, dtype=np.uint8)

packed = compare_elementwise(A, result_arr, selection_arr)
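As a sanity check, NumPy's one-liner from the linked follow-up question should produce the same bytes as `packed` (both pack most-significant-bit first), at the cost of materialising the dense n x n matrix first, which is exactly the peak-memory cost the loop above avoids:

```python
import numpy as np

# Reference: pack the dense Boolean matrix row-wise in one call.
A = np.random.randint(0, 10, 100)
dense = A == A[:, None]
packed_ref = np.packbits(dense, axis=1)   # shape (100, 13), dtype uint8

# round-trip: unpack and trim the padding bits back to the original width
restored = np.unpackbits(packed_ref, axis=1)[:, :100].astype(bool)
assert np.array_equal(restored, dense)
```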

