
I am performing a large number of these calculations:

A == A[np.newaxis].T

where A is a dense numpy array which frequently has common values.

For benchmarking purposes we can use:

n = 30000
A = np.random.randint(0, 1000, n)
A == A[np.newaxis].T

When I perform this calculation, I run into memory issues (at n = 30000 the Boolean output alone takes roughly 900 MB at one byte per element). I believe this is because the output isn't stored in a more efficient format such as bitarray or np.packbits. A secondary concern is that we perform twice as many comparisons as necessary, since the resulting Boolean array is symmetric.

The questions I have are:

  1. Is it possible to produce the Boolean numpy array output in a more memory-efficient fashion without sacrificing speed? The options I know about are bitarray and np.packbits, but I only know how to apply these after the large Boolean array is created.
  2. Can we utilise the symmetry of our calculation to halve the number of comparisons processed, again without sacrificing speed?

I will need to be able to perform & and | operations on the output Boolean arrays. I have tried bitarray, which is super-fast for these bitwise operations, but packing np.ndarray -> bitarray and then unpacking bitarray -> np.ndarray is slow.
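For reference on the packed format mentioned above: np.packbits stores 8 Booleans per byte, and & / | can be applied directly to the packed uint8 arrays without unpacking first. A minimal sketch (the arrays here are made up for illustration):

```python
import numpy as np

# Sketch: packed Booleans take 1 bit each (8x smaller than bool's 1 byte),
# and bitwise & / | work directly on the packed uint8 representation.
np.random.seed(0)
a = np.random.randint(0, 2, 64).astype(bool)
b = np.random.randint(0, 2, 64).astype(bool)

pa, pb = np.packbits(a), np.packbits(b)       # 8 bytes each instead of 64

both = np.unpackbits(pa & pb).astype(bool)    # same as a & b
either = np.unpackbits(pa | pb).astype(bool)  # same as a | b

assert np.array_equal(both, a & b)
assert np.array_equal(either, a | b)
```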

[Edited to provide clarification.]

  • x == x[:, None] also seems to do what you want. Commented Jan 16, 2018 at 10:39
  • your output should be ridiculously sparse, so I'd say you want to save the indices of the True values rather than use some homebrewed packbits implementation Commented Jan 16, 2018 at 10:50
  • You can argsort and compare consecutive elements. That is O(n log n) time and O(n) space. compared to O(n^2) and O(n^2) for the direct approach. Commented Jan 16, 2018 at 10:55
  • @DanielF, is it possible / efficient to perform Boolean operations on sparse matrices? I've tried using & and | operations, but these don't seem to be implemented. Commented Jan 16, 2018 at 11:00
  • You want to be able to do and and or operations on two 30k by 30k boolean matrices? Or do you want to broadcast them somehow? Commented Jan 16, 2018 at 11:06

4 Answers


Here's one with numba to give us a NumPy boolean array as output -

import numpy as np
from numba import njit

@njit
def numba_app1(idx, n, s, out):
    # idx: boundaries of equal-value runs in the sorted order; s: argsort of A
    for i, j in zip(idx[:-1], idx[1:]):
        s0 = s[i:j]                    # original indices of one group of equal values
        for c in range(len(s0)):
            p1 = s0[c]
            for p2 in s0[c + 1:]:      # fill both symmetric halves at once
                out[p1, p2] = 1
                out[p2, p1] = 1
    return out

def app1(A):
    s = A.argsort()
    b = A[s]
    n = len(A)
    idx = np.flatnonzero(np.r_[True,b[1:] != b[:-1],True])
    out = np.zeros((n,n),dtype=bool)
    numba_app1(idx, n, s, out)
    out.ravel()[::out.shape[1]+1] = 1  # set the diagonal: A[i] == A[i]
    return out
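For cross-checking, the same grouped-fill idea can be sketched in pure NumPy (no numba; the name app1_numpy is mine, and this is slower since it loops over groups in Python):

```python
import numpy as np

def app1_numpy(A):
    # group indices by value via argsort, then fill each group's block at once
    s = np.argsort(A)
    b = A[s]
    idx = np.flatnonzero(np.r_[True, b[1:] != b[:-1], True])  # group boundaries
    n = len(A)
    out = np.zeros((n, n), dtype=bool)
    for i, j in zip(idx[:-1], idx[1:]):
        g = s[i:j]                   # original indices of one equal-value group
        out[np.ix_(g, g)] = True     # block assignment covers the diagonal too
    return out

A = np.random.randint(0, 5, 20)
assert np.array_equal(app1_numpy(A), A == A[:, None])
```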

Timings -

In [287]: np.random.seed(0)
     ...: n = 30000
     ...: A = np.random.randint(0, 1000, n)

# Original soln
In [288]: %timeit A == A[np.newaxis].T
1 loop, best of 3: 317 ms per loop

# @Daniel F's soln-1 that skips assigning lower diagonal in output
In [289]: %timeit sparse_outer_eq(A)
1 loop, best of 3: 450 ms per loop

# @Daniel F's soln-2 (complete one)
In [291]: %timeit sparse_outer_eq(A)
1 loop, best of 3: 634 ms per loop

# Solution from this post
In [292]: %timeit app1(A)
10 loops, best of 3: 66.9 ms per loop

11 Comments

Huh, not sure why your timings are so different than mine. I do the same thing and get ~800ms for the broadcasted method and ~500ms for my sparse app.
@DanielF I am using the code from your "Alternatively, you can define sparse_outer_eq as:" section. Is that correct?
No, I'm using the first one.
@DanielF Well that first one sparse_outer_eq doesn't look complete since that skips the lower diag ones. Do you have a complete solution?
In the end, I liked this solution because the resulting array can utilise all of numpy's "nice" features, e.g. ~A. It would be great if sparse matrices were implemented generically enough so that default value can be non-zero.

This isn't even a numpy answer, but it should keep your data requirements down by using a bit of homebrewed sparse notation:

from numba import jit

@jit   # because this is gonna be loopy
def sparse_outer_eq(A):
    n = A.size
    c = []
    for i in range(n):
        for j in range(i + 1, n):   # upper triangle only
            if A[i] == A[j]:
                c.append((i, j))
    return c

Now c is a list of coordinate tuples (i, j), i < j, that correspond to the "True" coordinates in your Boolean array. You can easily do "and" and "or" operations on these setwise:

list(set(c1) & set(c2))
list(set(c1) | set(c2))
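With toy coordinate lists (the values below are made up purely for illustration):

```python
# Hypothetical coordinate lists from two comparisons
c1 = [(0, 1), (0, 2), (1, 2)]
c2 = [(0, 2), (2, 3)]

both = sorted(set(c1) & set(c2))     # pairs True in both masks
either = sorted(set(c1) | set(c2))   # pairs True in either mask
print(both)    # [(0, 2)]
print(either)  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```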

Later, when you want to apply this mask to an array, you can back out the coordinates and use them for fancy indexing instead:

i_, j_ = list(np.array(c).T)
i = np.r_[i_, j_, np.arange(n)]
j = np.r_[j_, i_, np.arange(n)]

You can then np.lexsort i and j if you care about order.
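A toy run of that back-out step, assuming a small made-up coordinate list and n = 4:

```python
import numpy as np

n = 4
c = [(0, 2), (1, 3)]              # upper-triangle True coordinates, i < j

i_, j_ = list(np.array(c).T)
i = np.r_[i_, j_, np.arange(n)]   # mirror across the diagonal ...
j = np.r_[j_, i_, np.arange(n)]   # ... and append the diagonal itself

mask = np.zeros((n, n), dtype=bool)
mask[i, j] = True                 # fancy indexing with the recovered coordinates
assert mask[0, 2] and mask[2, 0] and mask[3, 1]
assert mask.trace() == n          # full diagonal
```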

Alternatively, you can define sparse_outer_eq as:

@jit
def sparse_outer_eq(A):
    n = A.size
    c = []
    for i in range(n):
        for j in range(n):          # full matrix, including the diagonal
            if A[i] == A[j]:
                c.append((i, j))
    return c

This keeps >2x the data, but then the coordinates come out simply:

i, j = list(np.array(c).T)

If you've done any set operations, this will still need to be lexsorted if you want a rational order.

If your coordinates are each n-bit integers, this should be more space-efficient than boolean format as long as your sparsity is less than 1/n -> 3% or so for 32-bit.

As for time, thanks to numba it's even faster than broadcasting:

n = 3000
A = np.random.randint(0, 1000, n)

%timeit sparse_outer_eq(A)
100 loops, best of 3: 4.86 ms per loop

%timeit A == A[:, None]
100 loops, best of 3: 11.8 ms per loop

and comparisons:

a = A == A[:, None]

b = B == B[:, None]

a_ = sparse_outer_eq(A)

b_ = sparse_outer_eq(B)

%timeit a & b
100 loops, best of 3: 5.9 ms per loop

%timeit list(set(a_) & set(b_))
1000 loops, best of 3: 641 µs per loop

%timeit a | b
100 loops, best of 3: 5.52 ms per loop

%timeit list(set(a_) | set(b_))
1000 loops, best of 3: 955 µs per loop

EDIT: if you want to do &~ (as per your comment) use the second sparse_outer_eq method (so you don't have to keep track of the diagonal) and just do:

list(set(a_) - set(b_))

3 Comments

this is useful but a couple of issues: (1) Performance very sensitive to n and # of unique values, e.g. I see your method 3x slower for n = 15,000 and 200 unique values. (2) Operations such as ~A are not as easy. I would probably have to store a full set of all coordinates [or cycle through them all] and exclude items in set(a_). But cf github.com/scipy/scipy/issues/1166.
Yeah, this method leans heavily on the fact that the original problem as stated is very sparse (1k unique values). If your input and output matrices are relatively dense, a boolean matrix will be more performant.
However I will point out that even at 1 bit/Boolean, this method is more memory efficient than a Boolean matrix as long as the sparsity is less than 1/log2(n), i.e. 32 unique values for n < 4e9 and 16 unique values for n < 65k. Of course, a non-operated ~ will blow any sort of sparsity up and make Boolean more efficient.

Here is the more or less canonical argsort solution:

import numpy as np

def f_argsort(A):
    idx = np.argsort(A)
    As = A[idx]
    ne_ = np.r_[True, As[:-1] != As[1:], True]
    bnds = np.flatnonzero(ne_)
    valid = np.diff(bnds) != 1
    return [idx[bnds[i]:bnds[i+1]] for i in np.flatnonzero(valid)]

n = 30000
A = np.random.randint(0, 1000, n)
groups = f_argsort(A)

for grp in groups:
    print(len(grp), set(A[grp]), end=' ')
print()

2 Comments

This is good, but like DanielF's solution, it doesn't easily allow computation / storage of ~A. To be fair, I should have made this clear in my original query. However, your canonical argsort function has been used also in Divakar's solution, which is the one I'm going for here.
@jp_data_analysis No worries, go for what works best for you.

I'm adding a solution to my question because it satisfies these 3 properties:

  • Low, fixed, memory requirement
  • Fast bitwise operations (&, |, ~, etc)
  • Low storage, 1-bit per Boolean via packing integers

The downside is that the result is stored in np.packbits format. It is substantially slower than the other methods (especially argsort), but if speed is not an issue the algorithm should work well. If anyone figures out a way to optimise it further, that would be very helpful.

Update: A more efficient version of the below algorithm can be found here: Improving performance on comparison algorithm np.packbits(A==A[:, None], axis=1).

import numpy as np
from numba import jit

@jit(nopython=True)
def bool2int(x):
    # pack up to 8 Booleans into one byte, most significant bit first
    y = 0
    for i, j in enumerate(x):
        if j:
            y += 1 << (7 - i)
    return y

@jit(nopython=True)
def compare_elementwise(arr, result, section):
    n = len(arr)

    for row in range(n):
        for col in range(n):

            section[col % 8] = arr[row] == arr[col]

            # flush the 8-bit buffer at each byte boundary (or at row end)
            if ((col + 1) % 8 == 0) or (col == (n - 1)):
                result[row, col // 8] = bool2int(section)
                section[:] = 0

    return result

A = np.random.randint(0, 10, 100)
n = len(A)
result_arr = np.zeros((n, (n + 7) // 8), dtype=np.uint8)  # ceil(n / 8) bytes per row
selection_arr = np.zeros(8, dtype=np.uint8)

packed = compare_elementwise(A, result_arr, selection_arr)
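As a sanity check, NumPy's one-liner from the linked follow-up question should produce the same bytes as `packed` (both pack most-significant-bit first), at the cost of materialising the dense n x n matrix first, which is exactly the peak-memory cost the loop above avoids:

```python
import numpy as np

# Reference: pack the dense Boolean matrix row-wise in one call.
A = np.random.randint(0, 10, 100)
dense = A == A[:, None]
packed_ref = np.packbits(dense, axis=1)   # shape (100, 13), dtype uint8

# round-trip: unpack and trim the padding bits back to the original width
restored = np.unpackbits(packed_ref, axis=1)[:, :100].astype(bool)
assert np.array_equal(restored, dense)
```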

