Improving performance on comparison algorithm np.packbits(A==A[:, None], axis=1)

Question

I am looking to memory-optimise np.packbits(A==A[:, None], axis=1), where A is a dense array of integers of length n. A==A[:, None] is memory-hungry for large n because the intermediate Boolean array is stored inefficiently, with each Boolean value costing 1 byte.
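To put numbers on the overhead: the unpacked comparison matrix needs n * n bytes, while the packed result needs only n * ceil(n / 8) bytes. For n = 10000 that is 100,000,000 bytes (about 100 MB) of temporary Booleans versus 12,500,000 bytes (12.5 MB) of packed output. The helper footprint below is hypothetical and only measures the two arrays for a small n.

```python
import numpy as np

def footprint(A):
    """Return (bytes of the Boolean matrix, bytes of its packed form)."""
    eq = A == A[:, None]               # shape (n, n), dtype bool: 1 byte per entry
    packed = np.packbits(eq, axis=1)   # shape (n, ceil(n / 8)), dtype uint8: 1 bit per entry
    return eq.nbytes, packed.nbytes

A = np.random.randint(0, 1000, 1000)
print(footprint(A))  # (1000000, 125000)
```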

I wrote the script below to achieve the same result while packing bits one section at a time. It is, however, around 3x slower, so I am looking for ways to speed it up, or, alternatively, for a better algorithm with a small memory overhead.
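One middle ground that stays in NumPy is to build the packed result a block of rows at a time, so only a (block, n) Boolean slab exists at any moment instead of the full (n, n) matrix. This is a sketch, not something from the thread: the function name and the block parameter are hypothetical, and block trades peak memory (about block * n bytes) against per-call overhead.

```python
import numpy as np

def packed_self_compare_chunked(A, block=1024):
    """Same result as np.packbits(A == A[:, None], axis=1), built `block` rows at a time.

    `block` is a hypothetical tuning knob: peak temporary memory is roughly block * n bytes.
    """
    A = np.asarray(A)
    n = A.size
    out = np.empty((n, (n + 7) // 8), dtype=np.uint8)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Only this (stop - start, n) Boolean slab is unpacked at once;
        # equality is symmetric, so row i still holds A[i] == A[j] for every j.
        out[start:stop] = np.packbits(A[start:stop, None] == A, axis=1)
    return out
```

Because each slab goes through the same np.packbits call, the output layout (most significant bit first, zero-padded last byte) is identical to the one-shot version.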

Note: this is a follow-up question to one I asked earlier; Comparing numpy array with itself by element efficiently.

Reproducible code below for benchmarking.

import numpy as np
from numba import jit

@jit(nopython=True)
def bool2int(x):
    y = 0
    for i, j in enumerate(x):
        if j: y += int(j)<<(7-i)
    return y

@jit(nopython=True)
def compare_elementwise(arr, result, section):
    n = len(arr)
    for row in range(n):
        for col in range(n):

            section[col%8] = arr[row] == arr[col]

            if ((col + 1) % 8 == 0) or (col == (n-1)):
                result[row, col // 8] = bool2int(section)
                section[:] = 0

    return result

n = 10000
A = np.random.randint(0, 1000, n)

result_arr = np.zeros((n, (n + 7) // 8), dtype=np.uint8)  # one byte per 8 columns, rounded up
selection_arr = np.zeros(8, dtype=np.uint8)

# memory efficient version, but slow
packed = compare_elementwise(A, result_arr, selection_arr)

# memory inefficient version, but fast
packed2 = np.packbits(A == A[:, None], axis=1)

assert (packed == packed2).all()

%timeit compare_elementwise(A, result_arr, selection_arr)  # 1.6 seconds
%timeit np.packbits(A == A[:, None], axis=1)  # 0.460 second
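The assert in the script holds because bool2int uses the same bit order as np.packbits: element i of an 8-element section lands in bit 7 - i, i.e. most significant bit first, which is np.packbits' default bitorder='big'. Resetting section after each byte also leaves the unused bits of a final partial byte at 0, matching np.packbits' zero padding. The helper bool2int_py below is a hypothetical pure-Python copy of bool2int for checking this without Numba.

```python
import numpy as np

def bool2int_py(bits):
    """Pure-Python version of bool2int: the first element becomes the most significant bit."""
    y = 0
    for i, b in enumerate(bits):
        if b:
            y |= 1 << (7 - i)
    return y

bits = [1, 0, 0, 0, 0, 0, 1, 1]
print(bool2int_py(bits))     # 131
print(np.packbits(bits)[0])  # 131
```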

jpp · Accepted Answer · 2018-01-22 00:59:15Z

2

Here is a solution 3 times faster than the NumPy one (a.size must be a multiple of 8; see below):

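import numpy as np
import numba as nb

@nb.njit
def comp(a):
    # One bit per column; assumes a.size % 8 == 0 (see below)
    res = np.zeros((a.size, a.size // 8), np.uint8)
    for i, x in enumerate(a):
        for j, y in enumerate(a):
            # Column j lives in byte j // 8, most significant bit first
            if x == y:
                res[i, j // 8] |= 128 >> (j % 8)
    return res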

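This is faster because the array is scanned only once and each match costs a single bit operation, whereas your version touches every element several times (filling section, packing it with bool2int, then resetting it). In addition, almost all comparisons are False, so very few writes are needed.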
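The bit arithmetic in comp() can be checked in isolation. The sketch below uses a hypothetical helper, pack_row, written in plain NumPy rather than Numba. It packs a single Boolean row by writing only its True positions, using byte j // 8 and mask 128 >> (j % 8), and it gives the same bytes as np.packbits on that row, including when the row length is not a multiple of 8.

```python
import numpy as np

def pack_row(row_bools):
    """Pack one Boolean row the way comp() does: column j goes to byte j // 8
    under mask 128 >> (j % 8), most significant bit first."""
    row_bools = np.asarray(row_bools, dtype=bool)
    out = np.zeros((row_bools.size + 7) // 8, dtype=np.uint8)
    for j in np.flatnonzero(row_bools).tolist():  # only the (few) True entries cost a write
        out[j // 8] |= 128 >> (j % 8)
    return out

row = np.array([True, False, False, True, False, False, False, False, True])
print(pack_row(row))  # [144 128]
```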

In [122]: %timeit np.packbits(A == A[:, None], axis=1)
389 ms ± 57.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [123]: %timeit comp(A)
123 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

If a.size % 8 > 0, recovering the information is more costly. The best way in this case is to pad the initial array with up to 7 zeros.

For completeness, the padding could be done as follows:

if A.size % 8 != 0: A = np.pad(A, (0, 8 - A.size % 8), 'constant', constant_values=0)
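An alternative to padding, not part of the original answer, is to allocate ceil(n / 8) bytes per row with (n + 7) // 8. The loop never sets bits for columns past n - 1, so the last byte is zero-padded exactly as np.packbits does, and any length works. This also avoids a subtlety of padding: the padded array has extra rows, and whenever A itself contains 0, the padding columns of those rows get bits set, so the padded result would need slicing and masking before it equals np.packbits(A == A[:, None], axis=1). The sketch below assumes Numba when it is installed and otherwise falls back to plain (much slower) Python; the name comp_any_size is hypothetical.

```python
import numpy as np

try:
    import numba as nb
    njit = nb.njit
except ImportError:  # assumption: without Numba, run the same loop in plain Python (slow)
    def njit(f):
        return f

@njit
def comp_any_size(a):
    """Like comp(), but with ceil(n / 8) bytes per row, so n need not be a multiple of 8.
    Trailing bits of the last byte stay 0, as in np.packbits."""
    n = a.size
    res = np.zeros((n, (n + 7) // 8), np.uint8)
    for i in range(n):
        x = a[i]
        for j in range(n):
            if x == a[j]:
                res[i, j // 8] |= 128 >> (j % 8)
    return res
```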

edited Jan 22, 2018 at 0:59 by jpp

answered Jan 21, 2018 at 18:25 by B. M.
