
I ran into this performance issue when I tried to convert a huge 2D list of binary data to decimal numbers.

Given a list:

biglist = [
  [[1,0],[0,0],[1,1]],
  [[0,0],[1,1],[1,0]],
  #...
  #easily go to thousands of rows
]

In each row, I want to combine the first element of each column and convert the result to a decimal number:

E.g.

In row 0, I want int('101',2) i.e. 5

In row 1, I want int('011',2) i.e. 3

My final goal is to create a dictionary that counts how many times each integer appears. For the example data above, the final result should be a dictionary of {a_int : appearance_count} pairs like this:

{5: 1, 3: 1}

Now my solution is this:

result = {}
for k in biglist:
    num = int("".join(str(row[0]) for row in k), 2)
    #count
    if num not in result:
        result[num] = 1
    else:
        result[num] += 1

This loop is slow for a list of thousands of rows. Is there a better solution?

  • I guess if your code is working fine but you want to enhance it, you should post this in Code Review Stack Exchange Commented Oct 15, 2017 at 16:21
  • What is the range of possible integer values? Commented Oct 15, 2017 at 16:24
  • Are each of your "rows" the same amount of columns? ie - you're only dealing with 3 bits here... or could they be much larger? Commented Oct 15, 2017 at 16:46
  • @JonClements Yes their widths are all the same and they can be any positive number of bits. Commented Oct 15, 2017 at 22:10

3 Answers


Just accumulate the bits into an integer value instead of doing string-to-int conversions (pseudocode):

for every row:
    value = 0
    for every col:
        value = (value << 1) | biglist[row][col][0]  # shift left, OR in the next bit
        # equivalent operation:
        # value = value * 2 + biglist[row][col][0]
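A runnable sketch of this approach (function name mine), combining the bit-shift accumulation with the dict count from the question:

```python
def count_by_shift(biglist):
    """Count how often each row's first-bit value appears."""
    result = {}
    for row in biglist:
        value = 0
        for col in row:
            value = (value << 1) | col[0]  # shift left, append next bit
        result[value] = result.get(value, 0) + 1
    return result

biglist = [
    [[1, 0], [0, 0], [1, 1]],
    [[0, 0], [1, 1], [1, 0]],
]
print(count_by_shift(biglist))  # -> {5: 1, 3: 1}
```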

1 Comment

Thank you. This improved performance significantly compared with the string join, enough to meet my performance requirement without using an extra library.

First, string-to-int conversion with join is convenient but slow. Compute the value from the powers of 2 directly, using sum, enumerate, and a bit shift on the ones (skip the zeros).

Second, you should use collections.Counter for this

In one line:

result = collections.Counter(sum(v[0]<<(len(k)-i-1) for i,v in enumerate(k) if v[0]) for k in biglist)

This code runs about 30% faster than your original code on my machine.
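For completeness, here is the one-liner as a self-contained snippet with the import and the sample data from the question:

```python
import collections

biglist = [
    [[1, 0], [0, 0], [1, 1]],
    [[0, 0], [1, 1], [1, 0]],
]

# For each row k, add up 2**position for every set first bit, then count.
result = collections.Counter(
    sum(v[0] << (len(k) - i - 1) for i, v in enumerate(k) if v[0])
    for k in biglist
)
print(dict(result))  # -> {5: 1, 3: 1}
```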

1 Comment

Thank you. It is indeed a good idea to use collections.Counter to count. The performance improvement is remarkable.

If you need performance, you should use numpy or numba, whose low-level routines run at nearly C speed:

import numpy as np
bigarray=np.random.randint(0,2,10**4*3*2).reshape(10**4,3,2)
biglist=[[[e for e in B] for B in A] for A in bigarray]
# [[[1, 0], [0, 0], [1, 0]],
#  [[0, 0], [0, 1], [0, 1]], 
#  [[1, 0], [1, 0], [0, 0]], ...

def your_count(biglist):
    integers=[]
    for k in biglist:
        num = int("".join(str(row[0]) for row in k), 2)
        integers.append(num)
    return integers

def count_python(big):
    m=len(big)
    integers=np.empty(m,np.int32)
    for i in range(m):
        n=len(big[i])
        b=1
        s=0
        for j in range(n-1,-1,-1):
            s = s + big[i][j][0]*b
            b = b*2
        integers[i]=s
    return integers

def count_numpy(bigarray):
    integers = (bigarray[:,:,0]*[4,2,1]).sum(axis=1)
    return integers

from numba import njit    
count_numba = njit(count_python)
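Since the comments say the rows can be any positive number of bits wide, the hard-coded [4,2,1] in count_numpy only works for 3 columns. A sketch of a generalized version (function name mine) that derives the powers of two from the array shape:

```python
import numpy as np

def count_numpy_any_width(bigarray):
    """Like count_numpy, but works for any row width.

    bigarray has shape (rows, width, 2); we take the first
    element of each pair and weight it by its power of two.
    """
    width = bigarray.shape[1]
    weights = 1 << np.arange(width - 1, -1, -1)  # e.g. [4, 2, 1] for width 3
    return (bigarray[:, :, 0] * weights).sum(axis=1)

sample = np.array([
    [[1, 0], [0, 0], [1, 1]],
    [[0, 0], [1, 1], [1, 0]],
])
print(count_numpy_any_width(sample))  # -> [5 3]
```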

And some tests:

In [125]: %timeit your_count(biglist)
145 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [126]: %timeit count_python(biglist)
29.6 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [127]: %timeit count_numpy(bigarray)
354 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [128]: %timeit count_numba(bigarray)
73 µs ± 938 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Numba allows you to compile a low-level version of some Python code (though not yours, because Numba doesn't handle strings and lists, only numpy arrays). Numpy gives you special syntax to do powerful things in one instruction, with good performance.

The Numba solution is here 2000x faster than yours.

The counts can then be efficiently computed with collections.Counter or np.unique:

In [150]: %timeit {k:v for k,v in zip(*np.unique(integers,return_counts=True))} 
46.4 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [151]: %timeit Counter(integers)
218 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

1 Comment

This works and it really helps me understand numpy. Thanks.
