Creating a "bitmask" from several boolean numpy arrays

Question

I'm trying to convert several masks (boolean arrays) to a bitmask with NumPy. While that works in theory, I feel that I'm doing too many operations.

For example to create the bitmask I use:

import numpy as np

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False])
]

flag_bits = np.zeros(3, dtype=np.int8)
for idx, flag in enumerate(flags):
    flag_bits += flag.astype(np.int8) << idx  # equivalent to flag * 2 ** idx

Which gives me the expected "bitmask":

>>> flag_bits 
array([1, 6, 0], dtype=int8)

>>> [np.binary_repr(bit, width=7) for bit in flag_bits]
['0000001', '0000110', '0000000']

However, I feel that the casting to int8 and the addition into the flag_bits array in particular are too complicated. Is there any NumPy functionality I missed that could be used to create such a "bitmask" array?

Note: I'm calling an external function that expects such a bitmask, otherwise I would stick with the boolean arrays.

Marat · Accepted Answer · 2017-02-05 22:45:16Z

2

>>> x = np.array([2**i for i in range(np.shape(flags)[1])])  # weights [1, 2, 4]
>>> np.dot(flags, x)
array([1, 2, 2])

How it works: in a bitmask, every bit is effectively an original array element multiplied by a power of 2 according to its position, e.g. 2 = False * 1 + True * 2 + False * 4. This can be expressed as a matrix multiplication, which is really efficient in NumPy.

So, the first line is a list comprehension that creates these weights: x = [1, 2, 4, 8, ..., 2^(n-1)].

Then, each row in flags is multiplied element-wise by x and the products are summed up (this is how matrix multiplication works). At the end, we get the bitmask.
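As a rough sketch of the weighting idea described above, the snippet below builds the weights and applies them in both orientations. The names `weights`, `by_rows` and `by_cols` are only illustrative; the `flags` values are taken from the question. Which orientation is wanted depends on whether each input array is one flag (as in the question) or one element.

```python
import numpy as np

# Flags from the question: one boolean array per flag (row = flag, column = element).
flags = np.array([
    [True, False, False],
    [False, True, False],
    [False, True, False],
])

# One weight per flag: 1, 2, 4, ... (illustrative name, not from the original answer).
weights = 1 << np.arange(len(flags))

# Weights applied along each row: treats every row as one bit pattern.
by_rows = np.dot(flags, weights)

# Weights applied along each column: flag i contributes bit i of every element,
# which is the layout the question asks for.
by_cols = np.dot(weights, flags)

print(by_rows)  # [1 2 2]
print(by_cols)  # [1 6 0]
```

With a square `flags` array both calls run without error, so the orientation has to be checked against the expected values rather than against the shapes.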

edited Feb 5, 2017 at 22:45

answered Feb 5, 2017 at 22:42

Marat


8 Comments

MSeifert Over a year ago

ok, but why is the result wrong? I want [1, 6, 0] and this is giving me [1, 2, 4].

Marat Over a year ago

Two reasons: I was using different values for testing, and I'm summing by rows while the question goes by columns. To go by columns, just swap the matrices or transpose flags.

DYZ Over a year ago

The flags have to be transposed (flags.T) before the dot product is calculated.

Marat Over a year ago

For large flags I think np.dot(x, flags) will be more efficient than flags.T

Divakar Over a year ago

@Marat Just use : np.dot(2**np.arange(3),flags), simple and sweet.

DYZ · Accepted Answer · 2017-02-05 23:43:21Z

2

How about this (added conversion to int8, if desired):

flag_bits = (np.transpose(flags) << np.arange(len(flags))).sum(axis=1)\
             .astype(np.int8)
#array([1, 6, 0], dtype=int8)

edited Feb 5, 2017 at 23:43

answered Feb 5, 2017 at 22:42

DYZ


4 Comments

MSeifert Over a year ago

One suggestion: Instead of np.array and flags.T you could do it in one: (np.transpose(flags) << np.arange(len(flags))).sum(axis=1). However I can't seem to force the sum to return an int8 array. It always uses int32.

DYZ Over a year ago

Why would you care about int8 if you convert it to a string anyway? (Nonetheless, I added the conversion to the answer.)

MSeifert Over a year ago

The string-version was more meant as illustration what the bitmask is - not the expected result. I'm a bit concerned about memory, I'm using 6 shape (12000, 12000) boolean arrays and the intermediate np.transpose(flags)<<np.arange(len(flags)) array is a bit "big" with dtype int64 (even with int32).

MSeifert Over a year ago

But doesn't matter, your answer is working as expected!
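On the memory concern raised in this comment thread, one possible alternative is np.packbits, which stays in uint8 and avoids the int32/int64 intermediate. This is only a sketch under a few assumptions: NumPy 1.17 or later (for the `bitorder` argument), at most 8 flags, and an unsigned result being acceptable. The names `stacked` and `packed` are illustrative.

```python
import numpy as np

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False]),
]

# Stack to shape (n_flags, ...). Note that this copies the boolean data once
# (one byte per element).
stacked = np.asarray(flags)

# Pack along the flag axis. The axis is zero-padded to 8 bits, and
# bitorder="little" makes flags[0] the least significant bit.
packed = np.packbits(stacked, axis=0, bitorder="little")

# With at most 8 flags, the packed axis has length 1.
flag_bits = packed[0]

print(flag_bits)        # [1 6 0]
print(flag_bits.dtype)  # uint8
```

If the external function requires int8 rather than uint8, `flag_bits.view(np.int8)` reinterprets the same bytes without another copy; with fewer than 8 flags the values are identical.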

Community · Accepted Answer · 2017-05-23 12:24:23Z

1

Here's an approach to directly get to the string bitmask with boolean-indexing -

out = np.repeat('0000000',3).astype('S7')
out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T

Sample run -

In [41]: flags
Out[41]: 
[array([ True, False, False], dtype=bool),
 array([False,  True, False], dtype=bool),
 array([False,  True, False], dtype=bool)]

In [42]: out = np.repeat('0000000',3).astype('S7')

In [43]: out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T

In [44]: out
Out[44]: 
array([b'0000001', b'0000110', b'0000000'], 
      dtype='|S7')

The integer flag_bits can be obtained with the same matrix-multiplication strategy discussed in detail in @Marat's solution, but with a vectorized scaling array -

np.dot(2**np.arange(3),flags)

edited May 23, 2017 at 12:24


answered Feb 5, 2017 at 22:57

Divakar


3 Comments

MSeifert Over a year ago

Hehe, I actually wanted an integer array as result. The stringification was more to show what the bitmask represents (I didn't know if that term is common knowledge). :)

Divakar Over a year ago

@MSeifert Ah I see. Well I assumed that string as the bitmask :)

Eric Over a year ago

I suspect not using ** would be faster: np.dot(1<<np.arange(3, dtype=np.int8),flags)
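A small check of the shift-based variant from the last comment: the sketch below only confirms that the two ways of building the weights agree and that int8 weights keep the result in int8. It makes no claim about speed, which would need to be measured on the actual data. The names `weights_pow` and `weights_shift` are illustrative.

```python
import numpy as np

flags = np.array([
    [True, False, False],
    [False, True, False],
    [False, True, False],
])

# Weights via exponentiation (platform default integer dtype).
weights_pow = 2 ** np.arange(3)

# Weights via left shift, kept in int8 as in the comment.
weights_shift = 1 << np.arange(3, dtype=np.int8)

# int8 weights combined with boolean flags give an int8 result.
flag_bits = np.dot(weights_shift, flags)

print(np.array_equal(weights_pow, weights_shift))  # True
print(flag_bits, flag_bits.dtype)                  # [1 6 0] int8
```

Note that int8 can hold at most 7 flag bits before the sign bit is reached.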
