
I'm trying to convert several masks (boolean arrays) to a bitmask with NumPy. My approach works, but I feel that I'm doing too many operations.

For example, to create the bitmask I use:

import numpy as np

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False])
]

flag_bits = np.zeros(3, dtype=np.int8)
for idx, flag in enumerate(flags):
    flag_bits += flag.astype(np.int8) << idx  # equivalent to flag * 2 ** idx

Which gives me the expected "bitmask":

>>> flag_bits 
array([1, 6, 0], dtype=int8)

>>> [np.binary_repr(bit, width=7) for bit in flag_bits]
['0000001', '0000110', '0000000']

However, I feel that the casting to int8 and the addition to the flag_bits array in particular are too complicated. Is there any NumPy functionality I missed that could be used to create such a "bitmask" array?

Note: I'm calling an external function that expects such a bitmask, otherwise I would stick with the boolean arrays.


3 Answers

2
>>> x = np.array([2**i for i in range(np.shape(flags)[1])])  # weights 1, 2, 4, ...
>>> np.dot(flags, x)
array([1, 2, 2])

How it works: in a bitmask, every bit is effectively an original array element multiplied by a power of 2 according to its position, e.g. 2 = False * 1 + True * 2 + False * 4. This can be expressed as a matrix multiplication, which is really efficient in NumPy.

So, the first line is a list comprehension that creates these weights: x = [1, 2, 4, 8, ..., 2^(n-1)].

Then, each row of flags is multiplied element-wise by x and the products are summed (this is how matrix multiplication works). At the end, we get the bitmask.
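As a rough illustration of the weighting described above, here is a minimal sketch using the flags from the question. The names `weights`, `per_column` and `per_row` are made up for this example. It shows that the side on which the weights enter the dot product decides whether flags are combined per column (what the question asks for) or per row.

```python
import numpy as np

# Flags from the question, stacked into a 2-D boolean array of shape (3, 3).
flags = np.array([
    [True, False, False],
    [False, True, False],
    [False, True, False],
])

# One weight per flag: 1, 2, 4, ...
weights = 1 << np.arange(flags.shape[0])

# Weights on the left: each flag (row) gets its own bit, and the flags
# are combined column by column.
per_column = np.dot(weights, flags)

# Weights on the right: each row is collapsed into a single number instead.
per_row = np.dot(flags, 1 << np.arange(flags.shape[1]))

print(per_column)  # [1 6 0]
print(per_row)     # [1 2 2]
```

The first result matches the `flag_bits` from the question; the second matches the output shown in this answer.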


8 Comments

ok, but why is the result wrong? I want [1, 6, 0] and this is giving me [1, 2, 4].
Two reasons: I was using different values for testing, and I'm summing by rows while the question goes by columns. To go by columns, just swap the matrices or transpose flags.
The flags have to be transposed (flags.T) before the dot product is calculated.
For large flags I think np.dot(x, flags) will be more efficient than flags.T
@Marat Just use: np.dot(2**np.arange(3), flags), simple and sweet.
2

How about this (added conversion to int8, if desired):

flag_bits = (np.transpose(flags) << np.arange(len(flags))).sum(axis=1)\
             .astype(np.int8)
#array([1, 6, 0], dtype=int8)

4 Comments

One suggestion: Instead of np.array and flags.T you could do it in one: (np.transpose(flags) << np.arange(len(flags))).sum(axis=1). However I can't seem to force the sum to return an int8 array. It always uses int32.
Why would you care about int8 if you convert it to a string anyway? (Nonetheless, I added the conversion to the answer.)
The string-version was more meant as illustration what the bitmask is - not the expected result. I'm a bit concerned about memory, I'm using 6 shape (12000, 12000) boolean arrays and the intermediate np.transpose(flags)<<np.arange(len(flags)) array is a bit "big" with dtype int64 (even with int32).
But doesn't matter, your answer is working as expected!
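Regarding the memory concern raised in these comments, one possible way to avoid a large int64 intermediate is to accumulate into a preallocated one-byte array. This is only a sketch under assumptions: `combine_flags` is a hypothetical helper name, it uses uint8 rather than the int8 from the question so that up to 8 flags fit, and every temporary it creates is one byte per element.

```python
import numpy as np

def combine_flags(flags):
    """Pack up to 8 equally shaped boolean arrays into one uint8 bitmask array."""
    if len(flags) > 8:
        raise ValueError("uint8 only has room for 8 flags")
    out = np.zeros(np.shape(flags[0]), dtype=np.uint8)
    for idx, flag in enumerate(flags):
        # view() reinterprets the boolean bytes (0 or 1) as uint8 without copying.
        bits = np.asarray(flag, dtype=bool).view(np.uint8)
        # The shifted temporary is still uint8; it is OR-ed into out in place.
        out |= np.left_shift(bits, np.uint8(idx))
    return out

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False]),
]
flag_bits = combine_flags(flags)
print(flag_bits)  # [1 6 0]
```

If the external function insists on int8, `flag_bits.view(np.int8)` reinterprets the same bytes without a copy, at the cost of the eighth flag landing in the sign bit.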
1

Here's an approach to directly get to the string bitmask with boolean-indexing -

out = np.repeat('0000000',3).astype('S7')
out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T

Sample run -

In [41]: flags
Out[41]: 
[array([ True, False, False], dtype=bool),
 array([False,  True, False], dtype=bool),
 array([False,  True, False], dtype=bool)]

In [42]: out = np.repeat('0000000',3).astype('S7')

In [43]: out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T

In [44]: out
Out[44]: 
array([b'0000001', b'0000110', b'0000000'], 
      dtype='|S7')

Using the same matrix-multiplication strategy as discussed in detail in @Marat's solution, but using a vectorized scaling array that gives us flag_bits -

np.dot(2**np.arange(3),flags)

3 Comments

Hehe, I actually wanted an integer array as result. The stringification was more to show what the bitmask represents (I didn't know if that term is common knowledge). :)
@MSeifert Ah I see. Well I assumed that string as the bitmask :)
I suspect not using ** would be faster: np.dot(1<<np.arange(3, dtype=np.int8),flags)
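To make the last comment concrete, the sketch below builds the weights both ways and checks that they produce the same bitmask. It makes no claim about speed, since that would need to be measured on the actual data; the variable names are made up for this example.

```python
import numpy as np

flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False]),
]

# Weights via exponentiation, using the default integer dtype.
weights_pow = 2 ** np.arange(len(flags))

# Weights via bit shifting, kept in int8 as suggested in the comment.
weights_shift = 1 << np.arange(len(flags), dtype=np.int8)

bits_pow = np.dot(weights_pow, flags)
bits_shift = np.dot(weights_shift, flags)

print(bits_pow)                      # [1 6 0]
print(bits_shift, bits_shift.dtype)  # [1 6 0] int8
```

With int8 weights the dot product stays in int8 here, so no separate astype call is needed, but int8 holds at most 7 flags before the sign bit is reached.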
