Fastest way to get a bit-wise median byte?

Question

Given that I have a large set of n bytes, what is the fastest way to generate a byte which is the "average" or perhaps the "bit-wise median" of that set?

More specifically, I want to have a resulting byte where each bit is set if a majority of the bytes have that bit set.

Example (with nibbles)

Bytes: 
1000
1111
1010

Result: 
1010

I'm sure there's some bit twiddling magic that will help me do this, but I haven't been able to find it so far. My only ideas so far have been the naive approach. Any ideas?

EDIT: In the case that the number of occurrences for a bit are equal the bit can be set arbitrarily.

EDIT 2: I used bytes here for the sake of the example. Preferably the suggested method should hold for byte vectors of up to ~128 bytes.

@Austin: his definition of "average" is explained after the question mark — LeleDumbo
– LeleDumbo, Commented Mar 17, 2015 at 14:30
Correct. Perhaps another term would be better? "bit-wise median"? — Poyan
– Poyan, Commented Mar 17, 2015 at 14:31
Ah missed that. Yeah I think median might be a more appropiate word — user2033671
– user2033671, Commented Mar 17, 2015 at 14:32

user555045 · Accepted Answer · 2015-03-17 17:32:20Z

There is a bit-twiddling way, but it's not that pretty (nor particularly fast). But I'll explain it anyway, since you asked, and it's interesting.

The idea is that instead of holding the count for each bit position in an integer, hold the bits of the counters in integers. So if you view the counters as a boolean matrix with each row being a counter, store the columns in integers. This way, putting in a numbers is a weird kind of addition, instead of "as many increments as there are bits". Like this (not tested) (adding in c)

for (int i = 0; i < counter_bits; i++) {
    counter[i] ^= c;
    c &= ~counter[i]; // counter & c == ~(counter ^ c) & c
}

And now the fun part: what is the median? Well, if n (the number of items) is a power of two minus one and counter is an array that is exactly long enough to express n "vertically", then the median is exactly the value of the "upper bit" counter (there a 1 will appear whenever a bit has been seen at least (n+1)/2 times).

The more interesting case is "otherwise". It's still fixable, all that's necessary to make it work out nicely again is to initialize the counters with a bias, such that the highest bit becomes set exactly at the right moment. For example, if n = 5, then the counters should be initialized to 1 (vertically 1, so counter[0] = -1, all other counters are 0), so when 3 ones are added, it goes to 4 which is the top bit in this case. An other example: if n = 17, then the top bit should have a weight of 16, but 9 is enough to set a bit in the median, so the counters should be initialized to 7 (so, counter[0] = counter[1] = counter[2] = -1, the rest 0).

This approach obviously generalizes to wider bitvectors in a simple way, because all the operations on things that become wider are pointwise bit operations.

Example (just in case anyone is confused about what's going on in this algorithm)

Input:

1000
1111
1010

3 items, 2 are needed for a majority, we're counting to 3 max, so there are only 2 bits in the count (weights 1 and 2) and no bias is necessary.

init: counter = { 0000, 0000 }
put in 1000
counter = { 1000, 0000 }
put in 1111
counter = { 0111, 1000 } (the leftmost bit carried into the high counter)
put in 1010
counter = { 1101, 1010 } (the second bit from the right carried into the high counter)
result: the upper counter, so 1010

Now, in the real world, this is not how I would do it. Depending on the circumstances, I might do it like this:

Have a table (or pdep, if you have it) that maps bytes to a "spread out" version, for example 10010101b -> 0x10010101, and add those (with a normal addition), then extract the final result from the high bits of the nibbles (pext if you have it, otherwise it's trickier). The bias trick still works. Disadvantage: only works for n < 16. Despite that huge disadvantage, I've still used this one in real life (for binary puzzles, to prune the search space). Of course this still works with "more spread", which gives you a higher maximum n (for example, putting 8 counters in an uin64 gives n < 256, putting 4 in an uint64 gives n < 65536 and so on). This is really isomorphic with using an array like in LeleDumbo's answer, except the array is in a single int (in fact if you turn the spread up to eleven, it becomes exactly the same). This can also scale to very large vectors (just use several of these counters, in an array).

Or: use proper SIMD, instead of this fake SIMD. Makes extracting the bits easier (since a mask of all the top bits can be extracted). It doesn't even need the bias trick, because you can do a SIMD compare. It also makes spreading easier, since it doesn't have to be done with a lookup table - broadcast the item to all lanes, mask the appropriate bit in each lane (this conveniently ignored some details), compare and then subtract this from the count (because now it's -1 when true, not 1). For example (not tested)

vpbroadcastb xmm0, [item]   ; put 8bit item in all lanes
vpand xmm0, [lane_bit_mask] ; { 1, 2, 4, 8, ...
vpcmpeqb xmm0, [lane_bit_mask] ; -1 if the bit is set
vpsubb xmm1, xmm0           ; subtract from total

This is a bit wasteful, using only 8 of the 16 (or 32) lanes, but it shows the basic idea. It takes some more effort to unpack the bits when using more lanes.

Nice answer. FWIW, for some reason no matter what I do the _bextr_u64 operation for extracting the bits one by one always wins against masking, even when masking with avx2 operations.

LeleDumbo · Accepted Answer · 2015-03-18 03:50:48Z

1

A byte is only 8-bit. Since that's small enough, you can use descending radix sort (LSB first) strategy:

// this assumes 0 based indexing with LSB at index 0
for i := 0 to 7
  initialize bucket 0 and 1 with 0
  for j := 0 to N - 1
    increment bucket[bit i of byte j]
  bit i of resulting byte = max(bucket[0],bucket[1]) ? 0 : 1

that's O(8 * N) = O(N) time complexity, which should be fast.

P.S.: you haven't stated what should happen if both 0 and 1 occurrences are equal

edited Mar 18, 2015 at 3:50

answered Mar 17, 2015 at 14:32

LeleDumbo

9,3804 gold badges26 silver badges38 bronze badges

Collectives™ on Stack Overflow

Fastest way to get a bit-wise median byte?

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related