There is a bit-twiddling way, but it's not that pretty (nor particularly fast). But I'll explain it anyway, since you asked, and it's interesting.
The idea is that instead of holding the count for each bit position in an integer, hold the bits of the counters in integers. So if you view the counters as a boolean matrix with each row being a counter, store the columns in integers. This way, putting in a numbers is a weird kind of addition, instead of "as many increments as there are bits". Like this (not tested) (adding in c)
for (int i = 0; i < counter_bits; i++) {
counter[i] ^= c;
c &= ~counter[i]; // counter & c == ~(counter ^ c) & c
}
And now the fun part: what is the median? Well, if n (the number of items) is a power of two minus one and counter is an array that is exactly long enough to express n "vertically", then the median is exactly the value of the "upper bit" counter (there a 1 will appear whenever a bit has been seen at least (n+1)/2 times).
The more interesting case is "otherwise". It's still fixable, all that's necessary to make it work out nicely again is to initialize the counters with a bias, such that the highest bit becomes set exactly at the right moment. For example, if n = 5, then the counters should be initialized to 1 (vertically 1, so counter[0] = -1, all other counters are 0), so when 3 ones are added, it goes to 4 which is the top bit in this case. An other example: if n = 17, then the top bit should have a weight of 16, but 9 is enough to set a bit in the median, so the counters should be initialized to 7 (so, counter[0] = counter[1] = counter[2] = -1, the rest 0).
This approach obviously generalizes to wider bitvectors in a simple way, because all the operations on things that become wider are pointwise bit operations.
Example (just in case anyone is confused about what's going on in this algorithm)
Input:
1000
1111
1010
3 items, 2 are needed for a majority, we're counting to 3 max, so there are only 2 bits in the count (weights 1 and 2) and no bias is necessary.
init: counter = { 0000, 0000 }
put in 1000
counter = { 1000, 0000 }
put in 1111
counter = { 0111, 1000 } (the leftmost bit carried into the high counter)
put in 1010
counter = { 1101, 1010 } (the second bit from the right carried into the high counter)
result: the upper counter, so 1010
Now, in the real world, this is not how I would do it. Depending on the circumstances, I might do it like this:
Have a table (or pdep, if you have it) that maps bytes to a "spread out" version, for example 10010101b -> 0x10010101, and add those (with a normal addition), then extract the final result from the high bits of the nibbles (pext if you have it, otherwise it's trickier). The bias trick still works. Disadvantage: only works for n < 16. Despite that huge disadvantage, I've still used this one in real life (for binary puzzles, to prune the search space). Of course this still works with "more spread", which gives you a higher maximum n (for example, putting 8 counters in an uin64 gives n < 256, putting 4 in an uint64 gives n < 65536 and so on). This is really isomorphic with using an array like in LeleDumbo's answer, except the array is in a single int (in fact if you turn the spread up to eleven, it becomes exactly the same). This can also scale to very large vectors (just use several of these counters, in an array).
Or: use proper SIMD, instead of this fake SIMD. Makes extracting the bits easier (since a mask of all the top bits can be extracted). It doesn't even need the bias trick, because you can do a SIMD compare. It also makes spreading easier, since it doesn't have to be done with a lookup table - broadcast the item to all lanes, mask the appropriate bit in each lane (this conveniently ignored some details), compare and then subtract this from the count (because now it's -1 when true, not 1). For example (not tested)
vpbroadcastb xmm0, [item] ; put 8bit item in all lanes
vpand xmm0, [lane_bit_mask] ; { 1, 2, 4, 8, ...
vpcmpeqb xmm0, [lane_bit_mask] ; -1 if the bit is set
vpsubb xmm1, xmm0 ; subtract from total
This is a bit wasteful, using only 8 of the 16 (or 32) lanes, but it shows the basic idea. It takes some more effort to unpack the bits when using more lanes.