
The problem statement is:

Write an efficient program to count number of 1s in binary representation of an integer.

I found a post on this problem here which outlines multiple solutions that run in O(log n) time, including Brian Kernighan's algorithm and the gcc __builtin_popcount() method.

One solution that wasn't mentioned was the Python approach bin(n).count("1"), which achieves the same effect. Does this method also run in O(log n) time?

  • It runs in O(n) time, where n is the total number of bits. Commented Sep 2, 2016 at 22:20
  • Unlikely. The efficient solutions are designed specifically for counting bits, but the count() function operates on any string and can't use those optimizations. Commented Sep 2, 2016 at 22:21
  • At least O(n) time. The .count("1") has to visit each bit. Is bin(n) also linear? Commented Sep 2, 2016 at 22:21
  • @Robᵩ But there are log(n) bits in bin(n) by definition of binary representation Commented Sep 2, 2016 at 22:22
  • 1
    @loremIpsum1771 I'd discuss your thinking with the interviewer. Generally it is your thinking process that matters, not if you memorised the most efficient method of counting bits. Commented Sep 2, 2016 at 23:51

2 Answers


You are converting the integer to a string, which means Python has to produce N '0' and '1' characters, where N is the number of bits in the integer. You then use str.count(), which must visit every character in the string to count the '1' characters.

All in all you have an O(N) algorithm, with a relatively high constant cost.

Note that this is the same complexity as the code you linked to: the integer n has log(n) bits, and the algorithm still has to take N = log(n) steps to count them. The bin(n).count('1') approach is thus equivalent in complexity, just with the added cost of producing the string in the first place.
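For reference, here is a minimal sketch of the Kernighan algorithm mentioned in the question, written in Python; it too does work proportional to the bits of n (one loop iteration per set bit):

```python
def kernighan_popcount(n):
    """Count set bits via Kernighan's trick: n & (n - 1) clears
    the lowest set bit, so the loop runs once per 1-bit."""
    count = 0
    while n:
        n &= n - 1  # drop the lowest set bit
        count += 1
    return count

print(kernighan_popcount(0b101101))  # 4
```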

At the cost of a table, you could move to processing integers per byte:

table = [0]
while len(table) < 256:
    # each pass doubles the table; entry b ends up holding the popcount of byte b
    table += [t + 1 for t in table]

length = sum(map(table.__getitem__, n.to_bytes(n.bit_length() // 8 + 1, 'little')))
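As a quick self-contained sanity check (assuming n is any non-negative integer), the per-byte lookup agrees with bin(n).count('1'):

```python
# Popcount table for all 256 byte values: table[b] == bin(b).count('1').
table = [0]
while len(table) < 256:
    table += [t + 1 for t in table]

def perbyte(n):
    # Split n into little-endian bytes and sum the per-byte counts.
    return sum(map(table.__getitem__,
                   n.to_bytes(n.bit_length() // 8 + 1, 'little')))

for n in (0, 1, 255, 123456789, 2**64 - 1):
    assert perbyte(n) == bin(n).count('1')
```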

However, because Python needs to produce a series of new objects (a bytes object and several integers), this method is never quite fast enough to beat the bin(n).count('1') method:

>>> from random import choice
>>> import timeit
>>> table = [0]
>>> while len(table) < 256:
...     table += [t + 1 for t in table]
...
>>> def perbyte(n): return sum(map(table.__getitem__, n.to_bytes(n.bit_length() // 8 + 1, 'little')))
...
>>> def strcount(n): return bin(n).count('1')
...
>>> n = int(''.join([choice('01') for _ in range(2 ** 16)]))
>>> for f in (strcount, perbyte):
...     print(f.__name__, timeit.timeit('f(n)', 'from __main__ import f, n', number=1000))
...
strcount 1.11822146497434
perbyte 1.4401431040023454

No matter the bit-length of the test number, perbyte is consistently slower by a noticeable margin.


11 Comments

But there are log(n) characters in bin(n). Thus, count will run in O(len(bin(n))) = O(log n)
@DmitryTorba: yes, I was already editing in an explanation there; the linked article is rather disingenuous in that its algorithm is just as linear; just because the integer n has log(n) bits doesn't make the bit-counting algorithm suddenly logarithmic.
So the count() method is going to be linear because the time to create and append to the string will dominate asymptotically? And also, wouldn't the algorithms in the article still be log(n) since there are log(n) bits?
@MartijnPieters In my tests, your table way is consistently significantly slower than the bin(n).count('1') way... (about factor 1.3 slower for large strings, and about factor 2.6 slower for short strings, even when I prepare the table ahead of timing).
@StefanPochmann I didn't do any performance tests myself (lack of time), so I appreciate the effort there. How large are the integers you tested with? Does switching to 'big' for endianness make any difference?

Let's say you are trying to count the number of set bits of n. On typical Python implementations, bin will compute the binary representation in O(log n) time and count will go through the string, resulting in an overall O(log n) complexity.

However, note that the complexity of an algorithm is usually expressed in terms of the "size" of the input. When you work with integers, that size corresponds to their logarithm, i.e. the number of bits. That's why the current algorithm is said to have linear complexity (the variable is m = log n, and the complexity is O(m)).
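To make the variable explicit, a small sketch using only standard Python (the specific value of n here is just an illustration):

```python
import math

n = 1_000_000
m = n.bit_length()  # the "size" of the input: number of bits needed for n

# m is floor(log2 n) + 1, so O(m) and O(log n) describe the same growth.
assert m == math.floor(math.log2(n)) + 1

# bin(n).count('1') visits each of the m digit characters once,
# so it is linear in m -- i.e. O(m) = O(log n).
print(m, bin(n).count('1'))  # 20 7
```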

4 Comments

I'm not sure I understand how it would be linear. If the input size is m (meaning there are m bits) and m equals log n (where n is the value of the number), then shouldn't the runtime be log(n)?
loremIpsum1771: The runtime is O(log(n)) and it is also O(m). When we say an algorithm is "linear", "quadratic" or "logarithmic" without specifying the variable, the default one is the size of the input. Here that is m, not n. If this algorithm were said to be logarithmic, it would mean its complexity is O(log(m)), i.e. O(log(log(n))).
Oh ok, I think I understand what you're saying. So then it would be best just to say the algorithm is linear as to avoid misinterpretation?
loremIpsum1771: provided that the person you're talking to is aware of this rule, yes (in fact, it comes from the theoretical meaning of the time complexity of an algorithm). But a lot of people make this mistake, so I think the safest way to state the complexity of your algorithm in everyday life is to spell out the meaning of your variable (e.g. "I'm counting the number of bits of n, and my algorithm has O(log n) time complexity").
