21

I know how to read bytes — x.read(number_of_bytes), but how can I read bits in Python?

I need to read only 5 bits (not a full 8-bit byte) from a binary file.

Any ideas or approaches?

    Are those bits consecutive? If so, the five most significant bits, or five least significant bits in the byte? Commented May 21, 2012 at 17:30

3 Answers

37

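Python's file I/O works in whole bytes, so you can't read fewer than 8 bits directly. You need to read a full byte and then extract the bits you want from it. For example, to get the 5 most significant bits:

b = ord(x.read(1))  # read() returns a one-byte string/bytes object; convert it to an int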
firstfivebits = b >> 3

Or if you wanted the 5 least significant bits, rather than the 5 most significant bits:

b = ord(x.read(1))  # read() returns a one-byte string/bytes object; convert it to an int
lastfivebits = b & 0b11111
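
As a quick check of both expressions, here is a minimal sketch that runs them on a known byte. It assumes Python 3, and the `io.BytesIO` object is only a stand-in for a real file opened in `'rb'` mode; the sample byte value is made up for illustration.

```python
import io

# Stand-in for a binary file opened with open(path, 'rb').
# The single sample byte is 0b10110101 (181), chosen only for illustration.
x = io.BytesIO(bytes([0b10110101]))

b = x.read(1)[0]          # Python 3: indexing a bytes object yields an int

first_five = b >> 3       # drop the 3 low bits, keeping the 5 high bits
last_five = b & 0b11111   # mask off everything except the 5 low bits

print(format(first_five, '05b'))  # 10110
print(format(last_five, '05b'))   # 10101
```

Shifting right by 3 (not 5) is what leaves five bits, because a byte has 8 bits and 8 - 3 = 5.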

Some other useful bit manipulation info can be found here: http://wiki.python.org/moin/BitManipulation


3 Comments

When my reputation grows to 15, I'll give you a thumbs up! (I'm new here.) So, if I do b = x.read(1); firstfivebits = b >> 3, I'll get the first 5 bits... but why not firstfivebits = b >> 5? I mean, why b >> 3?
@HugoMedina if you don't know why firstfivebits = b >> 3, are you sure you should be fiddlin' with bits? (You might go blind or something ;).
Now I get it: since 1 byte = 8 bits, right-shifting by 3 discards the 3 least significant bits, which leaves the remaining 5 bits of the byte.
5

As the accepted answer states, standard Python I/O can only read and write whole bytes at a time. However, you can simulate a stream of bits using this recipe for Bitwise I/O.

Updates

After modifying the Rosetta Code's Python version to work unchanged in both Python 2 & 3, I incorporated those changes into this answer.

Later, inspired by a comment made by @mhernandez, I further modified the Rosetta Code so it supports the context manager protocol, which allows instances of both of its classes to be used in Python with statements. The latest version is shown below:

class BitWriter(object):
    def __init__(self, f):
        self.accumulator = 0
        self.bcount = 0
        self.out = f

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.flush()

    def __del__(self):
        try:
            self.flush()
        except ValueError:   # I/O operation on closed file.
            pass

    def _writebit(self, bit):
        if self.bcount == 8:
            self.flush()
        if bit > 0:
            self.accumulator |= 1 << 7-self.bcount
        self.bcount += 1

    def writebits(self, bits, n):
        while n > 0:
            self._writebit(bits & 1 << n-1)
            n -= 1

    def flush(self):
        self.out.write(bytearray([self.accumulator]))
        self.accumulator = 0
        self.bcount = 0


class BitReader(object):
    def __init__(self, f):
        self.input = f
        self.accumulator = 0
        self.bcount = 0
        self.read = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

    def _readbit(self):
        if not self.bcount:
            a = self.input.read(1)
            if a:
                self.accumulator = ord(a)
            self.bcount = 8
            self.read = len(a)
        rv = (self.accumulator & (1 << self.bcount-1)) >> self.bcount-1
        self.bcount -= 1
        return rv

    def readbits(self, n):
        v = 0
        while n > 0:
            v = (v << 1) | self._readbit()
            n -= 1
        return v

if __name__ == '__main__':
    import os
    import sys
    # Determine this module's name from its file name and import it.
    module_name = os.path.splitext(os.path.basename(__file__))[0]
    bitio = __import__(module_name)

    with open('bitio_test.dat', 'wb') as outfile:
        with bitio.BitWriter(outfile) as writer:
            chars = '12345abcde'
            for ch in chars:
                writer.writebits(ord(ch), 7)

    with open('bitio_test.dat', 'rb') as infile:
        with bitio.BitReader(infile) as reader:
            chars = []
            while True:
                x = reader.readbits(7)
                if not reader.read:  # End-of-file?
                    break
                chars.append(chr(x))
            print(''.join(chars))

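Another usage example shows how to "crunch" an 8-bit-per-character ASCII stream by discarding the "unused" most significant bit of each byte, and how to read it back (neither example uses the classes as context managers).

import sys
import bitio

# On Python 3 the standard streams are text streams; use their binary buffers.
stdin = getattr(sys.stdin, 'buffer', sys.stdin)
stdout = getattr(sys.stdout, 'buffer', sys.stdout)

o = bitio.BitWriter(stdout)
c = stdin.read(1)
while len(c) > 0:
    o.writebits(ord(c), 7)  # keep only the low 7 bits of each character
    c = stdin.read(1)
o.flush()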

...and to "decrunch" the same stream:

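import sys
import bitio

# On Python 3 the standard streams are text streams; use their binary buffers.
stdin = getattr(sys.stdin, 'buffer', sys.stdin)
stdout = getattr(sys.stdout, 'buffer', sys.stdout)

r = bitio.BitReader(stdin)
while True:
    x = r.readbits(7)
    if not r.read:  # nothing read
        break
    stdout.write(bytearray([x]))  # write the character back as a single byte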
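
To see what the 7-bit "crunch" does to the data without going through stdin/stdout, here is a rough, self-contained sketch of the same packing idea. It does not use the bitio classes: `crunch` and `decrunch` are hypothetical helper names, the whole input is held in one Python integer (so this suits small inputs only), and it assumes Python 3 for `int.to_bytes` / `int.from_bytes`.

```python
def crunch(text):
    """Pack 7-bit ASCII characters back to back, MSB first, zero-padding the last byte."""
    acc = 0
    nbits = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError('not 7-bit ASCII: %r' % ch)
        acc = (acc << 7) | code       # append this character's 7 bits
        nbits += 7
    pad = -nbits % 8                  # bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, 'big')


def decrunch(data, count):
    """Recover `count` characters from bytes produced by crunch()."""
    acc = int.from_bytes(data, 'big')
    total = len(data) * 8
    chars = []
    for i in range(count):
        shift = total - 7 * (i + 1)   # position of the i-th 7-bit group
        chars.append(chr((acc >> shift) & 0x7F))
    return ''.join(chars)


packed = crunch('12345abcde')
print(len(packed))                    # 9 bytes instead of 10
print(decrunch(packed, 10))           # 12345abcde
```

Passing the character count explicitly is a design choice of this sketch: it keeps the zero padding in the final byte from being mistaken for an extra character.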

4 Comments

+1 for the self-contained snippet. Note that the main may not read what it's meant to because the writer may not be deleted when the reader attempts reading. A call to writer.flush() solves it.
@mhernandez: Extending the bitio classes so they support the context manager protocol like the built-in file class does would probably be a very worthwhile endeavor—and an even better way to take care of the issue.
Agreed, in fact that's exactly what I did. Thank you sir
@mhernandez: Glad to hear it helped. BTW, I recently modified the Rosetta Code's Python version so it also supports the context manager protocol, and then updated my answer here accordingly. (It was done in that order because Rosetta Code's license only allows verbatim copies in a context like this.)
4

This question appears at the top of a Google search for reading bits using Python.

I found bitstring to be a good package for reading bits and also an improvement over the native capability (which isn't bad for Python 3.6) e.g.

# import module
from bitstring import ConstBitStream

# read file
b = ConstBitStream(filename='file.bin')

# read 5 bits
output = b.read(5)

# convert to unsigned int
integer_value = output.uint

More documentation and details here: https://pythonhosted.org/bitstring/index.html
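
If installing a third-party package is not an option, the same kind of field extraction can be approximated with the standard library alone. The sketch below is not part of bitstring: `read_bits` is a hypothetical helper, it assumes Python 3, it treats bits as numbered from the most significant bit of the first byte, and it loads the whole buffer into one integer, so it is only reasonable for small inputs. The sample bytes are made up.

```python
def read_bits(data, bit_offset, nbits):
    """Return `nbits` bits of `data` starting at `bit_offset`, MSB first, as an int."""
    total = len(data) * 8
    if bit_offset < 0 or nbits < 0 or bit_offset + nbits > total:
        raise ValueError('bit range outside the buffer')
    value = int.from_bytes(data, 'big')
    shift = total - bit_offset - nbits        # bits to the right of the field
    return (value >> shift) & ((1 << nbits) - 1)


# In practice `data` would come from open(path, 'rb').read().
data = bytes([0b10110101, 0b01100000])
print(read_bits(data, 0, 5))    # 22  (0b10110)
print(read_bits(data, 5, 6))    # 43  (0b101011), spans both bytes
```

Because the field is cut out of one big integer, values that straddle a byte boundary need no special handling, as long as the file really stores them MSB first.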

1 Comment

I agree that bitstring is helpful. When you need to read in more than 8 bits at once, you need to understand how the bits are "scattered" over the bytes. E.g. I needed to read in a 14-bit integer. This is how I succeeded: buf1 = b.read(8); buf2 = b.read(2); buf3 = b.read(6); str_with_bits = str(buf3.bin) + str(buf1.bin); int_value = int(str_with_bits, 2);
