
I am working on a school project, and this is the task:

Generate a binary file that contains the table of encoding and the data of the file using the Huffman encoding.

First I need to read the data from a file and build a Huffman tree. I have done that and it works, but I cannot generate the binary file: what I have are node objects, not bytes, so I cannot write them to the binary file, and I get this error:

TypeError: a bytes-like object is required, not 'node'

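q = {}
# Each line of george.txt is expected to hold "<symbol> <frequency>"
with open("george.txt", 'r') as a_file:
    for line in a_file:
        key, value = line.split()
        q[key] = int(value)  # store the frequency as a number, not a string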


class node:
    def __init__(self, freq, symbol, left=None, right=None):
        self.freq = freq

        self.symbol = symbol

        self.left = left

        self.right = right

        self.huff = ''


def printNodes(node, val=''):
    newVal = val + str(node.huff)
    if(node.left):
        printNodes(node.left, newVal)
    if(node.right):
        printNodes(node.right, newVal)

    if(not node.left and not node.right):
        print(f"{node.symbol} -> {newVal}")


chars = ['a', 'b', 'c', 'd', 'e', 'f']

# frequency of characters
freq = [q['a'], q['b'], q['c'], q['d'], q['e'], q['f']]

nodes = []

for x in range(len(chars)):
    nodes.append(node(freq[x], chars[x]))

while len(nodes) > 1:
    nodes = sorted(nodes, key=lambda x: x.freq)

    left = nodes[0]
    right = nodes[1]
    left.huff = 0
    right.huff = 1
    newNode = node(left.freq+right.freq, left.symbol+right.symbol, left, right)
    nodes.remove(left)
    nodes.remove(right)
    nodes.append(newNode)

printNodes(nodes[0])
with open('binary.bin', 'wb') as f:
    f.write(nodes[0])
  • Huffman encoding is a compression algorithm. The output file should be smaller than the input file. I doubt that's the case with the given answer. Commented Jan 1, 2022 at 22:23
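To put a rough number on the size argument in the comment above, here is a minimal sketch. The frequency table is hypothetical (the real counts would come from george.txt) and `huffman_bits` is an illustrative helper, not part of the question's code. It relies on the fact that the total length of the Huffman-encoded data, in bits, equals the sum of the frequencies of all merged (internal) nodes.

```python
import heapq


def huffman_bits(freqs):
    """Total number of bits needed to Huffman-encode data with these counts."""
    heap = list(freqs.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Merge the two least frequent subtrees, as in the tree-building loop.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # every symbol below this node gains one bit
        heapq.heappush(heap, merged)
    return total


# Hypothetical counts for the six symbols used in the question.
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

fixed_bits = 8 * sum(freqs.values())  # one byte per character
encoded_bits = huffman_bits(freqs)    # payload only, without the code table

print(fixed_bits, encoded_bits)
```

The encoded payload only comes out smaller if the codes are written as packed bits. Writing them as text ('0' and '1' characters) or as pickled objects spends at least one byte per bit.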

1 Answer

The process of converting structured objects to a binary form is called "serialization", so a search for "python serialization" is where you'd normally want to start. It's an integral part of most programming languages and comes in many forms. The de facto serialization method in Python is called Pickle, and it lives in the standard-library module pickle.

Pickle lets you convert objects to a binary representation and vice versa, handling lots of little protocol details for you.

In your example you have:

with open('binary.bin', 'wb') as f:
    f.write(nodes[0])

You can serialize that to binary form like this:

import pickle

with open('binary.bin', 'wb') as f:
    b = pickle.dumps(nodes[0])  # bytes representation of your object
    f.write(b)                  # you can now write the bytes

You can also use pickle.dump, which takes the object first and an open file second, to write the whole list of nodes straight to the file:

with open('binary.bin', 'wb') as f:
    pickle.dump(nodes, f)

Deserialization looks similar:

with open('binary.bin', 'rb') as f:
    b = f.read()
    node0 = pickle.loads(b)

or, reading straight from the open file with pickle.load:

with open('binary.bin', 'rb') as f:
    nodes = pickle.load(f)


2 Comments

I don't think this is the answer to what OP needs. Yes, it gives a binary file, but no, it will not be a Huffman compression. Since pickle stores the Python type information in the file, the file will probably be larger than the original file and thus not be compressed.
Point taken, I was not addressing the issue of Huffman coding efficiency. If the OP confirms that understanding with a comment here, I'll remove this answer to open it up for an answer that addresses that core issue. The OP can also uncheck the accepted answer to elicit other answers.
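For reference, a minimal sketch of what a Huffman-specific file could look like, as opposed to a pickled tree. Everything here is an assumption made for illustration: the function names `build_codes`, `pack` and `unpack`, the sample text, and the file layout (a 4-byte table length, a 1-byte padding count, the code table as JSON, then the packed bits). The assignment may prescribe a different format.

```python
import heapq
import json
import struct
from collections import Counter


def build_codes(freqs):
    """Return {symbol: bit string} for a {symbol: count} mapping."""
    # Heap entries are (frequency, tie_breaker, tree). A tree is either a
    # symbol (leaf) or a (left, right) tuple (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {heap[0][2]: "0"}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    if heap:
        walk(heap[0][2], "")
    return codes


def pack(text, codes):
    """Return bytes holding the code table followed by the packed bits."""
    bits = "".join(codes[ch] for ch in text)
    padding = (8 - len(bits) % 8) % 8  # zero bits added to fill the last byte
    bits += "0" * padding
    data = int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""
    table = json.dumps(codes).encode("utf-8")
    # Header: table length (4 bytes, big-endian) and padding count (1 byte).
    return struct.pack(">IB", len(table), padding) + table + data


def unpack(blob):
    """Inverse of pack(): recover the original text from the bytes."""
    table_len, padding = struct.unpack(">IB", blob[:5])
    codes = json.loads(blob[5:5 + table_len].decode("utf-8"))
    bits = "".join(f"{byte:08b}" for byte in blob[5 + table_len:])
    if padding:
        bits = bits[:-padding]
    decode = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:  # codes are prefix-free, so the first hit is right
            out.append(decode[current])
            current = ""
    return "".join(out)


text = "abracadabra"  # sample input, stands in for the real file contents
blob = pack(text, build_codes(Counter(text)))
print(len(text), len(blob), unpack(blob) == text)
```

Because `blob` is a bytes object, f.write(blob) on a file opened with 'wb' works without the TypeError from the question. For very short inputs like this sample the table overhead outweighs the savings, so the file only gets smaller than the original once the input is reasonably large.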
