How to generate a binary file

Question

I am working on a school project, and this is my assignment:

Generate a binary file that contains the table of encoding and the data of the file using the Huffman encoding.

First I need to read the data from a file and create a Huffman tree. I have created it and it all works, but I cannot generate the binary file: the data I have are node objects, not bytes, so I cannot write them to the binary file, and I get this error:

TypeError: a bytes-like object is required, not 'node'

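q = {}
# read "symbol frequency" pairs; the with-block closes the file afterwards
with open("george.txt", 'r') as a_file:
    for line in a_file:
        key, value = line.split()
        q[key] = int(value)  # keep frequencies as integers so they sort and add numerically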


class node:
    def __init__(self, freq, symbol, left=None, right=None):
        self.freq = freq
        self.symbol = symbol
        self.left = left
        self.right = right
        self.huff = ''


def printNodes(node, val=''):
    newVal = val + str(node.huff)
    if(node.left):
        printNodes(node.left, newVal)
    if(node.right):
        printNodes(node.right, newVal)

    if(not node.left and not node.right):
        print(f"{node.symbol} -> {newVal}")


chars = ['a', 'b', 'c', 'd', 'e', 'f']

# frequency of characters
freq = [q['a'], q['b'], q['c'], q['d'], q['e'], q['f']]

nodes = []

for x in range(len(chars)):
    nodes.append(node(freq[x], chars[x]))

while len(nodes) > 1:
    nodes = sorted(nodes, key=lambda x: x.freq)

    left = nodes[0]
    right = nodes[1]
    left.huff = 0
    right.huff = 1
    newNode = node(left.freq+right.freq, left.symbol+right.symbol, left, right)
    nodes.remove(left)
    nodes.remove(right)
    nodes.append(newNode)

printNodes(nodes[0])
with open('binary.bin', 'wb') as f:
    f.write(nodes[0])

Huffman encoding is a compression algorithm. The output file should be smaller than the input file. I doubt that's the case with the given answer.
– Thomas Weller, Commented Jan 1, 2022 at 22:23

David Parks · Accepted Answer · 2022-01-01 21:52:37Z

2

The process of converting structured objects to a binary form is called "serialization", so a search for "python serialization" is where you'd normally want to start. It's an integral part of most programming languages and comes in many forms. The de facto serialization method in Python is called pickle, and it lives in the standard-library module pickle.

Pickle lets you convert objects to a binary representation and vice versa, handling lots of little protocol details for you.

In your example you have:

with open('binary.bin', 'wb') as f:
    f.write(nodes[0])

You can serialize that to binary form like this:

import pickle

with open('binary.bin', 'wb') as f:
    b = pickle.dumps(nodes[0])  # bytes representation of your object
    f.write(b)                  # you can now write the bytes

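You can also use the shorthand pickle.dump, which writes directly to an open file, to save all nodes in one call:

with open('binary.bin', 'wb') as f:
    pickle.dump(nodes, f)  # object first, then a file opened in binary mode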

Deserialization looks similar:

with open('binary.bin', 'rb') as f:
    b = f.read()
    node0 = pickle.loads(b)

or

with open('binary.bin', 'rb') as f:
    nodes = pickle.load(f)  # takes an open binary file, not a filename


edited Jan 1, 2022 at 21:52

answered Jan 1, 2022 at 21:36


2 Comments

Thomas Weller Over a year ago

I don't think this is the answer to what OP needs. Yes, it gives a binary file, but no, it will not be a Huffman compression. Since pickle stores the Python type information in the file, the file will probably be larger than the original file and thus not be compressed.
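A rough way to check the size concern is to pickle a small tree and compare byte counts. The sketch below is an illustration only: the `Node` class mirrors the question's class, and the frequency lines in `original` are made-up sample input, not the asker's actual george.txt.

```python
import pickle


class Node:
    def __init__(self, freq, symbol, left=None, right=None):
        self.freq = freq
        self.symbol = symbol
        self.left = left
        self.right = right


# Hypothetical input in the "symbol frequency" format the question reads.
original = b"a 5\nb 9\nc 12\nd 13\ne 16\nf 45\n"

# Build one leaf per line, then merge the two least frequent nodes until one root is left.
nodes = []
for line in original.decode().splitlines():
    symbol, freq = line.split()
    nodes.append(Node(int(freq), symbol))

while len(nodes) > 1:
    nodes.sort(key=lambda n: n.freq)
    left, right = nodes[0], nodes[1]
    merged = Node(left.freq + right.freq, left.symbol + right.symbol, left, right)
    nodes = nodes[2:] + [merged]

pickled = pickle.dumps(nodes[0])

# The pickle carries the class name and every attribute name, so it is
# several times larger than the few bytes of text it was built from.
print(len(original), len(pickled))
```

The exact pickled size depends on the Python version and pickle protocol, so only the comparison matters here, not the specific number.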

David Parks Over a year ago

Point taken, I was not addressing the issue of Huffman coding efficiency. If the OP confirms that understanding with a comment here, I'll remove this answer to open it up for an answer that addresses that core issue. The OP can also uncheck the accepted answer to elicit other answers.
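For reference, a minimal sketch of the alternative the comments point toward: writing the code table followed by the packed bits, rather than pickling the tree. The file layout (a 2-byte symbol count, a 4-byte bit count, then one entry per symbol) and the function names `build_codes`, `encode`, and `decode` are assumptions made up for this illustration, not a standard format or the asker's code.

```python
import heapq
import struct
from collections import Counter
from itertools import count


def build_codes(freqs):
    """Return {symbol: bit string} for a {symbol: frequency} mapping."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def bits_to_bytes(bits):
    """Pack a '0'/'1' string into bytes, zero-padding the last byte."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def bytes_to_bits(data, nbits):
    """Unpack bytes into a '0'/'1' string and drop the padding bits."""
    return "".join(f"{b:08b}" for b in data)[:nbits]


def encode(data: bytes) -> bytes:
    """Assumed layout: symbol count, bit count, code table, packed data."""
    codes = build_codes(Counter(data))
    bits = "".join(codes[b] for b in data)
    out = [struct.pack(">HI", len(codes), len(bits))]
    for sym, code in codes.items():
        # Table entry: symbol byte, code length in bits, padded code bits.
        out.append(struct.pack(">BB", sym, len(code)) + bits_to_bytes(code))
    out.append(bits_to_bytes(bits))
    return b"".join(out)


def decode(blob: bytes) -> bytes:
    n_codes, n_bits = struct.unpack_from(">HI", blob, 0)
    pos = 6  # size of the ">HI" header
    table = {}
    for _ in range(n_codes):
        sym, length = struct.unpack_from(">BB", blob, pos)
        pos += 2
        nbytes = (length + 7) // 8
        table[bytes_to_bits(blob[pos:pos + nbytes], length)] = sym
        pos += nbytes
    out, current = bytearray(), ""
    for bit in bytes_to_bits(blob[pos:], n_bits):
        current += bit
        if current in table:  # Huffman codes are prefix-free, so the first match is the symbol
            out.append(table[current])
            current = ""
    return bytes(out)
```

Writing the result is then an ordinary binary write, for example `f.write(encode(data))` on a file opened with `'wb'`. Keeping bit strings in Python `str` objects is simple but slow, so this is only suitable for small inputs, and very short inputs can still come out larger than the original because of the table.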
