
Until now I used this code for reading zip files:

    import base64

    with open("asset.zip", "rb") as f:
        bytes_of_file = f.read()  # loads the whole file into memory at once
        encoded = base64.b64encode(bytes_of_file)

This worked well until I tried large zip files (1 GB+) and got a memory error. I then tried a solution I found online:

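    import base64
    import zipfile

    with zipfile.ZipFile("asset.zip", "r") as z:  # ZipFile accepts "r", "w", "x" or "a", not "rb"
        with z.open(...) as f:  # opens a member inside the archive, not the archive itself
            bytes_of_file = f.read()
            encoded = base64.b64encode(bytes_of_file)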

The problem is that zipfile has to open a file inside the archive before I can read anything. I want to read the zip file itself and encode it. How can I do that?

Thanks!

  • Take a look at this thread: stackoverflow.com/questions/25962114/… Commented Aug 12, 2020 at 15:22
  • Where is the base64-encoded zip file going? If the file itself doesn't fit in memory, the base64-encoded version of that same file (which is about 33% bigger, since every 3 bytes become 4 characters) will not fit either. You can write it to a file or a network connection in chunks, but you cannot keep it all in memory at the same time. Commented Aug 12, 2020 at 15:27
  • Hi @Thomas, the code crashes in the read() method. I hadn't thought about the next step, but writing to a file is a good idea. I just need to read the zip first. Commented Aug 12, 2020 at 15:31
  • Must it be done in Python? On my Linux system, I can simply do base64 asset.zip > asset.zip.b64 on the command line. Commented Aug 12, 2020 at 15:33

1 Answer

If the file is too large to fit in memory, you will need to stream it little by little to your output file. Open the input file for reading and the output file for writing (both in binary mode). Then read a chunk of some fixed size from the input file, encode it, and write it to the output.

The trick is to choose the chunk size correctly. Otherwise base64 adds padding (= characters) at the end of each encoded chunk, and padding is normally valid only at the very end of a base64-encoded byte string. Base64 encodes every 3 bytes of input (24 bits) as 4 output characters of 6 bits each, with no padding, so the chunk size must be a multiple of 3, for example 3 * 1024 * 1024 bytes = 3 MiB.
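
A minimal sketch of that approach, assuming the encoded result should go to a file on disk. The function name encode_file_base64, the constant CHUNK_SIZE, and the output name asset.zip.b64 are only illustrative, not something from the question:

```python
import base64

# 3 MiB per read; any positive multiple of 3 avoids padding in the middle of the output
CHUNK_SIZE = 3 * 1024 * 1024


def encode_file_base64(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Stream src_path into dst_path as base64, one chunk at a time."""
    if chunk_size <= 0 or chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a positive multiple of 3")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            # Only the final chunk can have a length that is not a multiple
            # of 3, so "=" padding can only appear at the very end.
            dst.write(base64.b64encode(chunk))
```

Calling encode_file_base64("asset.zip", "asset.zip.b64") should then keep only one input chunk and its encoded form in memory at a time, and the output file should be identical to what base64.b64encode would produce for the whole file in one call.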


1 Comment

Thomas is correct. Check out this old post: stackoverflow.com/questions/17220370/…
