I'm trying to gzip a numpy array in Python 3.6.8.

If I run this snippet twice (different interpreter sessions), I get different output:

import gzip
import numpy
import base64

data = numpy.array([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]])
compressed = base64.standard_b64encode(gzip.compress(data.data, compresslevel=9))
print(compressed.decode('ascii'))

Example results (it's different every time):

H4sIAPjHiV4C/2NgAIEP9gwQ4AChOKC0AJQWgdISUFoGSitAaSUorQKl1aC0BpTWgtI6UFoPShs4AABmfqWAgAAAAA==
H4sIAPrHiV4C/2NgAIEP9gwQ4AChOKC0AJQWgdISUFoGSitAaSUorQKl1aC0BpTWgtI6UFoPShs4AABmfqWAgAAAAA==
      ^

Running it in a loop (so within the same interpreter session), it gives the same result each time:

for _ in range(1000):
    assert compressed == base64.standard_b64encode(gzip.compress(data.data, compresslevel=9))

How can I get the same result each time? (Preferably without external libraries.)

  • Why does that matter to you? Commented Apr 5, 2020 at 12:14
  • What happens when you add a mtime=0 parameter to gzip.compress(data.data, compresslevel=9)? Commented Apr 5, 2020 at 12:27
  • @SinanKurmus Doesn't exist, but if I create my own BytesIO and GzipFile with mtime then it indeed works! Do you want to turn it into an answer? Commented Apr 5, 2020 at 12:41

1 Answer

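The gzip format stores some file metadata in its header, such as a modification timestamp and optionally the original file name (good discussion of that here). You are not compressing a file on disk, but when no mtime is given, Python's gzip module writes the current time into that header field, so runs made at different times produce different bytes. That matches your output: only one character near the start of the string differs, while the compressed payload is identical. Within a single loop the calls usually land in the same second, which is why the result looked stable there.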
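To check that the differing character really is the timestamp, here is a small sketch that decodes the first 12 base64 characters of the two outputs from the question and reads bytes 4 to 7 of the gzip header as a little-endian 32-bit integer, which is where the format keeps its modification time. The helper name gzip_mtime is made up for this illustration.

```python
import base64
import struct


def gzip_mtime(b64_prefix):
    """Return the MTIME header field from a base64-encoded gzip prefix.

    Hypothetical helper for illustration; it expects at least the first
    12 base64 characters (9 raw bytes) of a gzip stream.
    """
    header = base64.standard_b64decode(b64_prefix)
    # Bytes 0-1: magic number, 2: compression method, 3: flags,
    # 4-7: modification time as little-endian uint32 (Unix seconds).
    (mtime,) = struct.unpack('<I', header[4:8])
    return mtime


# First 12 characters of the two example outputs in the question.
first = gzip_mtime('H4sIAPjHiV4C')
second = gzip_mtime('H4sIAPrHiV4C')
print(first, second, second - first)
```

The two values are Unix timestamps a couple of seconds apart, which is consistent with the snippet having been run twice in quick succession.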

So try passing mtime=0 to gzip.compress if you have Python 3.8+ (the parameter was added in that version):

gzip.compress(data.data, compresslevel=9, mtime=0)

and if that does not work (e.g. older Python version), then you can use gzip.GzipFile with the mtime parameter, like this:

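import gzip
import io

buf = io.BytesIO()
# mtime=0 pins the header timestamp, so the output no longer depends on the clock
with gzip.GzipFile(fileobj=buf, mode='wb', compresslevel=9, mtime=0) as f:
    f.write(data.data)
result = buf.getvalue()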
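As a self-contained sketch of the same idea, the GzipFile approach can be wrapped in a function and checked for reproducibility. The name deterministic_gzip is hypothetical, and the payload below is built with struct as a stand-in for the array's buffer so that the example runs without numpy; with the array from the question you would pass data.data instead.

```python
import gzip
import io
import struct
import time


def deterministic_gzip(payload, compresslevel=9):
    """Gzip a bytes-like object with the header timestamp fixed to 0.

    Hypothetical helper name; it only uses the standard library.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb',
                       compresslevel=compresslevel, mtime=0) as f:
        f.write(payload)
    return buf.getvalue()


# Stand-in payload: the values 1.0 .. 16.0 as little-endian float64.
payload = struct.pack('<16d', *(float(i) for i in range(1, 17)))

first = deterministic_gzip(payload)
time.sleep(1.1)  # let the wall clock move on by more than a second
second = deterministic_gzip(payload)

print(first == second)                    # same bytes despite the delay
print(gzip.decompress(first) == payload)  # round-trips to the input
```

Because the timestamp is fixed and no file name is attached to the BytesIO buffer, the header carries nothing that varies between runs. The compressed body can still differ across zlib versions or compression levels, so this only guarantees identical output for the same Python and zlib build.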

For details, see the Python documentation for the gzip module.
