I want to compress my JSON data using several different compressors. This is what I used to compress the JSON:

import gzip
import json

with gzip.GzipFile('2.json', 'r') as isfile:
    for line in isfile:
        obj = json.loads(line)

This raises an error:

raise OSError('Not a gzipped file (%r)' % magic)

OSError: Not a gzipped file (b'[\n')

I also tried compressing the data directly:

zlib_data = zlib.compress(data)

which also raises an error:

return lz4.block.compress(*args, **kwargs)

TypeError: a bytes-like object is required, not 'list'

Basically, I want to compress a JSON file with each of these methods and measure the time each method takes.

  • Is 2.json a gzipped file? What does ` (b'[\n')` mean in the error you state? What is data in the next attempt you mention? (The error suggests it's a list and a bytes-like object is required) Commented Jun 1, 2017 at 11:28
  • @doctorlove No, 2.json is a plain JSON file. In the next attempt, data comes from this: with open('2.json') as json_data: data = json.load(json_data), where again 2.json is a plain JSON file. Commented Jun 1, 2017 at 11:30
  • Given that you compress and uncompress some JSON, could you show how you create these files? Commented Jun 1, 2017 at 11:38
  • @MSeifert This is a sample JSON file which I downloaded from the internet. Should I post the structure? The thing is, since it's plain JSON (text), it should compress. What's a better way to do it? Commented Jun 1, 2017 at 11:41
  • The problem is that I can't reproduce the exception currently, so it's hard or impossible for anyone to debug this issue. A sample that reproduces the issue (without downloading anything from the internet) would be preferable. Commented Jun 1, 2017 at 11:43

1 Answer

On Python 2.7

The problem seems to be the type of your data: the data to compress should be of type 'str'.

import gzip
import json
import lz4.block
import time

with gzip.GzipFile('data.gz', 'w') as fid_gz:
    with open('data.json', 'r') as fid_json:
        # parse the JSON into Python objects (dict or list)
        json_dict = json.load(fid_json)
        # serialize back to a JSON str (str() would give a Python repr, not valid JSON)
        json_str = json.dumps(json_dict)
    # write the string to the gzip file
    fid_gz.write(json_str)

# check that the file was written correctly
with gzip.GzipFile('data.gz', 'r') as fid_gz:
    print(fid_gz.read())

The same applies to zlib compression (reached here through the gzip module; 9 is the compression level):

gzip.zlib.compress(json_str,9)

and to lz4 compression:

lz4.block.compress(json_str)

Timing can be checked like this:

# record the start time
st = time.time()
# ... run the compression to be timed here ...
# print the elapsed time in seconds
print(time.time() - st)

On Python 3.5

The difference between Python 2.7 and Python 3 is the type of the data to compress: in Python 3 it must be of type 'bytes', which you can get via bytes().

When making a .gz file:

with gzip.GzipFile('data.gz','w') as fid_gz:
    with open('data.json','r') as fid_json:
        json_dict = json.load(fid_json)
        # json.dumps() yields valid JSON text; str() would give a Python repr
        json_str = json.dumps(json_dict)
        # bytes(string, encoding)
        json_bytes = bytes(json_str,'utf8')
    fid_gz.write(json_bytes)
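
To check that a file written this way can be read back, here is a minimal round-trip sketch for Python 3. The sample list, the temporary path, and the helper names write_json_gz and read_json_gz are made up for illustration and stand in for data.json and data.gz. It serializes with json.dumps so the decompressed text is valid JSON again.

```python
import gzip
import json
import os
import tempfile

# Hypothetical sample data standing in for the contents of data.json.
sample = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]


def write_json_gz(obj, path):
    """Serialize obj to JSON text, encode it to bytes, and write it gzip-compressed."""
    json_bytes = json.dumps(obj).encode("utf-8")
    with gzip.GzipFile(path, "w") as fid_gz:
        fid_gz.write(json_bytes)


def read_json_gz(path):
    """Read a gzip-compressed JSON file back into Python objects."""
    with gzip.GzipFile(path, "r") as fid_gz:
        return json.loads(fid_gz.read().decode("utf-8"))


# Write to a temporary directory so no existing file is touched.
gz_path = os.path.join(tempfile.mkdtemp(), "data.gz")
write_json_gz(sample, gz_path)
restored = read_json_gz(gz_path)
print(restored == sample)
```

If the comparison prints False, the data was changed somewhere between serialization and parsing, which usually points at an encoding or serialization mismatch.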

Or compress in memory with gzip.compress(data, compresslevel=9):

# 'data' takes bytes
gzip.compress(json_bytes)

Or with zlib.compress(bytes, level=-1, /):

gzip.zlib.compress(json_bytes,9)
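
If the JSON is already on disk and only needs compressing, parsing it first is probably unnecessary: opening the file in binary mode gives bytes directly. The sketch below assumes Python 3 and creates a small hypothetical sample file in a temporary directory in place of data.json; the record contents are invented for illustration.

```python
import gzip
import json
import os
import tempfile
import zlib

# Hypothetical sample file standing in for data.json.
records = [{"id": i, "value": "x" * 20} for i in range(200)]
json_path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(json_path, "w", encoding="utf-8") as fid_json:
    json.dump(records, fid_json)

# Binary mode returns bytes, so no parse -> str -> bytes conversion is needed.
with open(json_path, "rb") as fid_json:
    json_bytes = fid_json.read()

# Compress the same bytes with both in-memory functions at level 9.
gz_blob = gzip.compress(json_bytes, compresslevel=9)
zl_blob = zlib.compress(json_bytes, 9)

# Decompressing must give back exactly the original bytes.
roundtrip_ok = (gzip.decompress(gz_blob) == json_bytes
                and zlib.decompress(zl_blob) == json_bytes)
print(len(json_bytes), len(gz_blob), len(zl_blob), roundtrip_ok)
```

Reading raw bytes also keeps the parsing cost out of any timing measurement, since only the compression call operates on the data.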

Or with lz4.block.compress(source, compression=0):

# 'source' takes both 'str' and 'bytes'
lz4.block.compress(json_str)
lz4.block.compress(json_bytes)

How you measure the time is up to you.
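
One possible way to compare compressors is sketched below for Python 3. The helper time_compressors and the sample records are hypothetical, and bz2 and lzma are added only as further standard-library examples; they are not part of the answer above. If lz4 is installed, lz4.block.compress could be added as another entry in the dictionary. time.perf_counter is used here instead of time.time because it is intended for measuring short intervals.

```python
import bz2
import gzip
import json
import lzma
import time
import zlib


def time_compressors(payload, compressors):
    """Return {name: (elapsed_seconds, compressed_size)} for each compressor."""
    results = {}
    for name, compress in compressors.items():
        start = time.perf_counter()
        blob = compress(payload)
        elapsed = time.perf_counter() - start
        results[name] = (elapsed, len(blob))
    return results


# Hypothetical sample data; in practice read the bytes of your own JSON file.
records = [{"id": i, "value": "x" * 20} for i in range(1000)]
json_bytes = json.dumps(records).encode("utf-8")

# Each entry maps a label to a function that takes bytes and returns bytes.
compressors = {
    "gzip": gzip.compress,
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

results = time_compressors(json_bytes, compressors)
for name, (elapsed, size) in results.items():
    print("%-5s %10.6f s %8d bytes" % (name, elapsed, size))
```

A single run is noisy, so for a fair comparison it may be worth repeating each call several times (for example with the timeit module) and comparing the best or average time.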

cheers


6 Comments

What is the '9' in gzip.zlib.compress(json_str,9)? I tried that and it still gives an error: gzip.zlib.compress(json_str,9) TypeError: a bytes-like object is required, not 'str'
It's the compression level. On Python 2.7 the signature is gzip.zlib.compress(string[, level]) and it returns the compressed string. I'm trying Python 3 now.
So why is it giving an error? It seems to work with lz4.block.compress(json_str). Take a look at the error. Is it compulsory to convert it to a string before compression? I am trying to compare the time taken to compress, and converting it to a dictionary and then to a string must take some time too.
Yes, compression works. But the thing is, it seems absolutely compulsory to convert the JSON, as the output type is also bytes. And second, can't I measure the execution time of just gzip.compress(json_bytes)? Your answer helped a lot, thanks.
st = time.time(); gzip.compress(json_bytes); print(time.time() - st) This may help you.