
I'm trying to store a compressed dictionary in my SQLite database. First, I convert the dict to a string using json.dumps, which seems to work fine. Storing this string in the DB also works.

In the next step, I compress the string using encode("zlib"), but storing the resulting string in my DB throws an error.

mydict = {"house":"Haus","cat":"Katze","red":u'W\xe4yn',"dict":{"1":"asdfhgjl ahsugoh ","2":"s dhgsuoadhu gohsuohgsduohg"}}
dbCommand("create table testTable (ch1 varchar);")
# convert dictionary to string
jch1 = json.dumps(mydict,ensure_ascii=True)
print(jch1)
# store uncompressed values
dbCommand("insert into testTable (ch1) values ('%s');"%(jch1))
# compress json strings
cjch1 = jch1.encode("zlib")
print(cjch1)
# store compressed values
dbCommand("insert into testTable (ch1) values ('%s');"%(cjch1))

The first print outputs:

{"house": "Haus", "dict": {"1": "asdfhgjl ahsugoh ", "2": "s dhgsuoadhu gohsuohgsduohg"}, "red": "W\u00e4yn", "cat": "Katze"}

The second print is not readable of course:

xワフ1テPCᆵyfᅠネノ õ

Do I need to do any additional conversion before?

Looking forward to any helping hint!

1 Comment

  • "throws an error.": show the error, please. Commented Jan 22, 2015 at 15:51

2 Answers


Let's approach this from the other end: why are you using zlib compression in the first place? Do you think you need to save space in your database? Have you checked how long the dictionary strings will be in production? Strings need a certain minimum length before compression actually saves storage space (for small input strings the output may even be larger than the input!). And if it does save some disk space: have you thought through whether the additional CPU load and processing time for compressing and decompressing are worth the saved space?
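The size trade-off mentioned above is easy to check empirically. A short sketch (the sample strings here are arbitrary, chosen only to illustrate the effect):

```python
import zlib

small = b"hi"
large = b"s dhgsuoadhu gohsuohgsduohg" * 100  # repetitive text compresses well

small_c = zlib.compress(small)
large_c = zlib.compress(large)

# zlib adds a header and checksum, so tiny inputs grow rather than shrink
print(len(small), len(small_c))
# highly repetitive input shrinks dramatically
print(len(large), len(large_c))
```

For the two-byte input the compressed result is larger than the original; for the repetitive kilobyte-scale input it is a small fraction of the original size.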

Other than that: the result of gzip/zlib compression is a binary blob. In Python 2 it is of type str; in Python 3 it is of type bytes. Either way, the database needs to know that what you are storing is binary data! VARCHAR is not the right data type for this endeavor. Here is a quote from the MySQL docs:

Also, if you want to store binary values such as results from an encryption or compression function that might contain arbitrary byte values, use a BLOB column rather than a CHAR or VARCHAR column, to avoid potential problems with trailing space removal that would change data values.

The same consideration holds true for other databases. In the case of SQLite, too, you must use the BLOB data type (see the docs) for storing binary data (if you want to be sure to get back exactly the same data you put in :-)).
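In Python 3 this is straightforward to verify: the sqlite3 module binds a bytes value as a BLOB when you use a parameterized query. A minimal sketch (the in-memory database and table name are just for illustration):

```python
import sqlite3
import zlib

db = sqlite3.connect(":memory:")
curs = db.cursor()
curs.execute("CREATE TABLE testTable (ch1 BLOB)")

payload = zlib.compress(b'{"house": "Haus"}')
curs.execute("INSERT INTO testTable (ch1) VALUES (?)", (payload,))

# SQLite reports the stored value's type as 'blob' ...
(stored_type,) = curs.execute("SELECT typeof(ch1) FROM testTable").fetchone()
print(stored_type)  # blob
# ... and returns the bytes unchanged
(stored,) = curs.execute("SELECT ch1 FROM testTable").fetchone()
assert stored == payload
```

Note the parameterized `?` placeholder: string interpolation, as in the question's code, breaks as soon as the binary data contains quote characters or null bytes, quite apart from the SQL injection risk.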


1 Comment

Thanks a lot, I had missed storing the data as a BLOB. To answer your question about the sizes: my data is around 200k uncompressed. As it's text, I hope to be able to compress it down to less than 20k. CPU time is not a problem for my application, since I want to store the data for later offline analysis.

Thanks a lot, Jan-Philip,

you showed me the right solution: my table needs a BLOB column to store the data. Here is the working code:

import json
import sqlite3

db = sqlite3.connect("test.db")
curs = db.cursor()

mydict = {"house":"Haus","cat":"Katze","red":u'W\xe4yn',"dict":{"1":"asdfhgjl ahsugoh ","2":"s dhgsuoadhu gohsuohgsduohg"}}
curs.execute("create table testTable (ch1 BLOB);")
# convert dictionary to a JSON string
jch1 = json.dumps(mydict,ensure_ascii=True)
# compress it (Python 2 only: the "zlib" str codec is gone in Python 3)
cjch1 = jch1.encode("zlib")
# store the compressed value; buffer() makes sqlite3 bind it as a BLOB
curs.execute('insert into testTable values (?);', [buffer(cjch1)])
db.commit()
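The code above is Python 2 (`buffer` and the `"zlib"` str codec no longer exist in Python 3). A Python 3 round trip of the same idea, including reading the data back, might look like this (the in-memory database stands in for a real file):

```python
import json
import sqlite3
import zlib

mydict = {"house": "Haus", "cat": "Katze", "red": u"W\xe4yn",
          "dict": {"1": "asdfhgjl ahsugoh ", "2": "s dhgsuoadhu gohsuohgsduohg"}}

db = sqlite3.connect(":memory:")
curs = db.cursor()
curs.execute("create table testTable (ch1 BLOB);")

# compress the JSON text; zlib works on bytes, so encode to UTF-8 first
cjch1 = zlib.compress(json.dumps(mydict, ensure_ascii=True).encode("utf-8"))
curs.execute("insert into testTable values (?);", [cjch1])
db.commit()

# read back: decompress, then parse the JSON
(raw,) = curs.execute("select ch1 from testTable;").fetchone()
restored = json.loads(zlib.decompress(raw).decode("utf-8"))
assert restored == mydict
```

The final assert confirms the stored BLOB survives the full compress/store/load/decompress cycle unchanged.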

