Text compression in Python

Question

I have this text:

2,3,5,1,13,7,17,11,89,1,233,29,61,47,1597,19,37,41,421,199,28657,23,3001,521,53,281,514229,31,557,2207,19801,3571,141961,107,73,9349,135721,2161,2789,211,433494437,43,109441,139,2971215073,1103,97,101,6376021,90481,953,5779,661,14503,797,59,353,2521,4513,3010349,35239681,1087,14736206161,9901,269,67,137,71,6673,103681,9375829,54018521,230686501,29134601,988681,79,157,1601,2269,370248451,99194853094755497,83,9521,6709,173,263,1069,181,741469,4969,4531100550901,6643838879,761,769,193,599786069,197,401,743519377,919,519121,103,8288823481,119218851371,1247833,11128427,827728777,331,1459000305513721,10745088481,677,229,1381,347,29717,709,159512939815855788121,

These are numbers generated by my generator program. The problem has a source code size limit, so I can't include the text above in my solution as-is. I want to compress it and put it into a data structure in Python so that I can print the numbers by index, like this:

F = [`compressed data`]

so that F[0] would give 2, F[5] would give 7, and so on. Please suggest a suitable compression technique.

PS: I am very new to Python, so please explain your method.

I don't see any compression here. Are you sure that's the word you mean?
– Ned Batchelder, Commented Jan 30, 2011 at 19:20
What's the size of your number list? How fast do you need to get the number for an index? Does this number list have any boundaries, properties, characteristics, or other known information about the sequence? You say that you have a source code limit. What is it? Do you have any memory limit? There are various compression algorithms, and the right choice depends on your restrictions and the available information.
– ikostia, Commented Jan 30, 2011 at 19:42
Given a value of N, I have to output the value of F[N]. The initialization of F[] should be equivalent to F = [ 2,3,5,1,13,7,17,11,89,1,233,...], but instead of the numbers I have to use the compressed value so that the solution fits within the source code limit.
– Quixotic, Commented Jan 30, 2011 at 19:57
@Tretwick Marian: Can you elaborate on what you mean by "the problem has a source code limit" and "can't use the above texts in my solution"? Are you participating in some kind of coding competition? By the way, have you considered just saving the text to a file and reading it into a list later when needed?
– eat, Commented Jan 30, 2011 at 20:29

Lennart Regebro · Accepted Answer · 2011-01-30 20:39:30Z

10

Sure, you can do this:

import base64
import zlib
compressed = 'eJwdktkNgDAMQxfqR+5j/8V4QUJQUttx3Nrzl0+f+uunPPpm+Tf3Z/tKX1DM5bXP+wUFA777bCob4HMRfUk14QwfDYPrrA5gcuQB49lQQxdZpdr+1oN2bEA3pW5Nf8NGOFsR19NBszyX7G2raQpkVUEBdbTLuwSRlcDCYiW7GeBaRYJrgImrM3lmI/WsIxFXNd+aszXoRXuZ1PnZRdwKJeqYYYKq6y1++PXOYdgM0TlZcymCOdKqR7HYmYPiRslDr2Sn6C0Wgw+a6MakM2VnBk6HwU6uWqDRz+p6wtKTCg2WsfdKJwfJlHNaFT4+Q7PGfR9hyWK3p3464nhFwpOd7kdvjmz1jpWcxmbG/FJUXdMZgrpzs+jxC11twrBo3TaNgvsf8oqIYwT4r9XkPnNC1XcP7qD5cW7UHSJZ3my5qba+ozncl5kz8gGEEYOQ'
data = zlib.decompress(base64.b64decode(compressed))

Note that this is only 139 characters shorter. But it works:

>>> data
'2,3,5,1,13,7,17,11,89,1,233,29,61,47,1597,19,37,41,421,199,28657,23,3001,521,53,281,514229,31,557,2207,19801,3571,141961,107,73,9349,135721,2161,2789,211,433494437,43,109441,139,2971215073,1103,97,101,6376021,90481,953,5779,661,14503,797,59,353,2521,4513,3010349,35239681,1087,14736206161,9901,269,67,137,71,6673,103681,9375829,54018521,230686501,29134601,988681,79,157,1601,2269,370248451,99194853094755497,83,9521,6709,173,263,1069,181,741469,4969,4531100550901,6643838879,761,769,193,599786069,197,401,743519377,919,519121,103,8288823481,119218851371,1247833,11128427,827728777,331,1459000305513721,10745088481,677,229,1381,347,29717,709,159512939815855788121,'

If your code limit really is so short, maybe you are supposed to calculate this data or something? What is it?

edited Jan 30, 2011 at 20:39

answered Jan 30, 2011 at 20:31


6 Comments

Quixotic Over a year ago

And how did you get the compressed value programmatically? :)

Lennart Regebro Over a year ago

I did the same thing, but in reverse.

Quixotic Over a year ago

Okay, let me try to rephrase :) I want to know how you obtained the compressed value in that format, since something like ideone.com/EDftR is not giving me that value.

Lennart Regebro Over a year ago

Yeah, I said reverse. You obviously have to reverse the order of the actions, i.e. base64.b64encode(zlib.compress(s))

Quixotic Over a year ago

I have up-voted this one :) Now I understand both of the solutions :)
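The "reverse" step from the comments above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the answerer's exact script: the variable names and the shortened `text` value are assumptions made for the example, and the explicit `.encode("ascii")` / `.decode("ascii")` calls are only needed because Python 3 separates `str` from `bytes`.

```python
import base64
import zlib

# Short stand-in for the full comma-separated list from the question (assumption).
text = "2,3,5,1,13,7,17,11,89,1,233,29,61,47,1597,19,37,41,"

# Compress first, then base64-encode, so the result is plain ASCII
# that can be pasted into source code as a string literal.
compressed = base64.b64encode(zlib.compress(text.encode("ascii"), 9)).decode("ascii")
print(compressed)

# Decoding applies the same two steps in the opposite order.
restored = zlib.decompress(base64.b64decode(compressed)).decode("ascii")

# Build the list F; the trailing comma leaves an empty last field, which is skipped.
F = [int(s) for s in restored.split(",") if s]
print(F[0], F[5])
```

With an input this short the encoded string may well be longer than the original; any saving only shows up on longer inputs such as the full list.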


David Heffernan · Accepted Answer · 2011-01-30 19:20:25Z

5

zlib would get the job done, if you indeed want compression. If you don't want compression, then I'm afraid that my mind-reading skills are on the wane.

answered Jan 30, 2011 at 19:20


11 Comments

9000 Over a year ago

gzip + base64 may indeed have smaller size than the source text. I just tried to do that with the digits presented, and it compressed the text from 663 to 475 bytes. Not stellar, though.

Quixotic Over a year ago

I guess this is what I am looking for, but I am new to Python, so could you please explain the compression and decompression technique?

David Heffernan Over a year ago

@Tretwick Compressing makes it take up less space. zlib is lossless compression so no information is lost. Decompression is the inverse operation. Did you read the link I included in my answer?

Quixotic Over a year ago

Yes, I did. Say I want to compress the text 'My name is Tretwick', so I write zlib.compress('My name is Tretwick'). I then have to print the result so that I can paste it into my source and later call zlib.decompress() on it to get the original back. But what gets printed is something different, and it doesn't work when I copy and paste it into the decompress call. I hope you get my point.

David Heffernan Over a year ago

@Tretwick I guess you're doing it wrong somehow, but I don't know off the top of my head. The simple compress/decompress cycle you propose works fine for me.
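One possible cause of the copy-and-paste problem described above (an assumption, since the failing code is not shown) is that the compressed result is binary data: printing it raw on Python 2 writes unprintable bytes to the terminal, and those do not survive being copied. A hedged Python 3 sketch of a workaround is to paste the `repr()` of the bytes instead, which is an escaped literal. The `ast.literal_eval` call here only simulates pasting that literal back into source code.

```python
import ast
import zlib

original = b"My name is Tretwick"
packed = zlib.compress(original)

# repr() gives an escaped bytes literal (b'x\x9c...') that is safe to copy,
# unlike the raw binary data itself.
literal = repr(packed)
print(literal)

# Simulate pasting the literal back into a program.
pasted = ast.literal_eval(literal)
print(zlib.decompress(pasted))
```

Base64-encoding the compressed bytes, as in the other answer, sidesteps the same issue and gives a shorter literal than the escaped form.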


jfs · Accepted Answer · 2011-01-31 10:02:37Z

On Python 2.4-2.7, PyPy, Jython:

>>> enc = sdata.encode('zlib').encode('base64')
>>> print enc
eJwdktkNgDAMQxfqR+5j/8V4QUJQUttx3Nrzl0+f+uunPPpm+Tf3Z/tKX1DM5bXP+wUFA777bCob
4HMRfUk14QwfDYPrrA5gcuQB49lQQxdZpdr+1oN2bEA3pW5Nf8NGOFsR19NBszyX7G2raQpkVUEB
dbTLuwSRlcDCYiW7GeBaRYJrgImrM3lmI/WsIxFXNd+aszXoRXuZ1PnZRdwKJeqYYYKq6y1++PXO
YdgM0TlZcymCOdKqR7HYmYPiRslDr2Sn6C0Wgw+a6MakM2VnBk6HwU6uWqDRz+p6wtKTCg2WsfdK
JwfJlHNaFT4+Q7PGfR9hyWK3p3464nhFwpOd7kdvjmz1jpWcxmbG/FJUXdMZgrpzs+jxC11twrBo
3TaNgvsf8oqIYwT4r9XkPnNC1XcP7qD5cW7UHSJZ3my5qba+ozncl5kz8gGEEYOQ
>>> print enc.decode('base64').decode('zlib')[:79]
2,3,5,1,13,7,17,11,89,1,233,29,61,47,1597,19,37,41,421,199,28657,23,3001,521,53
>>> sdata == enc.decode('base64').decode('zlib')
True
>>> F = [int(s) for s in sdata.split(',') if s.strip()]
>>> F[0], F[5]
(2, 7)
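The session above relies on `str.encode('zlib')` and `str.encode('base64')`, which are Python 2 features. On Python 3 a roughly equivalent sketch uses the `codecs` module with the bytes-to-bytes codecs `zlib_codec` and `base64_codec`. The shortened `sdata` value below is a stand-in chosen for the example, not the full list.

```python
import codecs

# Stand-in for the full comma-separated string from the question (assumption).
sdata = "2,3,5,1,13,7,17,11,89,1,233,29,61,47,"

# The binary codecs work on bytes, so encode the text to ASCII first.
enc = codecs.encode(codecs.encode(sdata.encode("ascii"), "zlib_codec"), "base64_codec")
print(enc.decode("ascii"))

# Undo both steps in the opposite order and convert back to str.
dec = codecs.decode(codecs.decode(enc, "base64_codec"), "zlib_codec").decode("ascii")

# Same list construction as in the session above.
F = [int(s) for s in dec.split(",") if s.strip()]
print(F[0], F[5])
```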
