45

Is there a way to write a string directly to a tarfile? From http://docs.python.org/library/tarfile.html it looks like only files already written to the file system can be added.

7 Answers

40

It's possible: build a TarInfo and pass it to TarFile.addfile together with a StringIO as the file object.

Very rough, but it works:

import tarfile
import StringIO

tar = tarfile.TarFile("test.tar","w")

string = StringIO.StringIO()
string.write("hello")
string.seek(0)
info = tarfile.TarInfo(name="foo")
info.size=len(string.buf)
tar.addfile(tarinfo=info, fileobj=string)

tar.close()

3 Comments

You can just say StringIO.StringIO("hello") to replace the writing and seeking.
is the procedure similar to python3 and bytesIO objects?
@proteneer: I believe in python 3 the seek method gives you a binary length, while it internally uses the string len() function, so that tarfile.copyfileobj function will fail with raise OSError("end of file reached")
16

As Stefano pointed out, you can use TarFile.addfile and StringIO.

import tarfile, StringIO

data = 'hello, world!'

tarinfo = tarfile.TarInfo('test.txt')
tarinfo.size = len(data)

tar = tarfile.open('test.tar', 'a')
tar.addfile(tarinfo, StringIO.StringIO(data))
tar.close()

You'll probably want to fill other fields of tarinfo (e.g. mtime, uname etc.) as well.
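
As a rough Python 3 sketch of filling in those other fields: the helper name add_string and the owner value "root" are made up for illustration, and the archive is kept in memory only so the result can be read back.

```python
import io
import tarfile
import time


def add_string(tar, name, text, mode=0o644, owner="root"):
    """Add `text` to an open TarFile as a regular file called `name`."""
    data = text.encode("utf-8")
    info = tarfile.TarInfo(name=name)
    info.size = len(data)          # size in bytes, not characters
    info.mtime = int(time.time())  # otherwise the member is dated 1970-01-01
    info.mode = mode               # permission bits
    info.uname = owner             # owner/group names; illustrative values
    info.gname = owner
    tar.addfile(info, io.BytesIO(data))
    return info


# Write to an in-memory archive, then read it back to check the metadata.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_string(tar, "test.txt", "hello, world!")

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmember("test.txt")
    content = tar.extractfile(member).read()
```

Without setting mtime the member shows up with a timestamp of 0, which most tools display as 1 January 1970.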

3 Comments

is the "As Stefano pointed out" an edit? Otherwise, I don't see what you're doing differently. Thanks for the response all the same.
I think Stefano hadn't posted any code at the time I wrote my response; he had only noted that TarFile.addfile and StringIO can be used. My memory is a little blurred, though.
FWIW, yes, @Stefano's detailed information was added in an edit after you wrote this. The other answer saying the same thing also came in almost simultaneously.
12

I found this while looking for a way to serve a freshly created in-memory .tgz archive from Django. Maybe somebody else will find my code useful:

import tarfile
from io import BytesIO

from django.http import HttpResponse


def serve_file(request):
    out = BytesIO()
    tar = tarfile.open(mode = "w:gz", fileobj = out)
    data = 'lala'.encode('utf-8')
    file = BytesIO(data)
    info = tarfile.TarInfo(name="1.txt")
    info.size = len(data)
    tar.addfile(tarinfo=info, fileobj=file)
    tar.close()

    response = HttpResponse(out.getvalue(), content_type='application/tgz')
    response['Content-Disposition'] = 'attachment; filename=myfile.tgz'
    return response
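
The archive-building part can be separated from the view so it is testable without Django. This is only a sketch: the function name make_tgz and the {name: text} input shape are my own choices, not anything Django or tarfile prescribes.

```python
import io
import tarfile


def make_tgz(files):
    """Return gzip-compressed tar bytes for a {name: text} mapping."""
    out = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=out) as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # Read the buffer only after the TarFile is closed, so the archive is complete.
    return out.getvalue()


payload = make_tgz({"1.txt": "lala", "2.txt": "more"})

# Read the bytes back to confirm they form a valid .tgz.
with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
    names = tar.getnames()
    first = tar.extractfile("1.txt").read()
```

The view then only has to wrap the returned bytes in an HttpResponse, as in the code above.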


9

The solution in Python 3 uses io.BytesIO. Be sure to set TarInfo.size to the length of the bytes, not the length of the string.

Given a single string, the simplest solution is to call .encode() on it to obtain bytes. In this day and age you probably want UTF-8, but if the recipient is expecting a specific encoding, such as ASCII (i.e. no multi-byte characters), then use that instead.

import io
import tarfile

data = 'hello\n'.encode('utf8')
info = tarfile.TarInfo(name='foo.txt')
info.size = len(data)

with tarfile.TarFile('test.tar', 'w') as tar:
    tar.addfile(info, io.BytesIO(data))
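
To illustrate why the byte length matters, here is a small sketch with non-ASCII text (the sample string is arbitrary, and the archive is kept in memory only so it can be read back). As far as I can tell, addfile copies exactly TarInfo.size bytes, so using len(text) here would cut off the end of the member.

```python
import io
import tarfile

text = "naïve café\n"            # 11 characters, but 13 bytes in UTF-8
data = text.encode("utf-8")

info = tarfile.TarInfo(name="foo.txt")
info.size = len(data)            # len(text) would be too small here

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(info, io.BytesIO(data))

# Read the member back and decode it to confirm nothing was lost.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    restored = tar.extractfile("foo.txt").read().decode("utf-8")
```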

If you really need a writable string buffer, similar to the accepted answer by @Stefano Borini for Python 2, then the solution is to use io.TextIOWrapper over an underlying io.BytesIO buffer.

import io
import tarfile

textIO = io.TextIOWrapper(io.BytesIO(), encoding='utf8')
textIO.write('hello\n')
bytesIO = textIO.detach()
info = tarfile.TarInfo(name='foo.txt')
info.size = bytesIO.tell()

with tarfile.TarFile('test.tar', 'w') as tar:
    bytesIO.seek(0)
    tar.addfile(info, bytesIO)

1 Comment

You can encode without specifying utf8, it's the default: data = 'hello\n'.encode()
3

In my case I wanted to read from an existing tar file, append some data to the contents, and write it to a new file. Something like:

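for ti in tar_in:
    # extractfile() returns None for members that are not regular files or links
    buf_in = tar_in.extractfile(ti)
    buf_out = io.BytesIO()
    size = buf_out.write(buf_in.read())
    size += buf_out.write(other_data)  # other_data: the bytes to append
    buf_out.seek(0)
    ti.size = size
    tar_out.addfile(ti, fileobj=buf_out)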

Extra code is needed for handling directories and links.
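
One possible shape for that extra code, as a sketch: rewrite only regular files and copy the header of everything else unchanged. The function name copy_with_suffix and the sample archive contents are invented for the example.

```python
import io
import tarfile


def copy_with_suffix(tar_in, tar_out, suffix):
    """Copy all members of tar_in to tar_out, appending `suffix` to regular files."""
    for ti in tar_in:
        if ti.isreg():
            payload = tar_in.extractfile(ti).read() + suffix
            ti.size = len(payload)
            tar_out.addfile(ti, io.BytesIO(payload))
        else:
            # Directories, symlinks, etc. have no data of their own:
            # writing the header alone is enough.
            tar_out.addfile(ti)


# Build a small source archive in memory: a directory, a file and a symlink.
src = io.BytesIO()
with tarfile.open(fileobj=src, mode="w") as tar:
    d = tarfile.TarInfo("dir")
    d.type = tarfile.DIRTYPE
    d.mode = 0o755
    tar.addfile(d)
    f = tarfile.TarInfo("dir/a.txt")
    f.size = 3
    tar.addfile(f, io.BytesIO(b"abc"))
    link = tarfile.TarInfo("dir/link")
    link.type = tarfile.SYMTYPE
    link.linkname = "a.txt"
    tar.addfile(link)

src.seek(0)
dst = io.BytesIO()
with tarfile.open(fileobj=src) as tar_in, tarfile.open(fileobj=dst, mode="w") as tar_out:
    copy_with_suffix(tar_in, tar_out, b"!\n")

# Inspect the result.
dst.seek(0)
with tarfile.open(fileobj=dst) as tar:
    names = tar.getnames()
    content = tar.extractfile("dir/a.txt").read()
    copied_link = tar.getmember("dir/link")
```

Hard links are treated like any other non-regular member here, so they keep pointing at their target by name.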


3

Just for the record:
StringIO objects have a .len property.
No need to seek(0) and do len(foo.buf)
No need to keep the entire string around to do len() on, or God forbid, do the accounting yourself.

(Maybe it did not at the time the OP was written.)
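
For Python 3, where io.StringIO and io.BytesIO have no .len, a comparable trick is to ask the BytesIO buffer for its byte count. This is a sketch of one option, not the only one; the sample data is arbitrary and the archive is written to memory just to keep it self-contained.

```python
import io
import tarfile

buf = io.BytesIO()
buf.write(b"hello ")
buf.write("wörld\n".encode("utf-8"))

info = tarfile.TarInfo(name="foo.txt")
# Number of bytes held by the buffer, regardless of the current position.
info.size = buf.getbuffer().nbytes

archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    buf.seek(0)                  # addfile reads from the current position
    tar.addfile(info, buf)

# Read the member back to confirm the size was right.
archive.seek(0)
with tarfile.open(fileobj=archive) as tar:
    stored = tar.extractfile("foo.txt").read()
```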

2 Comments

StringIO objects do not have a len property. The code StringIO('foo').len raises an exception AttributeError: '_io.StringIO' object has no attribute 'len' in Python 3.8. (Maybe it did not at the time the answer was written.)
Apparently it's undocumented but present in StringIO in 2.7 (but not cStringIO) stackoverflow.com/questions/4677433/…
2

You have to use TarInfo objects and the addfile method instead of the usual add method:

from StringIO import StringIO
from tarfile import open, TarInfo

s = "Hello World!"
ti = TarInfo("test.txt")
ti.size = len(s)

tf = open("testtar.tar", "w")
tf.addfile(ti, StringIO(s))
tf.close()  # closing writes the end-of-archive marker

