Separate binary data (blobs) in csv files

Question

Is there any safe way of mixing binary with text data in a (pseudo)csv file?

One naive and partial solution would be:

using a compound field separator made of more than one character (e.g. the \a\b sequence)
saving each field as either text or binary data; the parser of the pseudo-CSV would then look for the \a\b sequence and read the data between separators according to a known rule (e.g. a known header giving each field's name and type)

The core issue is that binary data may itself contain the \a\b sequence somewhere in its body, before the actual end of the data.
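To make the failure concrete, here is a minimal Python 3 sketch of the naive scheme. The helper names `naive_join` and `naive_split` are hypothetical, introduced only for illustration, and the blob is hand-picked so that it contains the separator.

```python
SEP = b"\a\b"  # the compound separator described above


def naive_join(fields):
    """Hypothetical writer: concatenate raw fields with the separator."""
    return SEP.join(fields)


def naive_split(record):
    """Hypothetical parser: split a record on every separator occurrence."""
    return record.split(SEP)


text = b"some text"
blob = b"\x00\x01\a\b\xff"  # binary data that happens to contain \a\b

record = naive_join([text, blob])
fields = naive_split(record)

# Two fields were written, but the parser sees three,
# because the blob was cut at the embedded separator.
print(len(fields))
```

Any fixed separator has this problem, however long it is, as long as the blob bytes are written unmodified.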

The proper solution would be to save the individual blob fields in their own separate physical files and only include the filenames in a .csv, but this is not acceptable in this scenario.

Is there any proper and safe solution, either already implemented or applicable given these restrictions?

A simple solution is to store binary data base64 encoded.

– Tiger-222, Commented Aug 31, 2016 at 15:14

jsbueno · Accepted Answer · 2016-08-31 15:21:40Z

If you need everything in a single file, just use one of the methods for encoding binary data as printable ASCII, and add the result to the CSV fields (letting the csv module add and escape quotes as needed).

One such method is base64, but Python's base64 module also offers more space-efficient codecs such as base85 (on newer Pythons, version 3.4 and above).
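As a rough illustration of the size difference, the following Python 3 sketch encodes the same 300 bytes with both codecs. The payload is arbitrary test data, not anything from the question.

```python
import base64

# 300 arbitrary bytes standing in for a real blob
payload = bytes(range(256)) + bytes(44)

b64 = base64.b64encode(payload)  # 4 output characters per 3 input bytes
b85 = base64.b85encode(payload)  # 5 output characters per 4 input bytes

print(len(payload), len(b64), len(b85))

# Both encodings are reversible
assert base64.b64decode(b64) == payload
assert base64.b85decode(b85) == payload
```

base85 output uses more punctuation characters than base64, so it is worth leaving the quoting to the csv module rather than assuming the encoded text never needs it.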

So, an example in Python 2.7 would be:

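import base64
import csv
import random

# 50 random bytes standing in for a real blob (in Python 2, str is a byte string)
data = b''.join(chr(random.randrange(0, 256)) for i in range(50))

# In Python 2 the csv module expects the file to be opened in binary mode
with open("testfile.csv", "wb") as f:
    writer = csv.writer(f)
    # base64 output is plain ASCII, so it cannot clash with separators or quotes
    writer.writerow(["some text", base64.b64encode(data)])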

Of course, you have to do the proper base64 decoding on reading the file as well - but it is certainly better than trying to create an ad-hoc escaping method.
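For the reading side, a minimal round-trip sketch in Python 3 could look like the following. The helpers `write_rows` and `read_rows` are hypothetical names, the two-column layout (text, blob) is an assumption, and an in-memory buffer stands in for a real file.

```python
import base64
import csv
import io


def write_rows(f, rows):
    """Write (text, blob) pairs, storing each blob as base64 text."""
    writer = csv.writer(f)
    for text, blob in rows:
        writer.writerow([text, base64.b64encode(blob).decode("ascii")])


def read_rows(f):
    """Read the rows back, decoding the second column to bytes."""
    return [(text, base64.b64decode(encoded)) for text, encoded in csv.reader(f)]


blob = bytes(range(256))  # every possible byte value, including \a and \b

# With a real file, use open(path, "w", newline="") and open(path, newline="")
buffer = io.StringIO(newline="")
write_rows(buffer, [("some, text", blob)])
buffer.seek(0)
rows = read_rows(buffer)

assert rows == [("some, text", blob)]
```

The text field deliberately contains a comma to show that ordinary CSV quoting still applies to the text columns, while the blob column only ever holds base64 characters.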

edited Aug 31, 2016 at 15:21

answered Aug 31, 2016 at 15:14


1 Comment

teodron Over a year ago

I really did not think of the base64 encoding! That would make even classic .csv separators usable!
