
What is the easiest way to save and load data in python, preferably in a human-readable output format?

The data I am saving/loading consists of two vectors of floats. Ideally, these vectors would be named in the file (e.g. X and Y).

My current save() and load() functions use file.readline(), file.write() and string-to-float conversion. There must be something better.

7 Answers


The simplest way to get human-readable output is to use a serialisation format such as JSON. Python ships with a json library you can use to serialise data to and from a string. Like pickle, you can use it with an IO object to write the data to a file.

import json

data = {"x": 12153535.232321, "y": 35234531.232322}

with open('/usr/data/application/json-dump.json', 'w') as file:
    json.dump(data, file)

If you want a plain string back instead of dumping to a file, you can use json.dumps() instead:

import json
print(json.dumps({"x": 12153535.232321, "y": 35234531.232322}))

Reading back from a file is just as easy:

import json

with open('/usr/data/application/json-dump.json', 'r') as file:
    print(json.load(file))

The json library is full-featured, so I'd recommend checking out the documentation to see what sorts of things you can do with it.
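As a taste of what the documentation covers, here is a small sketch (with placeholder data values of my own) of the optional indent and sort_keys arguments of json.dumps(), which make the output considerably more human-readable:

```python
import json

data = {"X": [1.0, 2.5, 3.75], "Y": [0.1, 0.2, 0.3]}

# indent=4 pretty-prints one value per line; sort_keys orders the keys
text = json.dumps(data, indent=4, sort_keys=True)
print(text)

# round-trip the string back into Python objects
restored = json.loads(text)
```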




There are several options; I don't know exactly what you would prefer. If the two vectors have the same length, you could use numpy.savetxt() to save your vectors, say x and y, as columns:

import numpy

# saving:
with open("data", "w") as f:
    f.write("# x y\n")        # column names
    numpy.savetxt(f, numpy.array([x, y]).T)

# loading:
x, y = numpy.loadtxt("data", unpack=True)

If you are dealing with larger vectors of floats, you should probably use NumPy anyway.
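If the vectors do grow large and human readability stops being a requirement, NumPy's binary formats are faster and more compact. A minimal sketch, with vectors.npz as a file name of my own choosing:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = x ** 2

# savez stores several named arrays in one .npz file (binary, not human-readable)
np.savez("vectors.npz", X=x, Y=y)

loaded = np.load("vectors.npz")
```

loaded["X"] and loaded["Y"] then recover the arrays under the names given at save time.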

5 Comments

it seems to me numpy.savetxt takes a string as its first argument.
@nos: It takes a file name or a file object. Might be that old versions of NumPy only accepted a string. I pass a file object to be able to write the header line before.
It saves if I use f = open("data", "w+")
You can add the header line directly in numpy.savetxt thanks to an optional argument, as in numpy.savetxt("data", numpy.array([x, y]).T, header="x y") (the comment symbol and newline are automatically added to the header line). Thanks to that, you don't need to do any extra file handling any more and it's all in one line.
@Dalker Yep, these days the code can be simplified, but savetxt() didn't have this option at the time.
  • If it should be human-readable, I'd also go with JSON. Unless you need to exchange it with enterprise-type people, they like XML better. :-)

  • If it should be human editable and isn't too complex, I'd probably go with some sort of INI-like format, like for example configparser.

  • If it is complex, and doesn't need to be exchanged, I'd go with just pickling the data, unless it's very complex, in which case I'd use ZODB.

  • If it's a LOT of data, and needs to be exchanged, I'd use SQL.

That pretty much covers it, I think.
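For the INI-like option, a minimal configparser sketch (the section and file names here are my own invention). Note that configparser stores everything as strings, so the floats must be parsed explicitly on the way back:

```python
import configparser

# write the two vectors as comma-separated values under named keys
config = configparser.ConfigParser()
config["vectors"] = {"X": "1.0, 2.5, 3.75", "Y": "0.1, 0.2, 0.3"}
with open("vectors.ini", "w") as f:
    config.write(f)

# read back and convert the strings to floats
config = configparser.ConfigParser()
config.read("vectors.ini")
x = [float(v) for v in config["vectors"]["X"].split(",")]
y = [float(v) for v in config["vectors"]["Y"].split(",")]
```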

2 Comments

Does CSV fit any of these requirements?
CSV is a good format if your data is very simple (i.e. tabular), and you want to be able to import it into a spreadsheet. Otherwise it's too loosely defined, not easily readable or editable, can't handle complex data, and is a pain if the dataset is big.
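For that simple tabular case, a minimal sketch with the standard csv module (the file name xy.csv is arbitrary); the header row carries the column names, and values come back as strings that must be converted:

```python
import csv

x = [1.0, 1.5, 2.0]
y = [1.0, 2.25, 4.0]

# write the two vectors as named columns
with open("xy.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["X", "Y"])
    writer.writerows(zip(x, y))

# read them back, converting each string to a float
with open("xy.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    pairs = [(float(a), float(b)) for a, b in reader]
x2, y2 = (list(col) for col in zip(*pairs))
```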

A simple serialization format that is easy for both humans and computers to read is JSON.

You can use the json Python module.



Here is an example of an encoder you might want to write for a Body class:

# add this to your code
import json
import numpy as np

class BodyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if hasattr(obj, '__jsonencode__'):
            return obj.__jsonencode__()
        if isinstance(obj, set):
            return list(obj)
        return obj.__dict__

    # Here you construct your way to load your data for each instance;
    # you need to customize this function
    @staticmethod
    def deserialize(data):
        bodies = [Body(d["name"], d["mass"], np.array(d["p"]), np.array(d["v"]))
                  for d in data["bodies"]]
        axis_range = data["axis_range"]
        timescale = data["timescale"]
        return bodies, axis_range, timescale

    # Here you construct your way to dump your data for each instance;
    # you need to customize this function
    @staticmethod
    def serialize(data):
        with open(FILE_NAME, 'w+') as file:
            json.dump(data, file, cls=BodyEncoder, indent=4)
        print("Dumping Parameters of the Latest Run")
        print(json.dumps(data, cls=BodyEncoder, indent=4))

Here is an example of the class I want to serialize:

class Body(object):
    # you do not need to change your class structure
    def __init__(self, name, mass, p, v=(0.0, 0.0, 0.0)):
        # init variables like normal
        self.name = name
        self.mass = mass
        self.p = p
        self.v = v
        self.f = np.array([0.0, 0.0, 0.0])

    def attraction(self, other):
        # not important functions that I wrote...
        pass

Here is how to serialize:

# you need to customize this function
def serialize_everything():
    bodies, axis_range, timescale = generate_data_to_serialize()

    data = {"bodies": bodies, "axis_range": axis_range, "timescale": timescale}
    BodyEncoder.serialize(data)

Here is how to load it back:

def load_everything():
    with open(FILE_NAME, "r") as f:
        data = json.loads(f.read())
    return BodyEncoder.deserialize(data)



Since we're talking about a human editing the file, I assume we're talking about relatively little data.

How about the following skeleton implementation? It simply saves the data as key=value pairs and works with lists, tuples and many other things.

def save(fname, **kwargs):
    with open(fname, "wt") as f:
        for k, v in kwargs.items():
            f.write("%s=%s\n" % (k, repr(v)))

def load(fname):
    ret = {}
    with open(fname, "rt") as f:
        for line in f:
            k, v = line.strip().split("=", 1)
            ret[k] = eval(v)
    return ret

x = [1, 2, 3]
y = [2.0, 1e15, -10.3]
save("data.txt", x=x, y=y)
d = load("data.txt")
print(d["x"])
print(d["y"])

2 Comments

I don't like the eval thing; it could do strange things by allowing the user to execute arbitrary code.
Assuming it's simple stuff, ast.literal_eval gets round that. I prefer the json approach, though.
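A quick sketch of the ast.literal_eval alternative mentioned above: it parses Python literals (numbers, strings, lists, tuples, dicts, booleans, None) but raises ValueError on anything that would execute code:

```python
import ast

# a harmless key=value line parses fine
line = "y=[2.0, 1e15, -10.3]"
k, v = line.strip().split("=", 1)
value = ast.literal_eval(v)

# a malicious payload is rejected instead of executed
try:
    ast.literal_eval("__import__('os').system('echo pwned')")
    safe = False
except ValueError:
    safe = True
```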

As I commented in the accepted answer, using numpy this can be done with a simple one-liner:

Assuming you have numpy imported as np (which is common practice),

np.savetxt('xy.txt', np.array([x, y]).T, fmt="%.3f", header="x   y")

will save the data in the (optional) format and

x, y = np.loadtxt('xy.txt', unpack=True)

will load it.

The file xy.txt will then look like:

# x   y
1.000 1.000
1.500 2.250
2.000 4.000
2.500 6.250
3.000 9.000

Note that the format string fmt=... is optional, but if the goal is human readability it may prove quite useful. If used, it is specified with the usual printf-like codes (in my example: a floating-point number with 3 decimals).

