
What is the easiest way to save and load data in python, preferably in a human-readable output format?

The data I am saving/loading consists of two vectors of floats. Ideally, these vectors would be named in the file (e.g. X and Y).

My current save() and load() functions use file.readline(), file.write() and string-to-float conversion. There must be something better.

7 Answers


The simplest way to get human-readable output is to use a serialisation format such as JSON. Python ships with a json library you can use to serialise data to and from a string. Like pickle, you can use it with an IO object to write the data to a file.

import json

data = {"x": 12153535.232321, "y": 35234531.232322}

with open('/usr/data/application/json-dump.json', 'w') as file:
    json.dump(data, file)

If you want a plain string back instead of dumping to a file, you can use json.dumps() instead:

import json
print(json.dumps({"x": 12153535.232321, "y": 35234531.232322}))

Reading back from a file is just as easy:

import json

with open('/usr/data/application/json-dump.json', 'r') as file:
    print(json.load(file))

The json library is full-featured, so I'd recommend checking out the documentation to see what sorts of things you can do with it.
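As a taste of what the documentation covers, here is a small sketch (with placeholder data values of my own) of the optional indent and sort_keys arguments of json.dumps(), which make the output considerably more human-readable:

```python
import json

data = {"X": [1.0, 2.5, 3.75], "Y": [0.1, 0.2, 0.3]}

# indent=4 pretty-prints one value per line; sort_keys orders the keys
text = json.dumps(data, indent=4, sort_keys=True)
print(text)

# round-trip the string back into Python objects
restored = json.loads(text)
```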




There are several options; I don't know exactly what you would prefer. If the two vectors have the same length, you could use numpy.savetxt() to save your vectors, say x and y, as columns:

import numpy

# saving:
with open("data", "w") as f:
    f.write("# x y\n")        # column names
    numpy.savetxt(f, numpy.array([x, y]).T)

# loading:
x, y = numpy.loadtxt("data", unpack=True)

If you are dealing with larger vectors of floats, you should probably use NumPy anyway.
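If the vectors do grow large and human readability stops being a requirement, NumPy's binary formats are faster and more compact. A minimal sketch, with vectors.npz as a file name of my own choosing:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = x ** 2

# savez stores several named arrays in one .npz file (binary, not human-readable)
np.savez("vectors.npz", X=x, Y=y)

loaded = np.load("vectors.npz")
```

loaded["X"] and loaded["Y"] then recover the arrays under the names given at save time.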

5 Comments

it seems to me numpy.savetxt takes a string as its first argument.
@nos: It takes a file name or a file object. Might be that old versions of NumPy only accepted a string. I pass a file object to be able to write the header line before.
It saves if I use f = open("data", "w+")
You can add the header line directly in numpy.savetxt thanks to an optional argument, as in numpy.savetxt("data", numpy.array([x, y]).T, header="x y") (the comment symbol and newline are automatically added to the header line). Thanks to that, you don't need to do any extra file handling any more and it's all in one line.
@Dalker Yep, these days the code can be simplified, but savetxt() didn't have this option at the time.
  • If it should be human-readable, I'd also go with JSON. Unless you need to exchange it with enterprise-type people, they like XML better. :-)

  • If it should be human editable and isn't too complex, I'd probably go with some sort of INI-like format, like for example configparser.

  • If it is complex, and doesn't need to be exchanged, I'd go with just pickling the data, unless it's very complex, in which case I'd use ZODB.

  • If it's a LOT of data, and needs to be exchanged, I'd use SQL.

That pretty much covers it, I think.
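For the INI-like option, a minimal configparser sketch (the section and file names here are my own invention). Note that configparser stores everything as strings, so the floats must be parsed explicitly on the way back:

```python
import configparser

# write the two vectors as comma-separated values under named keys
config = configparser.ConfigParser()
config["vectors"] = {"X": "1.0, 2.5, 3.75", "Y": "0.1, 0.2, 0.3"}
with open("vectors.ini", "w") as f:
    config.write(f)

# read back and convert the strings to floats
config = configparser.ConfigParser()
config.read("vectors.ini")
x = [float(v) for v in config["vectors"]["X"].split(",")]
y = [float(v) for v in config["vectors"]["Y"].split(",")]
```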

2 Comments

Does CSV fit any of these requirements?
CSV is a good format if your data is very simple (i.e. tabular), and you want to be able to import it into a spreadsheet. Otherwise it's too loosely defined, not easily readable or editable, can't handle complex data, and is a pain if the dataset is big.
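For that simple tabular case, a minimal sketch with the standard csv module (the file name xy.csv is arbitrary); the header row carries the column names, and values come back as strings that must be converted:

```python
import csv

x = [1.0, 1.5, 2.0]
y = [1.0, 2.25, 4.0]

# write the two vectors as named columns
with open("xy.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["X", "Y"])
    writer.writerows(zip(x, y))

# read them back, converting each string to a float
with open("xy.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    pairs = [(float(a), float(b)) for a, b in reader]
x2, y2 = (list(col) for col in zip(*pairs))
```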

A simple serialization format that is easy for both humans and computers to read is JSON.

You can use the json Python module.



Here is an example of an encoder you might want to write for a Body class:

# add this to your code
import json
import numpy as np

class BodyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if hasattr(obj, '__jsonencode__'):
            return obj.__jsonencode__()
        if isinstance(obj, set):
            return list(obj)
        return obj.__dict__

    # Here you construct your way to load your data for each instance;
    # you need to customize this function
    @staticmethod
    def deserialize(data):
        bodies = [Body(d["name"], d["mass"], np.array(d["p"]), np.array(d["v"]))
                  for d in data["bodies"]]
        axis_range = data["axis_range"]
        timescale = data["timescale"]
        return bodies, axis_range, timescale

    # Here you construct your way to dump your data for each instance;
    # you need to customize this function
    @staticmethod
    def serialize(data):
        with open(FILE_NAME, 'w+') as file:
            json.dump(data, file, cls=BodyEncoder, indent=4)
        print("Dumping Parameters of the Latest Run")
        print(json.dumps(data, cls=BodyEncoder, indent=4))

Here is an example of the class I want to serialize:

class Body(object):
    # you do not need to change your class structure
    def __init__(self, name, mass, p, v=(0.0, 0.0, 0.0)):
        # init variables like normal
        self.name = name
        self.mass = mass
        self.p = p
        self.v = v
        self.f = np.array([0.0, 0.0, 0.0])

    def attraction(self, other):
        # not important functions that I wrote...
        pass

Here is how to serialize:

# you need to customize this function
def serialize_everything():
    bodies, axis_range, timescale = generate_data_to_serialize()

    data = {"bodies": bodies, "axis_range": axis_range, "timescale": timescale}
    BodyEncoder.serialize(data)

Here is how to load it back:

def load_everything():
    with open(FILE_NAME, "r") as f:
        data = json.loads(f.read())
    return BodyEncoder.deserialize(data)



Since we're talking about a human editing the file, I assume we're talking about relatively little data.

How about the following skeleton implementation? It simply saves the data as key=value pairs and works with lists, tuples and many other things.

def save(fname, **kwargs):
    with open(fname, "wt") as f:
        for k, v in kwargs.items():
            f.write("%s=%s\n" % (k, repr(v)))

def load(fname):
    ret = {}
    with open(fname, "rt") as f:
        for line in f:
            k, v = line.strip().split("=", 1)
            ret[k] = eval(v)
    return ret

x = [1, 2, 3]
y = [2.0, 1e15, -10.3]
save("data.txt", x=x, y=y)
d = load("data.txt")
print(d["x"])
print(d["y"])

2 Comments

I don't like the eval thing; it could do strange things by allowing the user to execute arbitrary code.
Assuming it's simple stuff, ast.literal_eval gets round that. I prefer the json approach, though.
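A quick sketch of the ast.literal_eval alternative mentioned above: it parses Python literals (numbers, strings, lists, tuples, dicts, booleans, None) but raises ValueError on anything that would execute code:

```python
import ast

# a harmless key=value line parses fine
line = "y=[2.0, 1e15, -10.3]"
k, v = line.strip().split("=", 1)
value = ast.literal_eval(v)

# a malicious payload is rejected instead of executed
try:
    ast.literal_eval("__import__('os').system('echo pwned')")
    safe = False
except ValueError:
    safe = True
```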

As I commented in the accepted answer, using numpy this can be done with a simple one-liner:

Assuming you have numpy imported as np (which is common practice),

np.savetxt('xy.txt', np.array([x, y]).T, fmt="%.3f", header="x   y")

will save the data in the (optional) format and

x, y = np.loadtxt('xy.txt', unpack=True)

will load it.

The file xy.txt will then look like:

# x   y
1.000 1.000
1.500 2.250
2.000 4.000
2.500 6.250
3.000 9.000

Note that the format string fmt=... is optional, but if the goal is human readability it may prove quite useful. If used, it is specified with the usual printf-like codes (in my example: a floating-point number with 3 decimals).

