Persisting data between script execution

Question

I was wondering if Python has a simple method for caching a sequence of values, where the sequence can be updated each time the script is run. For example, let's say I have a list of tuples where each tuple is a datetime and a float. The datetime represents the time a speed was recorded by an anemometer and the float is the speed recorded. When I run my script, new values should be added to my list and be remembered the next time I run the script. When I first started programming, the way I solved this was using a pickle, as follows:

import os
import pickle
import datetime

db_path = "speeds.p"

# get all our previous speeds
speeds = []
if os.path.exists(db_path):
    with open(db_path, "rb") as f:
        speeds = pickle.load(f)


def data_from_endpoint():
    data = (
        (datetime.datetime(2022, 10, 22, 21, 15), 13),
        (datetime.datetime(2022, 10, 22, 21, 30), 24),
        (datetime.datetime(2022, 10, 22, 21, 45), 37)
    )

    for i in data:
        yield i


try:
    # add new speeds
    for t, v in data_from_endpoint():
        if len(speeds) == 0 or t > speeds[-1][0]:
            print(f"Adding {t}, {v}")
            speeds.append((t, v))
finally:
    # save all speeds
    with open(db_path, "wb") as f:
        pickle.dump(speeds, f)

print(f"Number of values: {len(speeds)}")

The way I would solve this now is to use a sqlite database. Both solutions involve a lot of code for something so simple, and I'm wondering if Python has a simpler way of doing this.
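For comparison, the sqlite approach mentioned above might be sketched roughly as follows. This is only an illustration, not the asker's actual code: the table name `speeds`, the helper `add_speeds`, and the choice to store timestamps as ISO 8601 text are all assumptions made here.

```python
import datetime
import os
import sqlite3
import tempfile


def add_speeds(db_path, readings):
    """Insert readings, skipping timestamps that are already stored."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS speeds (t TEXT PRIMARY KEY, v REAL)"
            )
            con.executemany(
                "INSERT OR IGNORE INTO speeds VALUES (?, ?)",
                [(t.isoformat(), v) for t, v in readings],
            )
        return con.execute("SELECT COUNT(*) FROM speeds").fetchone()[0]
    finally:
        con.close()


readings = [
    (datetime.datetime(2022, 10, 22, 21, 15), 13),
    (datetime.datetime(2022, 10, 22, 21, 30), 24),
]
# A temporary directory keeps the demo from touching real data.
demo_db = os.path.join(tempfile.mkdtemp(), "speeds.sqlite")
first_count = add_speeds(demo_db, readings)
second_count = add_speeds(demo_db, readings)  # simulates a second run
print(first_count, second_count)
```

The primary key on the timestamp together with `INSERT OR IGNORE` takes over the "only add new values" check, so a repeated run leaves the row count unchanged.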

You use pickle, and it's 2 lines to read and 2 lines to write. I would say that is a good solution. Personally, I prefer e.g. json, so I can human-read the data in the file. But of course, it depends on how much data you handle. With pickle and json, you write all the data each time, and load everything into memory, while sqlite is a bit smarter.
– Dr. V, Commented Oct 22, 2022 at 18:46
If you're looking to avoid the overhead of loading the whole pickle file in and writing the whole thing out, you could always try shelve, which lets you add one object at a time. — Nick ODell
– Nick ODell, Commented Oct 22, 2022 at 18:49
I went the opposite direction. Started off with enterprise databases (SQL Server, Oracle etc.) and I just use pickle for most things these days. I'd throw yaml in the mix: not quite as simple as pickle, but the files are easily human readable and editable, which is a positive. I'm an old dog though: no new tricks from me.
– John M., Commented Oct 22, 2022 at 18:49
@JohnM., ...I disagree that YAML is human-editable. Much less so than simpler formats like TOML, at least; see arp242.net/yaml-config.html for an essay on the subject. The YAML spec is even longer than the XML spec; it's full of carve-outs, exceptions, and duplicative ways to do the same thing. (See also stackoverflow.com/questions/3790454/…) — Charles Duffy
– Charles Duffy, Commented Oct 22, 2022 at 20:13
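The JSON variant suggested in the comments could look something like the sketch below. JSON has no datetime type, so this assumes the timestamps are written as ISO 8601 strings and parsed back on load; `load_speeds` and `save_speeds` are hypothetical helper names, not part of any library.

```python
import datetime
import json
import os
import tempfile


def load_speeds(path):
    """Return the stored (datetime, speed) pairs, or [] if no file exists."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [(datetime.datetime.fromisoformat(t), v) for t, v in json.load(f)]


def save_speeds(path, speeds):
    """Write all pairs back; datetimes are stored as ISO 8601 strings."""
    with open(path, "w") as f:
        json.dump([(t.isoformat(), v) for t, v in speeds], f, indent=1)


# A temporary directory keeps the demo from touching real data.
demo_path = os.path.join(tempfile.mkdtemp(), "speeds.json")
speeds = load_speeds(demo_path)
speeds.append((datetime.datetime(2022, 10, 22, 21, 15), 13))
save_speeds(demo_path, speeds)
reloaded = load_speeds(demo_path)
print(reloaded)
```

As with the pickle version, the whole list is read and rewritten on every run, which is fine for small data sets but grows with the file.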

Dronakuul · Accepted Answer · 2022-10-22 19:26:10Z

1

You can append to pickle files. I don't know if that is simple enough:

import pickle
import datetime

db_path = "speeds.p"

def data_from_endpoint():
    data = (
        (datetime.datetime(2022, 10, 22, 21, 15), 13),
        (datetime.datetime(2022, 10, 22, 21, 30), 24),
        (datetime.datetime(2022, 10, 22, 21, 45), 37)
    )

    for i in data:
        yield i

# no need to check for the existence of the pickle file with append mode
with open(db_path, "ab") as f:
    for t, v in data_from_endpoint():
        pickle.dump({t: v}, f)  # dump() returns None, so nothing to assign

# run several times and see how the list gets longer
objs = []
with open(db_path, "rb") as f:
    while True:
        try:
            o = pickle.load(f)
        except EOFError:
            break
        objs.append(o)

print(objs)
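Note that appending this way stores the same readings again on every run, whereas the code in the question only added readings newer than the last stored one. A possible way to keep that check with an append-only pickle file is sketched below; `read_all` and `append_new` are hypothetical helpers, and the sketch assumes readings arrive in chronological order and are stored as `(datetime, speed)` tuples rather than dicts.

```python
import datetime
import os
import pickle
import tempfile


def read_all(path):
    """Yield every object pickled into the file, in the order written."""
    if not os.path.exists(path):
        return
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def append_new(path, readings):
    """Append only readings newer than the last stored timestamp."""
    last = None
    for t, _ in read_all(path):
        last = t
    added = 0
    with open(path, "ab") as f:
        for t, v in readings:
            if last is None or t > last:
                pickle.dump((t, v), f)
                last = t
                added += 1
    return added


readings = [
    (datetime.datetime(2022, 10, 22, 21, 15), 13),
    (datetime.datetime(2022, 10, 22, 21, 30), 24),
]
# A temporary directory keeps the demo from touching real data.
demo_path = os.path.join(tempfile.mkdtemp(), "speeds.pickle")
first_added = append_new(demo_path, readings)
second_added = append_new(demo_path, readings)  # simulates a second run
stored = list(read_all(demo_path))
print(first_added, second_added, len(stored))
```

This still reads the whole file once to find the last timestamp, so it saves rewriting the data but not scanning it.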

answered Oct 22, 2022 at 19:26


Baz · Accepted Answer · 2022-10-22 19:50:48Z

1

Here is a solution using shelve as suggested by @Nick ODell. Not yet sure if I should be using str(t) below.

import shelve
import datetime

def data_from_endpoint():
    data = (
        (datetime.datetime(2022, 10, 22, 21, 15), 13),
        (datetime.datetime(2022, 10, 22, 21, 30), 24),
        (datetime.datetime(2022, 10, 22, 21, 45), 37),
    )

    for i in data:
        yield i


with shelve.open("speeds.db", writeback=True) as db:

    # add new speeds
    for t, v in data_from_endpoint():
        t = str(t)
        print(t)
        if t not in db:
            print(f"Adding {t}, {v}")
            db[t] = v

    print(f"Number of values: {len(db)}")
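On the `str(t)` question: shelve keys have to be strings, so some conversion is needed. One option, sketched here as an assumption rather than a recommendation from the answer, is to use `t.isoformat()` as the key and `datetime.datetime.fromisoformat` to turn keys back into datetimes when reading.

```python
import datetime
import os
import shelve
import tempfile

# A temporary directory keeps the demo from touching real data.
demo_path = os.path.join(tempfile.mkdtemp(), "speeds")

with shelve.open(demo_path) as db:
    t = datetime.datetime(2022, 10, 22, 21, 15)
    db[t.isoformat()] = 13  # keys must be str; values can be any picklable object

with shelve.open(demo_path) as db:
    # Rebuild the (datetime, speed) pairs, sorted by time.
    speeds = sorted(
        (datetime.datetime.fromisoformat(k), v) for k, v in db.items()
    )

print(speeds)
```

Sorting is done explicitly because a shelf, like a dict backed by a dbm file, makes no promise about key order.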

edited Oct 22, 2022 at 19:50

answered Oct 22, 2022 at 19:31


Roland Smith · Accepted Answer · 2022-10-22 20:22:49Z

There is no real "standard" way for Python to store data.

Consider that:

Your program needs a location for its file.
It needs write permissions in that location.
If you use the current working directory, it will fail to find the file when it is started from another location.
If your program needs to work on multiple platforms, it becomes a lot more complicated.

This is an excerpt of a program that works on both MS Windows and POSIX systems like FreeBSD and Linux.

import os
import re
import json


home = ""
uname = ""


def load_data():
    """
    Load the program's data file.

    It is located in the user's home directory,
    which is platform specific.
    
    Uses the global variable home,
    which is initialized when the program starts.
    """
    try:
        with open(home + os.sep + "resins.json") as rf:
            lines = rf.readlines()
    except (FileNotFoundError, KeyError):
        with open("resins.json") as rf:
            lines = rf.readlines()
    text = "\n".join([ln.strip() for ln in lines])
    try:
        lm = re.search("// Last modified: (.*)", text).groups()[0]
    except AttributeError:
        lm = None
    nocomments = re.sub("^//.*$", "", text, flags=re.MULTILINE)
    return json.loads(nocomments), lm


if __name__ == "__main__":
    # Platform specific set-up
    if os.name == "nt":
        uname = os.environ["USERNAME"]
        home = os.environ["HOMEDRIVE"] + os.environ["HOMEPATH"]
        if home.endswith(os.sep):
            home = home[:-1]

    elif os.name == "posix":
        uname = os.environ["USER"]
        home = os.environ["HOME"]

    recipes, filedate = load_data()
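As an aside, on Python 3.5 or later the platform-specific lookup of the home directory can probably be shortened with `pathlib.Path.home()`, which handles both Windows and POSIX. The sketch below is not part of the original program; `data_file` is a hypothetical helper, and the file name is taken from the excerpt above.

```python
from pathlib import Path


def data_file(name="resins.json"):
    """Return the path of the data file in the current user's home directory."""
    return Path.home() / name


path = data_file()
print(path)
```

Because the path is built from the home directory rather than the current working directory, the file is found regardless of where the script is started from.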
