Problem statement:
I have a script that is called from an external tool and keeps state shared across its instances in a set of files. I think the most practical way to deal with this is to serialize the script instances using a single lock that can be acquired (a) when it has not yet been locked, (b) after it has been released and (c) after the process holding it has disappeared.
I'm not yet certain whether the next waiting process must be woken up immediately when the process holding the lock crashes, as that is an exceptional condition anyway. But the next action (possibly triggered by a restart) must certainly be able to run successfully.
The script depends on NetworkManager, which currently runs only on Linux. Therefore a simple solution is preferred over a cross-platform one. On the other hand, a cross-platform solution may be useful to a larger number of Stack Overflow visitors.
Further discussion:
I found a number of related questions and answers here on Stack Overflow, but (1) the questions were not as specific as this one and (2) the answers didn't seem usable for this case. In particular, the handling of stale locks went mostly unaddressed.
I would like to keep using the context manager API and only libraries common in Linux installations. I guess there's no perfect solution in the standard library or in any of the common installations, so I think I will need to implement the context manager using some lower-level API.
The current code uses the lockfile module, which doesn't seem to care about stale locks at all. The script instances aren't expected to share anything except the file system, so solutions based on the multiprocessing module don't seem to apply here. I was thinking about a combination of a pidfile and fcntl, but also about a Unix socket that could be used to wait for the other script to finish. I wonder why I can't find a standard context-manager-based tool for this in Python.
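To make the fcntl idea concrete, here is a minimal sketch of the kind of context manager I have in mind. It is Linux-only and untested in the real script, and the name flock_lock is made up for illustration. It relies on flock() locks being dropped by the kernel when the descriptor is closed, which also happens when the holding process dies, so cases (a), (b) and (c) would be covered without a pidfile. It assumes the lock file lives on a local file system and that no child process inherits the descriptor.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def flock_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    `flock_lock` is a hypothetical helper, not part of any library.
    """
    # Open (creating if needed) the lock file; its contents are irrelevant.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock. The kernel does the same
        # when the process exits or crashes, so the lock cannot go stale.
        os.close(fd)
```

It would be used as `with flock_lock(some_path):` in place of the lockfile.FileLock call. The lock file itself is never deleted; only the kernel-side lock on it matters.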
A live version (will change as new patches are accepted) of the script in question:
http://www.nlnetlabs.nl/svn/dnssec-trigger/trunk/dnssec-trigger-script.in
Relevant part of the source code:
    def run(self):
        with lockfile.FileLock("/var/run/dnssec-trigger/dnssec-trigger"):
            log.debug("Running: {}".format(self.method.__name__))
            self.method()