
I am using Python to write chunks of text to files in a single operation:

open(file, 'w').write(text)

If the script is interrupted so a file write does not complete I want to have no file rather than a partially complete file. Can this be done?


7 Answers


Write the data to a temporary file and, once it has been written successfully, rename that file to the destination, e.g.:

import os

with open(tmpFile, 'w') as f:
    f.write(text)
    # make sure that all data is on disk
    # see http://stackoverflow.com/questions/7433057/is-rename-without-fsync-safe
    f.flush()
    os.fsync(f.fileno())
os.replace(tmpFile, myFile)  # os.rename before Python 3.3, but on Windows os.rename fails if myFile exists

According to the documentation (http://docs.python.org/library/os.html#os.replace):

Rename the file or directory src to dst. If dst is a non-empty directory, OSError will be raised. If dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement).
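To see the quoted semantics in practice, here is a small self-contained check. It is only a sketch: the file and directory names are made up for illustration, and it assumes Python 3.3+ for os.replace. Replacing an existing file succeeds silently, while replacing a non-empty directory raises OSError.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "new.txt")
    dst = os.path.join(d, "old.txt")
    for path, text in ((src, "new"), (dst, "old")):
        with open(path, "w") as f:
            f.write(text)

    # dst exists and is a file: it is replaced silently.
    os.replace(src, dst)
    with open(dst) as f:
        replaced_content = f.read()
    src_still_exists = os.path.exists(src)

    # dst is a non-empty directory: OSError is raised.
    busy_dir = os.path.join(d, "subdir")
    os.mkdir(busy_dir)
    open(os.path.join(busy_dir, "child.txt"), "w").close()
    open(src, "w").close()
    try:
        os.replace(src, busy_dir)
        dir_replace_raised = False
    except OSError:
        dir_replace_raised = True

print(replaced_content, src_still_exists, dir_replace_raised)  # new False True
```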

Note:

  • It may not be an atomic operation if the src and dst locations are not on the same filesystem.

  • The os.fsync step may be skipped if performance/responsiveness is more important than data integrity in cases like power failure, system crash, etc.
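Putting the answer and the notes together, here is a minimal sketch of a reusable helper. It assumes Python 3.3+ (for os.replace); the name atomic_write_text and its fsync flag are hypothetical, not part of any library. The temporary file is created with tempfile.mkstemp in the destination's own directory so that the final os.replace stays on one filesystem, and it is removed if anything fails before the rename.

```python
import os
import tempfile


def atomic_write_text(path, text, fsync=True, encoding="utf-8"):
    """Hypothetical helper: write text to path so readers see either the
    old content or the complete new content, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            if fsync:
                # Push Python's buffer to the OS, then ask the OS to flush it to disk.
                f.flush()
                os.fsync(f.fileno())
        # Atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Interrupted or failed: leave the target untouched and clean up.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Usage: atomic_write_text("out.txt", "hello\n"), or pass fsync=False when speed matters more than surviving a power loss. One difference from plain open(): mkstemp creates the file readable and writable only by its owner, so the replaced file ends up with those permissions unless you chmod it.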


15 Comments

For completeness, the tempfile module provides an easy, safe way to create temporary files.
And for more completeness: rename is atomic only within the same filesystem on POSIX, so the easiest way is to create tmpFile in the directory of myFile.
While os.fsync is necessary if you're worried about the OS shutting down suddenly (such as loss of power or kernel panic) it's overkill for the case where you're just concerned about the process being interrupted.
@AnuragUniyal - whether it hurts or not depends on how often the atomic write is done. os.fsync can be very slow as it has to wait for the kernel to flush its buffers. If someone uses this code to write multiple files, it can definitely cause measurable slowdowns.
@J.F.Sebastian note that SQLite adds fsync(opendir(filename)) to ensure that the rename is written to disk too. This does not affect the atomicity of the modification, only the relative order of this operation versus previous/next operations on a different file.
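To make the rename itself durable, as the SQLite comment describes, one can also fsync the containing directory after os.replace. This is a minimal sketch assuming a POSIX system (opening a directory with os.open like this does not work on Windows); fsync_directory and replace_durably are hypothetical helper names.

```python
import os


def fsync_directory(path):
    """Hypothetical helper: flush the directory entry for `path` to disk.

    POSIX only: on Windows a directory cannot be opened with os.open().
    """
    directory = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        # Persists the directory metadata, including a just-completed rename.
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def replace_durably(tmp_path, path):
    """Rename tmp_path over path, then make the rename itself durable."""
    os.replace(tmp_path, path)
    if os.name == "posix":
        fsync_directory(path)
```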

A simple snippet that implements atomic writing using Python's tempfile module:

with open_atomic('test.txt', 'w') as f:
    f.write("huzza")

or even reading and writing to and from the same file:

with open('test.txt', 'r') as src:
    with open_atomic('test.txt', 'w') as dst:
        for line in src:
            dst.write(line)

It uses two simple context managers:

import os
import tempfile as tmp
from contextlib import contextmanager

@contextmanager
def tempfile(suffix='', dir=None):
    """ Context for temporary file.

    Will find a free temporary filename upon entering
    and will try to delete the file on leaving, even in case of an exception.

    Parameters
    ----------
    suffix : string
        optional file suffix
    dir : string
        optional directory to save temporary file in
    """

    tf = tmp.NamedTemporaryFile(delete=False, suffix=suffix, dir=dir)
    tf.file.close()
    try:
        yield tf.name
    finally:
        try:
            os.remove(tf.name)
        except FileNotFoundError:
            # Already gone: open_atomic renamed it to the destination.
            pass

@contextmanager
def open_atomic(filepath, *args, **kwargs):
    """ Open temporary file object that atomically moves to destination upon
    exiting.

    Allows reading and writing to and from the same filename.

    The file will not be moved to destination in case of an exception.

    Parameters
    ----------
    filepath : string
        the file path to be opened
    fsync : bool
        whether to force write the file to disk
    *args : mixed
        Any valid arguments for :code:`open`
    **kwargs : mixed
        Any valid keyword arguments for :code:`open`
    """
    fsync = kwargs.pop('fsync', False)

    with tempfile(dir=os.path.dirname(os.path.abspath(filepath))) as tmppath:
        with open(tmppath, *args, **kwargs) as file:
            try:
                yield file
            finally:
                if fsync:
                    file.flush()
                    os.fsync(file.fileno())
        os.replace(tmppath, filepath)  # atomic; unlike os.rename it also overwrites an existing file on Windows

9 Comments

The temp file needs to be on the same file system as the file to be replaced. This code will not work reliably on systems with multiple file systems. The NamedTemporaryFile invocation needs a dir= parameter.
Thanks for the comment, I've recently changed this snippet to fall back to shutil.move in case of os.rename failing. This allows it to work across FS boundaries.
That appears to work when running it, but shutil.move uses copy2, which is not atomic. And if copy2 wanted to be atomic, it would need to create a temporary file in the same file system as the destination file. So the fix to fall back to shutil.move only masks the problem. That is why most snippets place the temporary file in the same directory as the target file, which is also possible with tempfile.NamedTemporaryFile using the dir named argument. As moving over a file in a directory that is not writable doesn't work anyway, that seems to be the simplest and most robust solution.
Correct, I assumed that shutil.move() was non-atomic due to shutil.copy2() and os.remove() being called in succession. The new implementation (see edit) will now instead create the file in the current directory and also handle exceptions better.
How can this be atomic while reading from and writing to the same file? In the example above, open('test.txt', 'r') as src: is used to read the file contents. Writing in this sense is atomic, but reading might not be. For file types like .ini, this can play up with decorators when used with configparser for read operations. I'm not sure this sample really guarantees atomicity when reading the same file from 200,000 threads; that will throw a "Too many open files" error.

Since it is very easy to get the details wrong, I recommend using a tiny library for this. The advantage of a library is that it takes care of all these nitty-gritty details and is reviewed and improved by a community.

One such library is python-atomicwrites by untitaker which even has proper Windows support:

Caveat (as of 2023):

This library is currently unmaintained. Comment from the author:

[...], I thought it'd be a good time to deprecate this package. Python 3 has os.replace and os.rename which probably do well enough of a job for most usecases.

Original recommendation:

From the README:

from atomicwrites import atomic_write

with atomic_write('foo.txt', overwrite=True) as f:
    f.write('Hello world.')
    # "foo.txt" doesn't exist yet.

# Now it does.

Installation via PIP:

pip install atomicwrites

2 Comments

It's now unmaintained.
@AlexandrZarubkin Oh, that's a pity. Is there any alternative that you can recommend?

I’m using this code to atomically replace/write a file:

import os
from contextlib import contextmanager

@contextmanager
def atomic_write(filepath, binary=False, fsync=False):
    """ Writeable file object that atomically updates a file (using a temporary file).

    :param filepath: the file path to be opened
    :param binary: whether to open the file in a binary mode instead of textual
    :param fsync: whether to force write the file to disk
    """

    tmppath = filepath + '~'
    while os.path.isfile(tmppath):
        tmppath += '~'
    try:
        with open(tmppath, 'wb' if binary else 'w') as file:
            yield file
            if fsync:
                file.flush()
                os.fsync(file.fileno())
        os.replace(tmppath, filepath)  # unlike os.rename, also overwrites an existing file on Windows
    finally:
        try:
            os.remove(tmppath)
        except (IOError, OSError):
            pass

Usage:

with atomic_write('path/to/file') as f:
    f.write("allons-y!\n")

It’s based on this recipe.

2 Comments

The while loop is racy: two concurrent processes could end up opening the same temporary file. tempfile.NamedTemporaryFile can avoid this.
I think a tmppath like '.{filepath}~{random}' would be better: if two processes do the same thing, each writes its own temporary file. This does not fully solve the race condition, but at least you don't get a file mixing content from two processes.
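The race pointed out in these comments comes from checking os.path.isfile and opening the file as two separate steps. Here is a minimal sketch of the suggested random-suffix idea that additionally uses os.O_EXCL, so creation fails instead of silently reusing a name another process grabbed in between. open_unique_temp is a hypothetical name; in practice tempfile.mkstemp does the same job.

```python
import os
import secrets


def open_unique_temp(filepath, binary=False):
    """Hypothetical helper: create and open a fresh temp file next to filepath.

    Returns (file_object, temp_path). O_CREAT | O_EXCL makes the existence
    check and the creation a single step, so two processes can never
    end up sharing one temporary file.
    """
    directory, name = os.path.split(os.path.abspath(filepath))
    while True:
        tmp_path = os.path.join(directory, f".{name}~{secrets.token_hex(8)}")
        try:
            fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        except FileExistsError:
            continue  # name collision: pick another random suffix
        return os.fdopen(fd, "wb" if binary else "w"), tmp_path
```

The caller writes to the returned file, closes it, and then calls os.replace(temp_path, filepath), removing temp_path on failure.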

Just link the file after you're done:

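import os
import tempfile
with tempfile.NamedTemporaryFile(mode="w", dir=os.path.dirname(os.path.abspath(final_filename))) as f:
    f.write(...)
    f.flush()  # hand buffered data to the OS before the final name appears
    os.link(f.name, final_filename)  # fails if final_filename already exists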

If you want to get fancy:

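import contextlib
import os
import tempfile

@contextlib.contextmanager
def open_write_atomic(filename: str, **kwargs):
    kwargs['mode'] = 'w'
    kwargs.setdefault('dir', os.path.dirname(os.path.abspath(filename)))  # links can't cross filesystems
    with tempfile.NamedTemporaryFile(**kwargs) as f:
        yield f
        f.flush()  # flush buffered data before the final name appears
        os.link(f.name, filename)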
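One property of the os.link approach worth spelling out: unlike os.replace, os.link refuses to overwrite an existing destination and raises FileExistsError, so it gives "create only if absent" semantics. Below is a minimal self-contained sketch assuming a POSIX filesystem that supports hard links; write_new_file is a hypothetical name. The temporary file lives in the target's directory because hard links cannot cross filesystems, and it is flushed before linking, since otherwise the new name could appear while data is still sitting in Python's buffer.

```python
import os
import tempfile


def write_new_file(path, text):
    """Hypothetical helper: atomically create `path` with `text`.

    Raises FileExistsError (and leaves `path` untouched) if it already exists.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target: hard links cannot span filesystems.
    with tempfile.NamedTemporaryFile(mode="w", dir=directory) as f:
        f.write(text)
        f.flush()              # data must reach the OS before the name appears
        os.fsync(f.fileno())   # optional: also survive a power loss
        os.link(f.name, path)  # fails if `path` already exists
    # Leaving the block deletes only the temporary name; `path` keeps the data.
```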

Comments


Answers on this page are quite old; there are now libraries that do this for you.

In particular, safer is a library designed to help prevent programmer error from corrupting files, socket connections, or generalized streams. It's quite flexible: among other things, it can use either memory or temporary files, and it can even keep the temporary files in case of failure.

Their example is just what you want:

import json
import safer

# dangerous
with open(filename, 'w') as fp:
    json.dump(data, fp)
    # If an exception is raised, the file is empty or partly written
# safer
with safer.open(filename, 'w') as fp:
    json.dump(data, fp)
    # If an exception is raised, the file is unchanged.

It's on PyPI: install it with pip install --user safer, or get the latest version at https://github.com/rec/safer
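If you would rather avoid a dependency, the same "file is unchanged on exception" guarantee for the JSON case can be sketched with the standard library alone. This is not safer's implementation, just the temp-file-plus-os.replace pattern from the accepted answer; dump_json_atomic is a hypothetical name. The "dangerous" case is real because json.dump writes its output incrementally, so an unserializable value can fail after part of the document is already in the file.

```python
import json
import os
import tempfile


def dump_json_atomic(data, filename, **dump_kwargs):
    """Hypothetical helper: json.dump into filename, or leave it untouched."""
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fp:
            json.dump(data, fp, **dump_kwargs)
        # Only reached if serialization succeeded and the file was closed.
        os.replace(tmp_name, filename)
    except BaseException:
        os.remove(tmp_name)  # discard the partial temp file
        raise
```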

Comments


An atomic solution for Windows that loops over a folder and renames the files. Tested; it is atomic and easy to automate. To minimize the risk of two files ending up with the same name, use the random library: random.choice for letter combinations and str(random.randrange(50, 999999999, 2)) for digits. You can vary the digit range as you want.

import os
import random

path = "C:\\Users\\ANTRAS\\Desktop\\NUOTRAUKA\\"

def renamefiles():
    files = os.listdir(path)
    i = 1
    for file in files:
        os.rename(os.path.join(path, file), os.path.join(path, 
                  random.choice('ABCDEFGHIJKL') + str(i) + str(random.randrange(31,9999999,2)) + '.jpg'))
        i = i+1

for x in range(30):
    renamefiles()
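For the renaming task in this answer, a name derived from uuid.uuid4() makes collisions far less likely than a random letter plus a random number. Here is a minimal sketch under that assumption; rename_to_unique_names is a hypothetical name, and the .jpg extension is kept from the original code. It skips sub-directories, which the original would also have renamed.

```python
import os
import uuid


def rename_to_unique_names(folder, extension=".jpg"):
    """Hypothetical helper: give every regular file in `folder` a random UUID name."""
    for name in os.listdir(folder):  # listdir returns a snapshot list
        old_path = os.path.join(folder, name)
        if not os.path.isfile(old_path):
            continue  # skip sub-directories
        # uuid4().hex is 32 random hex digits, so a clash is practically impossible.
        new_path = os.path.join(folder, uuid.uuid4().hex + extension)
        os.rename(old_path, new_path)
```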

Comments
