I am writing the following data to a CSV file on the fly:

name first file parsed                    
STEP ID  ELEMENT_ID  Fatigue SW  Fatigue F1  Fatigue F3
Step 10  10000       1.30E-07    1.51E-06    2.15E-06

When I finish parsing the first file and start on the second, I would like to add more columns, as follows:

name first file parsed                                   name first file parsed
STEP ID  ELEMENT_ID  Fatigue SW  Fatigue F1  Fatigue F3  Fatigue SW  Fatigue F1  Fatigue F3
Step 10  10000       1.30E-07    1.51E-06    2.15E-06    1.30E-07    1.51E-06    2.15E-06

The files I am reading are massive (about 2 GB each), so I cannot afford to build lists in memory; I need to write as I parse.

Any suggestions?

  • You cannot add columns to an existing CSV file; you'll have to rewrite the whole file, I'm afraid. Commented Nov 26, 2013 at 18:22

2 Answers


You cannot add columns to an existing CSV file; you'll have to rewrite the whole file, I'm afraid.

You can use the following context manager to make replacing a file a little easier:

from contextlib import contextmanager
import io
import os


@contextmanager
def inplace(filename, mode='r', buffering=-1, encoding=None, errors=None,
            newline=None, backup_extension=None):
    """Allow for a file to be replaced with new content.

    yields a tuple of (readable, writable) file objects, where writable
    replaces readable.

    If an exception occurs, the old file is restored, removing the
    written data.

    mode should *not* use 'w', 'a' or '+'; only read-only-modes are supported.

    """

    # move existing file to backup, create new file with same permissions
    # borrowed extensively from the fileinput module
    if set(mode) & set('wa+'):
        raise ValueError('Only read-only file modes can be used')

    backupfilename = filename + (backup_extension or os.extsep + 'bak')
    try:
        os.unlink(backupfilename)
    except os.error:
        pass
    os.rename(filename, backupfilename)
    readable = io.open(backupfilename, mode, buffering=buffering,
                       encoding=encoding, errors=errors, newline=newline)
    try:
        perm = os.fstat(readable.fileno()).st_mode
    except OSError:
        writable = open(filename, 'w' + mode.replace('r', ''),
                        buffering=buffering, encoding=encoding, errors=errors,
                        newline=newline)
    else:
        os_mode = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
        if hasattr(os, 'O_BINARY'):
            os_mode |= os.O_BINARY
        fd = os.open(filename, os_mode, perm)
        writable = io.open(fd, "w" + mode.replace('r', ''), buffering=buffering,
                           encoding=encoding, errors=errors, newline=newline)
        try:
            if hasattr(os, 'chmod'):
                os.chmod(filename, perm)
        except OSError:
            pass
    try:
        yield readable, writable
    except Exception:
        # move backup back
        try:
            os.unlink(filename)
        except os.error:
            pass
        os.rename(backupfilename, filename)
        raise
    finally:
        readable.close()
        writable.close()
        try:
            os.unlink(backupfilename)
        except os.error:
            pass

Use this with the csv module to add columns:

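import csv

# Python 3: open in text mode with newline='' so the csv module controls
# line endings. On Python 2, use mode 'rb' and drop the newline argument.
with inplace(csvfilename, 'r', newline='') as (infh, outfh):
    reader = csv.reader(infh)
    writer = csv.writer(outfh)

    for row in reader:
        # append the new cells to each existing row
        row += ['new', 'column']
        writer.writerow(row)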
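
For the case in the question, where the extra columns come from a second large file, the same rewrite can be streamed row by row so that neither file is held in memory. What follows is only a minimal sketch and not part of the context manager above: the helper name append_columns is hypothetical, and it assumes the new values arrive as an iterable (for example a generator that parses the second file) yielding one list of cells per existing row, in the same order.

```python
import csv
import os
import tempfile


def append_columns(csv_path, new_columns):
    """Rewrite csv_path, appending one list of extra cells to each row.

    new_columns (hypothetical parameter) is any iterable that yields one
    list of cells per existing row, in row order.
    """
    directory = os.path.dirname(os.path.abspath(csv_path))
    # The temporary file lives in the same directory so that the final
    # os.replace() never has to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as dst, \
                open(csv_path, 'r', newline='') as src:
            writer = csv.writer(dst)
            extras = iter(new_columns)
            for row in csv.reader(src):
                # One row from each source at a time; if the extras run
                # out early, the existing row is written unchanged.
                writer.writerow(row + list(next(extras, [])))
        os.replace(tmp_path, csv_path)
    except BaseException:
        # Leave the original file untouched if anything goes wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Each call still rewrites the whole CSV once, so the cost is one full pass over the output per parsed input file, but memory use stays roughly constant.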

2 Comments

Thank you for the input. Your solution is very elegant, but I am an engineer, so I will use a horrible merge.
The context manager is also posted as a blog post, which expands its support a little: zopatista.com/python/2013/11/26/inplace-file-rewriting
  1. Define a class that represents the original row of data (such as OriginalData).
  2. Define a second class that derives from the first class, and includes properties for each of the new columns (such as NewData).
  3. Create a constructor on NewData that takes an OriginalData as an argument. Have it copy the data from OriginalData into itself.
  4. Override ToString() on NewData (__str__ in Python) so that it returns a string in the format that you want it to appear in the target file.
  5. While you're iterating over the lines, read them into an OriginalData instance.
  6. Once the OriginalData instance is loaded, copy the data into a NewData instance, and populate the new properties to include your data.
  7. Write the data from NewData to the target file by calling NewData's ToString() method.
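
The steps above can be sketched in Python roughly as follows. This is an illustration only: the attribute names, the from_line helper, and the assumption that rows are plain comma-separated text without quoted commas are not from the answer, and __str__ stands in for ToString().

```python
class OriginalData:
    """One row as it appears in the existing file (steps 1 and 5)."""

    def __init__(self, step_id, element_id, fatigue):
        self.step_id = step_id
        self.element_id = element_id
        self.fatigue = list(fatigue)

    @classmethod
    def from_line(cls, line):
        # Naive split: assumes no quoted or embedded commas.
        step_id, element_id, *fatigue = line.rstrip('\n').split(',')
        return cls(step_id, element_id, fatigue)


class NewData(OriginalData):
    """The original row plus the new columns (steps 2, 3 and 6)."""

    def __init__(self, original, extra_fatigue):
        # Copy the existing values from the OriginalData instance.
        super().__init__(original.step_id, original.element_id,
                         original.fatigue)
        self.extra_fatigue = list(extra_fatigue)

    def __str__(self):
        # Step 4: render the row in the target file's format.
        return ','.join([self.step_id, self.element_id,
                         *self.fatigue, *self.extra_fatigue])


line = 'Step 10,10000,1.30E-07,1.51E-06,2.15E-06\n'
row = NewData(OriginalData.from_line(line),
              ['1.30E-07', '1.51E-06', '2.15E-06'])
print(row)  # step 7 would write str(row) to the target file instead
```

Only one row object exists at a time, so this fits the write-as-you-parse constraint, although the target file is still a new file rather than the original modified in place.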
