
I am struggling with the csv module. I have a sample CSV file with 5000 lines (each line contains 7 values, 0 or 1) plus a header row. I want to iterate through the file in read mode while rewriting it in write mode with a new column of values (prediction), but the iteration stops after the 478th row, as in this sample code:

import csv
import random


def input_to_csv():

    prediction = [round(random.uniform(0, 1), 0) for _ in range(1, 5000)]

    combined_set = list(map(str, prediction))

    export_columns = ['COLUMN ' + str(n) for n in range(1, 8)] + ['OUTPUT'] 

    rr = 0
    with open('test.csv', 'r') as input_file:

        csv_input = csv.reader(input_file)
        next(csv_input)

        with open('test.csv', 'w', newline='') as csv_file:

            writer = csv.writer(csv_file)
            writer.writerow(export_columns)

            for row in csv_input:

                rr += 1

        print(rr)

I have checked the length of the file using row_count = sum(1 for _ in input_file), which gave me 5000 lines.

1 Answer

You're opening the same file twice, once for reading and once for writing.

Because you read some data from the file before reopening it (the next() call), the reader fills a read buffer (buffered reads are the default in Python) and iterates over that buffer just fine.

However, once it reaches the end of the read buffer it goes back to the file to fetch more data, but reopening the file in "w" mode has truncated it. So the reader gets no data, assumes it has reached end of file (which is not entirely wrong) and stops.

I expect the code appeared to work as long as you stayed below Python's default buffer size (io.DEFAULT_BUFFER_SIZE, which is 8 kB on my system).
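You can check the threshold on your own system; with rows of seven single-digit values the buffer holds a few hundred of them, which is the right order of magnitude for the ~478 rows observed (a sketch, assuming \r\n line endings):

```python
import io

# One data row: seven 0/1 values separated by commas, plus a CRLF line
# ending, is 15 bytes.
bytes_per_row = len('0,1,0,1,0,1,0\r\n')

print(io.DEFAULT_BUFFER_SIZE)                   # 8192 on most systems
print(io.DEFAULT_BUFFER_SIZE // bytes_per_row)  # a few hundred rows per buffer
```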

You should write to a different file than you're reading from. Either move the file before reading from it, or open a completely different file for writing (and possibly move it afterwards).
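A minimal sketch of that second option, assuming the same test.csv layout from the question (seven 0/1 columns plus a header) and a hypothetical temporary file name:

```python
import csv
import os
import random

def add_prediction_column(src='test.csv', tmp='test.csv.tmp'):
    # Read from the original and write to a temporary sibling, so the
    # reader never sees a truncated file.
    with open(src, 'r', newline='') as input_file, \
         open(tmp, 'w', newline='') as output_file:
        reader = csv.reader(input_file)
        writer = csv.writer(output_file)

        header = next(reader)
        writer.writerow(header + ['OUTPUT'])

        for row in reader:
            # round(random.uniform(0, 1)) stands in for a real prediction
            writer.writerow(row + [str(round(random.uniform(0, 1)))])

    # Atomically replace the original with the augmented copy.
    os.replace(tmp, src)
```

Because only one row is held in memory at a time, this also scales to files much larger than RAM.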


5 Comments

That's a good approach when you work with small files and memory is not an issue. My goal is to open a file with millions of lines and add 15-20 columns while an ML algorithm is learning and produces a prediction output after each iteration. When I allocate memory for the CSV file before the algorithm starts, it occupies 2 GB, so every GB counts ;)
That's a fine answer to the second suggestion, which leaves you with the first: write to a different file than you're reading from. Either move the file before reading from it, or open a completely different file for writing (and possibly move it afterwards). That's what tools like sed -i do.
Another alternative is to store your data in something more resilient and flexible, e.g. an SQLite database or something like that.
Could you add "write to a different file than you're reading from. Either move the file before reading from it, or open a completely different file for writing (and possibly move it afterwards)." as an answer and I will mark it as accepted? That resolved my issue.
Done, I've replaced the last paragraph of the answer with that.
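The SQLite alternative mentioned in the comments can be sketched as follows; the table name, column names, and sample rows are illustrative assumptions, not from the original question:

```python
import sqlite3

# Hypothetical schema: seven 0/1 input columns plus a nullable
# prediction column that can be filled in later, row by row.
conn = sqlite3.connect(':memory:')  # use a file path for persistence
cols = ', '.join('col%d INTEGER' % n for n in range(1, 8))
conn.execute(
    'CREATE TABLE samples (id INTEGER PRIMARY KEY, %s, output INTEGER)' % cols
)

# Load the existing rows once...
rows = [(0, 1, 0, 1, 0, 1, 0), (1, 1, 1, 0, 0, 0, 1)]
conn.executemany(
    'INSERT INTO samples (col1, col2, col3, col4, col5, col6, col7) '
    'VALUES (?, ?, ?, ?, ?, ?, ?)',
    rows,
)

# ...then update predictions in place as the model produces them,
# instead of rewriting a multi-GB CSV on every pass.
conn.execute('UPDATE samples SET output = ? WHERE id = ?', (1, 1))
conn.commit()
```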
