I have a bunch of CSV files that I will be combining into a single CSV file named 'Combined'. Each time the data from one CSV file is appended to 'Combined', I want to insert a new first column in 'Combined' containing the name of the CSV file the data was copied from in that iteration. Is there a way to do this in Python?

  • What is the format of the CSV file you are appending? Are you appending more columns, or is it just new data being appended to existing columns? Commented Jul 20, 2017 at 10:19
  • Is the format of each CSV file the same as the others? Commented Jul 20, 2017 at 15:16
  • @userXktape: The format of the CSV file is .LOG. I am not appending more columns. I just want to insert the file name in the first column and append whatever is there in the file directly below the existing data. Commented Jul 21, 2017 at 7:10
  • @MattR: yes, the format of the files is the same as the others. Commented Jul 21, 2017 at 7:10

1 Answer

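This can be done as follows. First open a CSV file for output. Then use Python's glob library to list all of the CSV files in a folder, skipping the output file itself because it also matches *.csv. For each row in each CSV file, insert the filename as the first column entry and then write the row to output.csv:

import glob
import csv

output_filename = 'output.csv'

with open(output_filename, 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    # glob returns files in arbitrary order, so sort for a predictable result
    for filename in sorted(glob.glob('*.csv')):
        # Skip the output file, otherwise it would be read while being written
        if filename == output_filename:
            continue

        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)

            for row in csv_input:
                # Prefix each row with the name of its source file
                row.insert(0, filename)
                csv_output.writerow(row)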

So for example, if you had these two CSV files:

num.csv

1,2,3,4,5
1,2,3,4,5
1,2,3,4,5

letter.csv

a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f

It would create the following output.csv file:

letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5

This assumes you are using Python 3.x.
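
The same idea can be wrapped in a function so it works on any folder. This is only a minimal sketch: the name `combine_csv_files` and its parameters are hypothetical and chosen for illustration. It sorts the matching files so the output order is predictable and skips the output file if it happens to match the pattern.

```python
import csv
from pathlib import Path


def combine_csv_files(folder, output_path, pattern='*.csv'):
    """Append every file in `folder` matching `pattern` to `output_path`,
    prefixing each row with the source file name. Returns the row count."""
    folder = Path(folder)
    output_path = Path(output_path)
    rows_written = 0

    with open(output_path, 'w', newline='') as f_output:
        csv_output = csv.writer(f_output)

        for path in sorted(folder.glob(pattern)):
            # Skip the output file if it matches the pattern
            if path.resolve() == output_path.resolve():
                continue

            with open(path, newline='') as f_input:
                for row in csv.reader(f_input):
                    # First column is the source file name, then the original row
                    csv_output.writerow([path.name] + row)
                    rows_written += 1

    return rows_written
```

For the files in the current folder this could be called as `combine_csv_files('.', 'output.csv')`.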


4 Comments

Amazing, Martin Evans. Thanks a ton. I am new to Python and had real trouble figuring this out. A small change to my question: the 'CSV files' that have to be combined are actually .LOG files inside zipped folders. I know how to unzip them and all of that, but I still get the following error: iterator should return strings, not bytes (did you open the file in text mode?) P.S.: I am using Python 3.x.
In Python 3.x you need to open the files slightly differently. I have updated the script accordingly. Assuming your log files are in the same format, change to using *.log
The code threw up this error: Argument 'newline' not supported in binary mode. I believe that this is caused due to the fact that 'r' defaults to 'rb'. So, I just changed 'r' to 'rt' and it works perfectly. Thanks a ton!
Done. Wonderful interacting with you on Stackoverflow. :)
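
For the follow-up in the comments (.LOG files inside zipped folders), one possible approach is to read each member directly from the archive without unzipping it first. This is a minimal sketch, assuming standard .zip archives and UTF-8 encoded logs; the function name `combine_zipped_logs` and its parameters are hypothetical. `zipfile.ZipFile.open()` returns a binary stream, which leads to the "iterator should return strings, not bytes" error, so the sketch wraps it in `io.TextIOWrapper` to give `csv.reader` text.

```python
import csv
import io
import zipfile


def combine_zipped_logs(zip_paths, output_path, suffix='.log', encoding='utf-8'):
    """Write the rows of every matching member of each zip archive to
    `output_path`, prefixing each row with the member name."""
    with open(output_path, 'w', newline='') as f_output:
        csv_output = csv.writer(f_output)

        for zip_path in zip_paths:
            with zipfile.ZipFile(zip_path) as archive:
                for name in archive.namelist():
                    # Case-insensitive match so .LOG and .log both qualify
                    if not name.lower().endswith(suffix):
                        continue

                    # Members open in binary mode; wrap the stream to decode it as text
                    with io.TextIOWrapper(archive.open(name),
                                          encoding=encoding, newline='') as f_text:
                        for row in csv.reader(f_text):
                            csv_output.writerow([name] + row)
```

If the logs use a different encoding, pass it through the `encoding` argument.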
