I have a number of CSV files that I will be combining into a single CSV file named 'Combined'. Each time the data from one CSV file is appended to 'Combined', I want to insert a new column before column 1 in 'Combined' containing the name of the CSV file the data was copied from in that iteration. Is there any way of doing this in Python?
- what is the format of the csv file you are appending? Are you appending more columns? or its just new data being appended to existing columns? – userXktape, Jul 20, 2017 at 10:19
- are the formats of the csv files the same as the others? – MattR, Jul 20, 2017 at 15:16
- @userXktape: The format of the CSV file is .LOG. I am not appending more columns. I just want to insert the file name in the first column and append whatever is there in the file directly below the existing data. – Gautham Kanthasamy, Jul 21, 2017 at 7:10
- @MattR: yes, the format of the files is the same as the others. – Gautham Kanthasamy, Jul 21, 2017 at 7:10
1 Answer
This can be done as follows. First open a CSV file for output. Then use Python's glob library to list all of the CSV files in a folder. For each row in each CSV file, insert the filename as the first column entry and then write the row to output.csv:
import glob
import csv

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    for filename in glob.glob('*.csv'):
        # output.csv also matches the pattern, so skip it
        if filename == 'output.csv':
            continue
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)

            for row in csv_input:
                # Prefix each row with the name of its source file
                row.insert(0, filename)
                csv_output.writerow(row)
So for example, if you had these two CSV files:
num.csv
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
letter.csv
a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f
It would create the following output.csv file:
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5
This assumes you are using Python 3.x.
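To try the example above without touching real data, here is a minimal sketch that wraps the same loop in a function and runs it in a temporary folder. The function name `combine_with_filename` and its parameters are hypothetical, introduced only for illustration. It also sorts the input files so the order is reproducible (glob does not guarantee one) and leaves the output file out of the inputs.

```python
import csv
import glob
import os
import tempfile


def combine_with_filename(folder, output_name="output.csv", pattern="*.csv"):
    """Prefix each row with its source file name and write all rows to one CSV.

    Hypothetical helper for illustration; returns the rows that were written.
    """
    output_path = os.path.join(folder, output_name)
    # Collect the inputs first, leaving out any output file from an earlier run.
    inputs = sorted(
        path for path in glob.glob(os.path.join(folder, pattern))
        if os.path.basename(path) != output_name
    )
    written = []
    with open(output_path, "w", newline="") as f_output:
        csv_output = csv.writer(f_output)
        for path in inputs:
            with open(path, newline="") as f_input:
                for row in csv.reader(f_input):
                    new_row = [os.path.basename(path)] + row
                    csv_output.writerow(new_row)
                    written.append(new_row)
    return written


# Recreate the two sample files in a temporary folder and combine them.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "num.csv"), "w", newline="") as f:
        f.write("1,2,3,4,5\n" * 3)
    with open(os.path.join(folder, "letter.csv"), "w", newline="") as f:
        f.write("a,b,c,d,e,f\n" * 4)
    rows = combine_with_filename(folder)
    with open(os.path.join(folder, "output.csv"), newline="") as f:
        combined_text = f.read()
```

Returning the rows as well as writing them is just a convenience for checking the result; the file on disk is what the original script produces.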
4 Comments
Gautham Kanthasamy
Amazing Martin Evans. Thanks a ton. Am new to python and had real trouble figuring this out. A small change to my question: These 'csv files' that have to be combined are actually .LOG files inside zipped folders. I know how to unzip them and all of that but I still get an error as follows: iterator should return strings, not bytes (did you open the file in text mode?) P.S.: I am using Python 3.x.
Martin Evans
In Python 3.x you need to open the files slightly differently. I have updated the script accordingly. Assuming your log files are in the same format, change to using *.log
Gautham Kanthasamy
The code threw up this error: Argument 'newline' not supported in binary mode. I believe that this is caused due to the fact that 'r' defaults to 'rb'. So, I just changed 'r' to 'rt' and it works perfectly. Thanks a ton!
Gautham Kanthasamy
Done. Wonderful interacting with you on Stackoverflow. :)
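For anyone hitting the "iterator should return strings, not bytes" error from the comments: one way it can arise is when csv.reader is handed a binary file object, which is what zipfile.ZipFile.open returns. The following is a minimal sketch, not the code used in the thread, that reads .LOG members straight from an archive by wrapping each one in io.TextIOWrapper. The function name `rows_from_zipped_logs`, the UTF-8 encoding, and the assumption that the logs are comma-separated are all mine.

```python
import csv
import io
import zipfile


def rows_from_zipped_logs(zip_path, suffix=".log", encoding="utf-8"):
    """Yield [member_name, *fields] for every row of every matching member.

    Hypothetical helper; assumes the members are comma-separated text.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Case-insensitive match so both .log and .LOG are picked up
            if not name.lower().endswith(suffix):
                continue
            # ZipFile.open gives a binary stream; wrap it so csv.reader gets str
            with io.TextIOWrapper(archive.open(name),
                                  encoding=encoding, newline="") as text:
                for row in csv.reader(text):
                    yield [name] + row


# Small in-memory archive to demonstrate the function (sample data is made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("run1.LOG", "1,2,3\n4,5,6\n")
    archive.writestr("notes.txt", "ignored\n")
buffer.seek(0)
demo_rows = list(rows_from_zipped_logs(buffer))
```

Each yielded row can be passed to csv_output.writerow just like the rows in the answer's loop.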