I have 125 data files, each containing two columns and 21 rows of data, and I'd like to combine them into a single .csv file (as 125 pairs of columns and only 21 rows). This is what my data files look like:

[image of a data file]

I am fairly new to Python, but I have come up with the following code:

import glob
Results = glob.glob('./*.data')
fout = 'c:/Results/res.csv'
fout = open("res.csv", 'w')
for file in Results:
    g = open(file, "r")
    fout.write(g.read())
    g.close()
fout.close()

The problem with the above code is that all the data are copied into only two columns with 125*21 rows.

Any help is very much appreciated!

4 Comments

  • This is totally a job for paste. Commented Apr 23, 2012 at 1:08
  • Is there a paste command in Python? Commented Apr 23, 2012 at 1:24
  • There is a Python Paste, but that's not what I'm talking about. Commented Apr 23, 2012 at 1:25
  • Is there a way to do it in plain Python rather than with Python Paste? I am fairly new to Python, let alone Python Paste. Commented Apr 23, 2012 at 1:28

3 Answers

1

This should work:

import glob

files = [open(f) for f in glob.glob('./*.data')]  # make a list of open files

with open("res.csv", 'w') as fout:
    for row in range(21):
        # strip() removes the trailing newline; join() avoids a trailing comma
        fout.write(','.join(f.readline().strip() for f in files))
        fout.write('\n')

for f in files:
    f.close()  # close the input files as well

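Note that this method will probably fail if you try it with a large number of files, because all of them are held open at the same time. The cap on simultaneously open files is imposed by the operating system rather than by Python and varies by platform; I believe it can be as low as 256.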
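If that limit is a concern, one workaround is to read the files one at a time and only combine the rows afterwards. The following is a minimal sketch, not a tested solution for the actual data: it assumes the two columns in each file are separated by whitespace (the question does not state the delimiter) and that all files have the same number of rows. The name merge_side_by_side is made up for this example.

```python
def merge_side_by_side(paths, out_path):
    """Write the rows of all input files next to each other as CSV."""
    columns = []  # one list of CSV-formatted lines per input file
    for path in paths:
        # Only one input file is open at any time.
        with open(path) as f:
            # Assumption: values on a line are separated by whitespace.
            columns.append([','.join(line.split()) for line in f if line.strip()])

    with open(out_path, 'w') as fout:
        # zip() pairs up row i of every file; it stops at the shortest file.
        for parts in zip(*columns):
            fout.write(','.join(parts) + '\n')
```

It could be called as merge_side_by_side(sorted(glob.glob('./*.data')), 'res.csv'), which needs import glob; sorting the paths keeps the column order predictable.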


3 Comments

Sorry, forgot to include the comma between concatenated lines. Should hopefully be good now
Thank you for the code, but there is a slight problem with the formatting: there are only 125 columns (i.e. each pair of columns is joined together when opened in Excel).
Sorry, I fixed that error about 1 minute after I posted it. Try re copy-pasting it if you haven't fixed it already :)
1

You may want to try the Python csv module (http://docs.python.org/library/csv.html), which provides very useful methods for reading and writing CSV files. Since you stated that you want only 21 rows with 250 columns of data, I would suggest creating 21 Python lists as your rows and then appending data to each row as you loop through your files.

Something like:

import csv

rows = []
for i in range(21):
    rows.append([])  # one empty list per output row

# Not sure of the structure of your input files or how they are delimited,
# but for each one, as you have it open and iterate through its rows, you
# would want to append the values in each row to the end of the
# corresponding list contained within the rows list.

# Then, write each row to the new csv
# (Python 3: open in text mode with newline='' for the csv module):
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for row in rows:
        writer.writerow(row)

1 Comment

Thank you for this. Please see the picture I have now included in the question.
1

(Sorry, I cannot add comments, yet.)

[Edited later, the following statement is wrong!!!] "The loop in davesnitty's answer that generates the rows can be replaced by rows = [[]] * 21." It is wrong because this creates a list of 21 elements that all refer to one and the same empty list, so appending to one row would change every row.
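The sharing is easy to see in a short experiment (3 rows instead of 21 to keep the output short; the variable names are only for illustration):

```python
# [[]] * 3 repeats a reference to one and the same inner list.
shared = [[]] * 3
shared[0].append('x')
print(shared)       # [['x'], ['x'], ['x']]

# A list comprehension creates a new inner list on every iteration.
independent = [[] for _ in range(3)]
independent[0].append('x')
print(independent)  # [['x'], [], []]
```

So rows = [[] for _ in range(21)] is a safe one-line replacement for the loop.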

My +1 for using the standard csv module. But files should always be closed, especially when you open that many of them. Also, there is a bug. The row read from the file via the -- even though you only write the result here. The solution is actually missing. Basically, each row read from a file should be appended to the sublist that corresponds to its line number. The line number can be obtained via enumerate(reader), where reader is csv.reader(fin, ...).

[added later] Try the following code, adjusting the paths for your purpose:

import csv
import glob
import os

datapath = './data'
resultpath = './result'
if not os.path.isdir(resultpath):
    os.makedirs(resultpath)

# Initialize the empty rows. It does not check how many rows are
# in the file.
rows = []

# Read data from the files into the above matrix.
# (Python 3: the csv module wants text mode with newline='';
# the original Python 2 code used 'rb' and 'wb' instead.)
for fname in glob.glob(os.path.join(datapath, '*.data')):
    with open(fname, 'r', newline='') as f:
        reader = csv.reader(f)
        for n, row in enumerate(reader):
            if len(rows) < n+1:
                rows.append([])  # add another row
            rows[n].extend(row)  # append the elements from the file

# Write the data from memory to the result file.
fname = os.path.join(resultpath, 'result.csv')
with open(fname, 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
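The code above relies on the csv.reader default, which is a comma delimiter. If the .data files turn out to be separated by spaces instead (an assumption; the question does not say), the reader can be told so. A small sketch with made-up sample values, using io.StringIO as a stand-in for an open data file:

```python
import csv
import io

# Stand-in for an open .data file; the sample values are made up.
sample = io.StringIO("0.1  1.5\n0.2  2.5\n")

# delimiter=' ' splits on spaces; skipinitialspace=True ignores the extra
# spaces that follow a delimiter, so runs of spaces do not create empty fields.
reader = csv.reader(sample, delimiter=' ', skipinitialspace=True)
rows = list(reader)
print(rows)
```

For tab-separated files, delimiter='\t' would be the corresponding setting.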
