I realize this question has been asked a million times and there is a lot of documentation on it. However, I am unable to output the results in the correct format.

The code below was adapted from: Replacing empty csv column values with a zero

# Save the script below as RepEmptyCells.py
# Add #!/usr/bin/python as the first line of the script
# Make it executable with chmod +x before running it on the desired .csv file

# The code below looks through your .csv file and replaces empty cells with 0s
# This can be particularly useful for genetic distance matrices

import csv
import sys

# "rb" is the Python 2 way to open a file for the csv module
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            row[i] = "0"  # keep cells as strings so join() accepts them
    print(','.join(str(x) for x in row))

Currently, to get the correct output .csv file [i.e. in correct format] one can run the following command in bash:

# After making the script executable
./RepEmptyCells.py input.csv > output.csv  # this produces the correct output

I've tried to use the csv.writer function to produce a correctly formatted output.csv file (similar to ./RepEmptyCells.py input.csv > output.csv), without much luck.

I'd like to learn how to add this last part to the code to automate the process without having to do it in bash.

What I have tried:

import csv
import sys

f = open('output2.csv', 'w')

reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            row[i] = "0"
    f.write(','.join(str(x) for x in row))

f.close()

When looking at the raw files from this code and the one before, they look the same.

However, when I open them in either Excel or iNumbers, the latter (i.e. output2.csv) shows only a single row of data.

It's important that both output.csv and output2.csv can be opened in Excel.

3 Answers

2 options:

  1. Just do a f.write('\n') after your current f.write statement.

  2. Use csv.writer. You mention it but it isn't in your code.

    writer = csv.writer(f)
    ...
    writer.writerow([int(x) for x in row])  # Note difference in parameter format
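
To make option 2 concrete, here is a minimal sketch for Python 3. The function name fill_empty_cells and the file names in the usage note are placeholders of mine, not from the question, and I'm assuming that every empty cell should become the string "0".

```python
import csv


def fill_empty_cells(in_path, out_path):
    """Copy a CSV file, writing "0" in place of every empty cell."""
    # newline='' lets the csv module control line endings itself (Python 3).
    with open(in_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            # writerow() takes the cells of one row and appends the line
            # terminator, so no manual ','.join(...) or '\n' is needed.
            writer.writerow([cell if cell else "0" for cell in row])
```

Calling fill_empty_cells("input.csv", "output2.csv") should then give one output line per input row, without any shell redirection.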
    
2 Comments

Thanks. That did it! So you just had to add a newline ('\n')! 1) works. 2) still doesn't, but that's okay.
Beware: I'm surprised 1) works, since on Unix '\n' is written as a bare LF, while I was pretty sure Excel only accepts csv files whose lines end in CRLF. In fact, this is a feature of the CSV format: a single LF denotes a line break inside a cell. That's why you open the files in binary mode ('rb'/'wb') for Python 2 and with newline='' for Python 3: the csv writer handles this specific aspect itself and would be disturbed by Python's default newline translation.
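
The line-ending point can be checked without touching a file. This is only a small Python 3 sketch using an in-memory buffer; the variable names are mine. It shows what csv.writer emits with its default "excel" dialect, and how the lineterminator parameter overrides that.

```python
import csv
import io

# Write one row into an in-memory text buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer).writerow(["1", "0", "3"])
default_output = buffer.getvalue()  # default "excel" dialect ends rows with CRLF

# The terminator can be overridden if a plain LF is really wanted.
lf_buffer = io.StringIO()
csv.writer(lf_buffer, lineterminator="\n").writerow(["1", "0", "3"])
lf_output = lf_buffer.getvalue()

print(repr(default_output))  # '1,0,3\r\n'
print(repr(lf_output))       # '1,0,3\n'
```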
A humble proposition

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv
import sys

# Use with statement to properly close files
# Use newline='' which is the right option for Python 3.x
with open(sys.argv[1], 'r', newline='') as fin, open(sys.argv[2], 'w', newline='') as fout:
    reader = csv.reader(fin)
    # You may need to redefine the dialect for some versions of Excel that
    # split cells on semicolons (for _Comma_ Separated Values, yes...)
    writer = csv.writer(fout, dialect="excel")
    for row in reader:
        # Write while reading and leave the caching to the OS.
        # Process the data as it comes in, with a generator that checks
        # every cell in a row. If a cell is empty, the `or` returns "0".
        # Keep everything as strings: converting to int would fail on
        # non-integer cells, and the writer would convert it back to str
        # anyway. Excel doesn't see any difference when loading.
        writer.writerow(cell or "0" for cell in row)

Sample in.csv

1,2,3,,4,5,6,
7,,8,,9,,10

Output out.csv

1,2,3,0,4,5,6,0
7,0,8,0,9,0,10
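
If you want to check the sample without creating files, the same loop can be run against in-memory buffers. This is just a sketch under that assumption; sample_in and result are names I made up for the check.

```python
import csv
import io

# The sample in.csv from above, held in a string instead of a file.
sample_in = "1,2,3,,4,5,6,\n7,,8,,9,,10\n"

fin = io.StringIO(sample_in, newline='')
fout = io.StringIO(newline='')
writer = csv.writer(fout, dialect="excel")
for row in csv.reader(fin):
    # Same replacement rule as in the script: empty cell -> "0".
    writer.writerow(cell or "0" for cell in row)

# splitlines() drops the CRLF terminators the writer added.
result = fout.getvalue().splitlines()
print(result)  # ['1,2,3,0,4,5,6,0', '7,0,8,0,9,0,10']
```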

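import csv
import sys

# Python 2: open in binary mode; on Python 3 use open(sys.argv[1], newline='')
with open(sys.argv[1], 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        # row is a list of strings, so the blank cells are replaced one by one
        print(','.join(cell if cell.strip() else '0' for cell in row))

I don't understand why you need the shell and the redirection, though. A csv writer is just:

# rows is assumed to be a list holding the processed rows
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(rows)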
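
For completeness, here is a hedged Python 3 sketch of how the two snippets could fit together: collect the processed rows first, then hand them to writerows() in one call. The helper names read_filled_rows and write_rows are hypothetical, and this variant keeps the whole file in memory, which is fine for small files but not for very large ones.

```python
import csv


def read_filled_rows(path):
    """Return all rows of a CSV file with blank cells replaced by "0"."""
    with open(path, newline='') as f:
        return [[cell if cell.strip() else "0" for cell in row]
                for row in csv.reader(f)]


def write_rows(path, rows):
    """Write a list of rows to a CSV file in one go."""
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

Usage would look like write_rows("output.csv", read_filled_rows("input.csv")).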
