I want to go through large CSV files and remove every row that has missing data. The check is per row: if any cell in any column is 0 or has no value (a blank cell), the entire row should be deleted. The corrected data should then be written to a new, corrected CSV file.

import csv

with open('data.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)
        if not row[0]:
             print("12")

This is what I found and tried, but it does not seem to be working, and I don't have any ideas about how to approach this problem. Help, please?

Thanks!

  • Do you know how many columns are expected in the CSV? This is crucial information in order for this to succeed. Commented Jun 15, 2022 at 13:24
  • It's not the same, but say 21, because that's the example data I am working with. Commented Jun 15, 2022 at 13:34
  • Does this answer your question? delete line, of a .csv, when certain column has no value python Commented Jun 17, 2022 at 13:45

3 Answers

Because of the way csv.reader presents rows of data, you need to know how many columns there are in the original CSV file. For example, if the CSV file content looks like this:

1,2
3,
4

Then the lists returned by iterating over the reader would look like this:

['1','2']
['3','']
['4']

As you can see, the third row only has one column, whereas the first and second rows have two columns, although one of them is (effectively) empty.
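The row shapes described above can be reproduced without a file on disk. This is a minimal sketch that feeds the same three lines to csv.reader through io.StringIO; the names sample, rows, widths and short_rows are only illustrative.

```python
import csv
import io

# The same three lines as in the example above, held in memory.
sample = "1,2\n3,\n4\n"

rows = list(csv.reader(io.StringIO(sample)))

# Number of cells csv.reader produced for each line.
widths = [len(row) for row in rows]

# Rows that are narrower than the widest row are missing trailing columns.
expected = max(widths)
short_rows = [row for row in rows if len(row) < expected]

print(rows)
print(widths)
print(short_rows)
```

An empty trailing cell ("3,") still counts as a column, while a missing one ("4") simply does not appear in the list, which is why the row length has to be checked separately from the cell contents.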

This function allows you to either specify the number of columns (if you know it beforehand) or let the function figure it out. If not specified, the number of columns is assumed to be the greatest number of columns found in any row. Rows whose length differs from that number are dropped.

So...

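import csv

DELIMITER = ','

def valid_column(col):
    # Numeric cells are valid only if they are non-zero.
    try:
        return float(col) != 0
    except ValueError:
        pass
    # Non-numeric cells are valid if they contain something other than whitespace.
    return len(col.strip()) > 0


def fix_csv(input_file, output_file, cols=0):
    if cols == 0:
        # First pass: take the widest row as the expected column count.
        # default=0 avoids a ValueError from max() on an empty file.
        with open(input_file, newline='') as indata:
            cols = max((len(row) for row in csv.reader(indata, delimiter=DELIMITER)), default=0)
    with open(input_file, newline='') as indata, open(output_file, 'w', newline='') as outdata:
        writer = csv.writer(outdata, delimiter=DELIMITER)
        for row in csv.reader(indata, delimiter=DELIMITER):
            # Keep only full-width rows in which every cell is valid.
            if len(row) == cols and all(valid_column(col) for col in row):
                writer.writerow(row)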

fix_csv('original.csv', 'fixed.csv')

3 Comments

Thanks a lot. I realised I was looking at this problem wrong because of you, and this code really pointed me in the right direction. Thanks again!
I have tried this code and it seems to be working fine. When I print it on a csv, it prints; however, the next time I try to run the code it doesn't print on the csv, and each time I am manually deleting the result. If I wanted it to print to the same file again, how would I do that?
@Saahas I don't know what "print it on a csv" means

Maybe something like this:

import csv

with open('data.csv', 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)

# Keep only rows with no empty cell and no cell equal to the string '0'
data = [x for x in data if '' not in x and '0' not in x]

You can then rewrite the CSV file if you like.
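Writing the filtered rows back out could look like the sketch below. It wraps the same filter in a function and writes the result with csv.writer; the function name write_clean_rows and the file names in the usage comment are hypothetical, not part of the answer above.

```python
import csv

def write_clean_rows(input_path, output_path):
    """Copy input_path to output_path, skipping rows with '' or '0' cells."""
    with open(input_path, newline='') as infile:
        data = list(csv.reader(infile))

    # Same filter as above: drop rows containing an empty cell or the string '0'.
    kept = [row for row in data if '' not in row and '0' not in row]

    # Mode 'w' truncates the output file, so each run replaces the old result.
    with open(output_path, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(kept)

    return len(kept)

# Usage, with hypothetical file names:
# write_clean_rows('data.csv', 'corrected.csv')
```

Writing to a separate output file keeps the original data intact in case the filter removes more than intended.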

6 Comments

If I just wanted to print the corrected rows, how would I do that? It's printing all the rows, even the ones with 0, when I try.
Maybe change '0' to 0, if it's ints in your data. Not sure what else you mean.
@SuperStew See my answer for examples of potentially malformed rows where this would not work. csv.reader does not convert values; they're always strings.
@AlbertWinestein That's why I left '0' in my answer. Not sure what OP means by print, though.
@SuperStew Problems arise with your code if you have rows in the CSV that look like this: 100, 0. That is entirely legal, but when csv.reader breaks this down you'll get ['100', ' 0']. Note the space preceding the zero. What you really need to do is try to convert the string to a number and then check for zero.
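The whitespace problem raised in the last comment can be checked with a short sketch. The helper name is_blank_or_zero is hypothetical; it strips the cell and tries a numeric conversion, as the comment suggests.

```python
def is_blank_or_zero(cell):
    """True if the cell is empty, whitespace only, or a numeric zero."""
    text = cell.strip()
    if not text:
        return True
    try:
        return float(text) == 0
    except ValueError:
        # Non-numeric text counts as real data.
        return False

# What csv.reader yields for the line "100, 0" with default settings.
row = ['100', ' 0']

# Exact string matching misses ' 0', so the row would be kept.
naive_keep = '' not in row and '0' not in row

# Converting to a number first catches it, so the row is dropped.
numeric_keep = not any(is_blank_or_zero(cell) for cell in row)

print(naive_keep, numeric_keep)
```

The numeric check also treats values such as '0.0' as zero, which plain string comparison with '0' would not.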

Instead of using csv, you could use the pandas module, something like this:

import pandas as pd

df = pd.read_csv('file.csv')
print(df)

index = 1  # index label of the row that you want to remove
df = df.drop(index)
print(df)

# index=False keeps pandas from writing the row index as an extra column
df.to_csv('file.csv', index=False)

1 Comment

I haven't worked with pandas before. How would I find the empty cell and then the index of the row?
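One possible way to locate those rows with pandas is sketched below, assuming the file has a header row and numeric columns. The function name find_incomplete_rows and the in-memory sample data are hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

def find_incomplete_rows(df):
    """Return the index labels of rows holding a missing value or a zero."""
    # isna() flags empty cells; (df == 0) flags numeric zeros.
    mask = df.isna() | (df == 0)
    return df.index[mask.any(axis=1)]

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("a,b,c\n1,2,3\n4,,6\n7,0,9\n10,11,12\n")
df = pd.read_csv(sample)

bad_rows = find_incomplete_rows(df)   # labels of the rows to remove
cleaned = df.drop(bad_rows)           # everything else is kept

print(list(bad_rows))
print(cleaned)
```

The cleaned frame could then be saved with cleaned.to_csv('corrected.csv', index=False), where the output file name is again only an example.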
