How to delete a row in a CSV file if a cell is empty using Python

Question

I want to go through large CSV files and, if there is missing data, remove that row completely. This is row-specific: if a cell equals 0 or has no value, I want to remove the entire row. I want this to apply to all the columns, so if any column has a blank cell the row should be deleted, and the corrected data should be written to a corrected CSV.

import csv

with open('data.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)
        if not row[0]:
            print("12")

This is what I found and tried, but it doesn't seem to be working, and I don't have any ideas about how to approach this problem. Help, please?

Thanks!

Do you know how many columns are expected in the CSV? This is crucial information for this to succeed. – jackal, Commented Jun 15, 2022 at 13:24
It's not always the same, but say 21, because that's the example data I am working with. – Saahas, Commented Jun 15, 2022 at 13:34
Does this answer your question? delete line, of a .csv, when certain column has no value python – Maicon Mauricio, Commented Jun 17, 2022 at 13:45

jackal · Accepted Answer · 2022-06-15 15:55:47Z

0

Due to the way csv.reader presents rows of data, you need to know how many columns there are in the original CSV file. For example, if the CSV file content looks like this:

1,2
3,
4

Then the lists returned by iterating over the reader would look like this:

['1','2']
['3','']
['4']

As you can see, the third row has only one column, whereas the first and second rows have two columns, even though one of the second row's columns is (effectively) empty.
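A quick way to see this behaviour without creating a file is to feed the sample above to csv.reader through io.StringIO. This is only a sketch; the in-memory string stands in for the real file.

```python
import csv
import io

# The three-line sample from above, held in memory instead of on disk.
sample = "1,2\n3,\n4\n"

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row, len(row))
# ['1', '2'] 2
# ['3', ''] 2
# ['4'] 1
```

The short third row contains no empty string at all, so a row whose length differs from the expected column count has to be treated as missing data in its own right.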

This function lets you either specify the number of columns (if you know it beforehand) or let the function figure it out. If not specified, the number of columns is assumed to be the greatest number of columns found in any row.

So...

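import csv

DELIMITER = ','

def valid_column(col):
    # Numeric cells are valid unless they equal zero
    # (float() tolerates surrounding whitespace, so ' 0' is caught too)
    try:
        return float(col) != 0
    except ValueError:
        pass
    # Non-numeric cells are valid unless they are empty or whitespace-only
    return len(col.strip()) > 0


def fix_csv(input_file, output_file, cols=0):
    if cols == 0:
        # First pass: assume the widest row gives the expected column count
        # (default=0 avoids a ValueError on an empty file)
        with open(input_file, newline='') as indata:
            cols = max((len(row) for row in csv.reader(indata, delimiter=DELIMITER)), default=0)
    with open(input_file, newline='') as indata, open(output_file, 'w', newline='') as outdata:
        writer = csv.writer(outdata, delimiter=DELIMITER)
        for row in csv.reader(indata, delimiter=DELIMITER):
            # Drop short rows and rows containing any blank or zero cell
            if len(row) == cols:
                if all(valid_column(col) for col in row):
                    writer.writerow(row)

fix_csv('original.csv', 'fixed.csv')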
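Saahas asks in the comments how to get the result back into the same file. A file cannot safely be overwritten while it is still being read row by row, so one option is to write the kept rows to a temporary file in the same directory and then swap it in with os.replace. This is a hypothetical sketch, not part of the answer above: filter_csv_in_place, keep_row, expected_cols and the demo data are illustrative names, and keep_row re-implements the blank/zero test rather than calling valid_column.

```python
import csv
import os
import tempfile


def keep_row(row, expected_cols=None):
    """Hypothetical row test: right width, no blank cells, no cells equal to zero."""
    if expected_cols is not None and len(row) != expected_cols:
        return False
    for cell in row:
        text = cell.strip()
        if not text:
            return False  # empty or whitespace-only cell
        try:
            if float(text) == 0:
                return False  # numeric zero such as '0', ' 0' or '0.0'
        except ValueError:
            pass  # non-numeric text counts as present
    return True


def filter_csv_in_place(path, expected_cols=None):
    """Rewrite `path` so that only rows passing keep_row remain."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so os.replace can swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                if keep_row(row, expected_cols):
                    writer.writerow(row)
        os.replace(tmp_path, path)  # the original file now holds the filtered rows
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demo on a throwaway file (made-up data, not from the question).
demo_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(demo_path, "w", newline="") as f:
    f.write("1,2\n3,\n4,0\n5, 0\n6,7\n")
filter_csv_in_place(demo_path, expected_cols=2)
with open(demo_path, newline="") as f:
    result = list(csv.reader(f))
print(result)  # [['1', '2'], ['6', '7']]
```

Because the original is only replaced after the new file has been fully written, running the script again simply filters the already-cleaned file instead of needing a manual delete.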

edited Jun 15, 2022 at 15:55

answered Jun 15, 2022 at 13:39

jackal


3 Comments

Saahas Over a year ago

Thanks a lot. Thanks to you I realised I was looking at this problem the wrong way, and this code really pointed me in the right direction. Thanks again!

Saahas Over a year ago

I have tried this code and it seems to be working fine, but when I print it on a csv it prints; however, the next time I run the code it doesn't print on the csv, and after each run I am manually deleting the result. If I wanted it to print to the same file again, how would I do that?

jackal Over a year ago

@Saahas I don't know what "print it on a csv" means

SuperStew · Accepted Answer · 2022-06-15 13:10:09Z

0

Maybe like this:

import csv

with open('data.csv', 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)

# keep only rows with no empty cell and no cell that is exactly '0'
data = [x for x in data if '' not in x and '0' not in x]

You can then rewrite the CSV file if you like.
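A minimal sketch of that rewrite step, assuming the whole file fits in memory: because every row is read into a list and the file is closed before it is reopened in "w" mode, the same path can be overwritten directly. drop_rows_with_blank_or_zero and the demo data are hypothetical names used only for illustration.

```python
import csv
import os
import tempfile


def drop_rows_with_blank_or_zero(path):
    """Hypothetical helper: filter `path` in memory, then overwrite it."""
    # Read every row first; the file is closed again before it is reopened for writing.
    with open(path, newline="") as csvfile:
        data = list(csv.reader(csvfile))

    # Same filter as the answer: an exact '' or '0' cell disqualifies a row.
    data = [x for x in data if "" not in x and "0" not in x]

    # Mode "w" truncates the file, so the kept rows replace the old contents.
    with open(path, "w", newline="") as csvfile:
        csv.writer(csvfile).writerows(data)
    return data


# Demo on a throwaway file (made-up data, not from the question).
demo_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(demo_path, "w", newline="") as f:
    f.write("1,2\n3,\n4,0\n6,7\n")
kept = drop_rows_with_blank_or_zero(demo_path)
print(kept)  # [['1', '2'], ['6', '7']]
```

The filter still relies on exact string matches, so, as jackal points out in the comments, a cell such as ' 0' would slip through.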

answered Jun 15, 2022 at 13:10

SuperStew


6 Comments

Saahas Over a year ago

If I just wanted to print the corrected rows, how would I do that? It's printing all the rows, even those with 0, when I try.

SuperStew Over a year ago

Maybe change '0' to 0 if your data contains ints. Not sure what else you mean.

jackal Over a year ago

@SuperStew See my answer for examples of potentially malformed rows where this would not work. csv.reader does not convert values; they're always strings.

SuperStew Over a year ago

@AlbertWinestein That's why I left '0' in my answer. Not sure what the OP means by print, though.

jackal Over a year ago

@SuperStew Problems arise with your code if you have rows in the CSV that look like this: "100, 0". That is entirely legal, but when csv.reader breaks it down you'll get ['100', ' 0']. Note the space preceding the zero. What you really need to do is try to convert the string to a number and then check for zero.
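To illustrate that point, this sketch parses the row "100, 0" and compares the exact-string test with a numeric one. is_blank_or_zero is a hypothetical helper name, not something from either answer.

```python
import csv
import io


def is_blank_or_zero(cell):
    """True if the cell is empty/whitespace or parses as the number zero."""
    text = cell.strip()
    if not text:
        return True
    try:
        return float(text) == 0
    except ValueError:
        return False  # non-numeric text such as 'abc' counts as present


row = next(csv.reader(io.StringIO("100, 0\n")))
print(row)                                     # ['100', ' 0']
print("0" in row)                              # False: the exact-match test misses ' 0'
print(any(is_blank_or_zero(c) for c in row))   # True: the numeric test catches it
```

Alternatively, csv.reader(..., skipinitialspace=True) drops the space after each delimiter, which would make the '0' test match in this particular case, although values like '0.0' or '00' would still be missed without a numeric check.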


ClassHacker · Accepted Answer · 2022-06-15 13:35:55Z

0

Instead of using csv, you could use the pandas module, something like this:

import pandas as pd

df = pd.read_csv('file.csv')
print(df)

index = 1  # index label of the row that you want to remove
df = df.drop(index)
print(df)

# index=False stops pandas from writing its row index as an extra first column
df.to_csv('file.csv', index=False)
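Dropping a hard-coded index requires knowing which row is bad in advance. One possible pandas approach is to build a boolean mask of rows where every cell is present and non-zero, and keep only those rows. This is a sketch under assumptions: the data has a header row, and drop_incomplete_rows and the sample data are hypothetical.

```python
import io

import pandas as pd


def drop_incomplete_rows(df):
    """Hypothetical helper: keep rows with no missing (NaN) cell and no cell equal to 0."""
    present = df.notna()  # False where read_csv found an empty field
    nonzero = df.ne(0)    # False where a cell equals the number 0
    keep = (present & nonzero).all(axis=1)
    return df[keep]


# Made-up data with a header row; with a real file, pass its path to read_csv instead.
sample = "a,b,c\n1,2,3\n4,,6\n7,0,9\n10,11,12\n"
df = pd.read_csv(io.StringIO(sample))
cleaned = drop_incomplete_rows(df)
print(cleaned)

# index=False keeps pandas from adding its row index as an extra column.
print(cleaned.to_csv(index=False))
```

df.dropna() on its own would remove the rows with blank cells; the extra ne(0) check covers the zero requirement from the question. If the file has no header row, pass header=None to read_csv.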

answered Jun 15, 2022 at 13:35

ClassHacker


1 Comment

Saahas Over a year ago

I haven't worked with pandas before. How would I find the empty cell and then the index of the row?
