0

First off, I do not have pandas framework and am unable to install it. I am hoping that I can solve this problem without pandas.

I am trying to clean my data using python framework, by removing rows that contain empty cells.

this is my code:

import csv
input_file = 'test.csv'
output_file = 'test1.csv'

cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49] 


cols_to_remove = sorted(cols_to_remove, reverse=True) 
row_count = 0 

with open(input_file, "r") as source: #to run and delete column  
   reader = csv.reader(source)  
   with open(output_file, "w") as result:
       writer = csv.writer(result)
       for row in reader:
           row_count += 1
           print('\r{0}'.format(row_count)) # Print rows processed
           for col_index in cols_to_remove:
               del row[col_index]
           writer.writerow(row)
           print(row)

I have tried codes from other similar questions asked, however it prints into an empty file.

3
  • 1
    please, add code snippet, it's difficult to read Commented Mar 14, 2021 at 18:05
  • sorry! was editing it. it's done now Commented Mar 14, 2021 at 18:06
  • added updated solution with any func, try to use it, please) Commented Mar 14, 2021 at 22:07

4 Answers 4

1

Assuming that the empty row is in fact one that looks like this

,,,,,,,,,,,,,, (and many more)

you can do the following:

import csv
input_file = 'test.csv'
output_file = 'test1.csv'

cols_to_remove =[0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49] 


cols_to_remove = sorted(cols_to_remove, reverse=True) 
row_count = 0 

with open(input_file, "r") as source: #to run and delete column  
   reader = csv.reader(source)  
   with open(output_file, "w") as result:
       writer = csv.writer(result)
       for row in reader:
           row_count += 1
           print('\r{0}'.format(row_count)) # Print rows processed
           all_empty = False # new
           for cell in row: # new
               if len(cell) == 0: # new
                   all_empty = True # new
                   break# new
           if all_empty: # new
               continue # new
           for col_index in cols_to_remove:
               del row[col_index]
           writer.writerow(row)
           print(row)
Sign up to request clarification or add additional context in comments.

2 Comments

hi @C Hecht ! what should follow after "break =" as it is empty
Sorry, the = shouldn't have been there. Edited my code to remove it
1

try to skip row if any cell empty like this:

if any(cel is None or cel == '' for cel in row):
   continue

Here's the code:

import csv

input_file = 'hey.txt'
output_file = 'test1.csv'
cols_to_remove = [0, 1, 9, 11, 14, 15, 23, 28, 29, 32, 33, 37, 38]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0

with open(input_file, "r+") as source:  # to run and delete column
    reader = csv.reader(source)
    with open(output_file, "w+") as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count))  # Print rows processed
            if any(cel is None or cel == '' for cel in row):
                continue
            for col_index in cols_to_remove:
                try:
                    del row[col_index]
                except Exception:
                    pass
            writer.writerow(row)
            print(row)

files and cols_to_remove were changed, please, use your own. step of skipping might be moving after row deleting.

2 Comments

Hi @Vova this is the error that I get: SyntaxError: 'continue' not properly in loop
The idea is to put the code that Vova suggested just below your for-loop. continue always interacts with a for-loop and cannot be run by itself. If I understand the question correctly however, this solution would not work since row not an empty array.
1

Checking length of row as someone suggested might not work as empty rows in csv may contain empty Strings ''. So len(row) wouldn't return 0 because it is a list of empty strings.

To simply delete empty rows in csv try adding the following check

skip_row = True
for item in row:
    if item != None and item != '':
        skip_row = False
if not skip_row:
   # process row

Full code would be

import csv
input_file = 'test.csv'
output_file = 'test1.csv'

cols_to_remove = [0,1,9,11,14,15,23,28,29,32,33,37,38,39,41,43,44,45,46,47,48,49] 


cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0

with open(input_file, "r") as source: #to run and delete column
   reader = csv.reader(source)
   with open(output_file, "w", newline='' ) as result:
       writer = csv.writer(result)
       for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count)) # Print rows processed
            skip_row = True
            for item in row:
                if item != None and item != '':
                    skip_row = False
            if not skip_row:
                for col_index in cols_to_remove:
                    del row[col_index]
                writer.writerow(row)
                print(row)

Here we check if each element of a row is None type or an empty string and decide if we should skip that row or not.

Comments

0

you can use this for delete all the row that have a null value,

data = data.dropna()

and this one for delete column,

data = data.drop(['column'], axis=1)

1 Comment

That's pandas code. They're asking for a python-only solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.