Search and replace in a CSV file using Python

Question

My last question was considered a duplicate, but I haven't found a question remotely similar to what I am asking, so I will rephrase:

I have a csv file, four columns, and about 26,000 rows.

The data is as follows for every row:

Firstname,, Lastname,, ID,, Address

In the last column, the address column, the addresses are formatted as follows:

1234 Streetname Dr.
Timbuktu, AK 32456
United States

My goal is only to remove the country name from every row that contains it (not all rows do), preserving the rest of the address, and write this back to the file. I want all the other data to remain as it was. Basically: find any instance of, say, the substring "United States" and replace it with a blank space.

The code I presently have is as follows:

import csv


with open('file.csv', 'rt') as rf:
    reader = csv.reader(rf, delimiter=',')
    for row in reader:
#print(row[3] + "\n")    # this works
        usa = "United States"
        row1 = row[0]
        row2 = row[1]
        row3 = row[2]

        if usa in row[3]:
            newrow = row[3].replace(usa, " ")
            #print(newrow + "\n")
with open('file.csv', 'w') as wf:
    writer = csv.writer(wf)    
    writer.writerows(row1 + row2 + row3 + newrow)

It is presently wiping the CSV file nearly clean. Some strange single characters are left over in a few rows, only in the first column.

Can someone help point me to the snag? Thanks.

@ParijatBhatt Yeah. It's always at the end of the address if it is there. But sometimes it does not exist.
– j-grimwood, Commented Aug 20, 2019 at 20:16
You forgot to write back into the file at the end of the for loop.
– daniboy000, Commented Aug 20, 2019 at 21:15

Parijat Bhatt · Accepted Answer · 2019-08-20 20:22:10Z

1

Try this. You will need to obtain a list of possible country names, in lowercase, because the last line of each address is lowercased before the comparison.

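import pandas as pd

# Assumes the CSV has a header row with an 'address' column.
df = pd.read_csv('data.csv')
# Placeholder: extend with every country name to strip, in lowercase.
country_names = ['united states']
# Split each address into lines, then drop the last line if it is a country name.
df['address'] = df['address'].apply(lambda x: x.split('\n'))
df['address'] = df['address'].apply(lambda x: "\n".join(x[:-1]) if x[-1].strip().lower() in country_names else "\n".join(x))
df.to_csv('data.csv', index=False)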
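
For readers who would rather not install pandas, the same "drop the last line if it is a country name" check can be sketched with the standard library alone. This is only an illustration: `COUNTRY_NAMES`, `strip_country`, and `clean_rows` are hypothetical names, and the address is assumed to sit in the fourth column, as the question describes.

```python
import csv

# Assumption: a hypothetical, lowercase set of country names to strip.
COUNTRY_NAMES = {"united states"}


def strip_country(address):
    """Drop the last line of an address if it is a known country name."""
    lines = address.split("\n")
    if lines and lines[-1].strip().lower() in COUNTRY_NAMES:
        lines = lines[:-1]
    return "\n".join(lines)


def clean_rows(rows, column=3):
    """Yield copies of rows with the country removed from one column."""
    for row in rows:
        row = list(row)
        if len(row) > column:
            row[column] = strip_country(row[column])
        yield row
```

`clean_rows` accepts any iterable of rows, so it can be fed a `csv.reader` directly and its output passed to `csv.writer(...).writerows(...)`.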

answered Aug 20, 2019 at 20:22


2 Comments

j-grimwood Over a year ago

Is this for use with pandas?

Parijat Bhatt Over a year ago

Yes. You could install it. It's easy to use.

anvoice · Accepted Answer · 2019-08-20 22:01:14Z

0

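The snag is that every pass through the first loop overwrites row1, row2, row3, and newrow, so only the values from the last rows processed survive, and those are the only things written to the file. On top of that, row1 + row2 + row3 + newrow is a single concatenated string, and writerows treats each character of a string as its own row, which is where the stray single characters come from. You need to bring the writing operation into the loop and write each row as a list.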
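
The stray single characters described in the question are consistent with how `csv.writer.writerows` handles a string: it iterates over it one character at a time. A minimal demonstration, using an in-memory buffer and made-up values (`rows_written` is a hypothetical helper, not part of the question's code):

```python
import csv
import io


def rows_written(data):
    """Return the lines csv.writer produces when given `data` via writerows."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(data)
    return buffer.getvalue().splitlines()


# A concatenated string: each character ends up as its own one-field row.
as_string = rows_written("Jo" + "Doe")

# A list of rows: each inner list becomes one CSV line.
as_rows = rows_written([["Jo", "Doe"]])

print(as_string)  # ['J', 'o', 'D', 'o', 'e']
print(as_rows)    # ['Jo,Doe']
```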

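import csv

usa = 'United States'

# newline='' lets the csv module handle line endings itself, including
# the line breaks embedded in the multi-line address field.
with open('a.csv', newline='') as rf:
    reader = csv.reader(rf, delimiter=',')
    with open('b.csv', 'w', newline='') as wf:
        writer = csv.writer(wf)
        for row in reader:
            # Only the address column (index 3) is modified.
            if usa in row[3]:
                row[3] = row[3].replace(usa, ' ')
            # Write every row, changed or not, inside the loop.
            writer.writerow(row)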
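
The question asks for the result to be written back to the same file, whereas the code above writes to a second file. One way to get there, sketched here under the assumption that the address is in the fourth column, is to write to a temporary file in the same directory and swap it in once the loop finishes. `remove_substring_in_place` is a hypothetical helper name.

```python
import csv
import os
import tempfile


def remove_substring_in_place(path, target="United States", column=3):
    """Replace `target` with a space in one column, rewriting `path` safely."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as rf:
        # delete=False keeps the temporary file so it can be moved afterwards.
        with tempfile.NamedTemporaryFile(
            "w", newline="", dir=directory, delete=False
        ) as wf:
            writer = csv.writer(wf)
            for row in csv.reader(rf):
                if len(row) > column:
                    row[column] = row[column].replace(target, " ")
                writer.writerow(row)
    # Swap the finished file into place only after both files are closed.
    os.replace(wf.name, path)
```

Because the original is replaced only after the loop completes, an error partway through leaves the original file untouched (a leftover temporary file may remain in the directory).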

Edit: cleaned up slightly

edited Aug 20, 2019 at 22:01

answered Aug 20, 2019 at 20:53


2 Comments

daniboy000 Over a year ago

The output will be the same in both files. You are writing row without modification.

anvoice Over a year ago

No. I am writing row WITH modification ( if usa in row[3], row[3] = " "). And yes, I checked on a sample file. Not same output.

LoMaPh · Accepted Answer · 2019-08-20 22:23:22Z

0

Python is not the best tool for this job. You can do it more easily with shell commands:

Windows (PowerShell): (cat myFile.csv) -replace "United States" > output.csv
Linux: sed 's/United States//' myFile.csv > output.csv

---------------------------------------------------

Edit: If you have a (long) list of countries that you want to delete:

Windows (PowerShell):

$countries="United States","France","Italy";
cp myFile.csv output.csv; foreach($country in $countries){(cat output.csv) -replace $country > tmp; cp tmp output.csv; rm tmp}

Linux:

declare -a countries=("United States" "France" "Italy");
cp myFile.csv output.csv; for country in "${countries[@]}"; do sed -i "s/$country//" output.csv; done
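
Since the question asks for Python, the same multi-country removal can also be sketched there with a single regular expression. Like the shell versions, this treats the file as plain text, so it would also strip a country name that happens to appear in another column (a last name of "France", say). Unlike `sed` without the `g` flag, it removes every occurrence on a line. `COUNTRIES` and `remove_countries` are hypothetical names used only for illustration.

```python
import re

# Assumption: the same illustrative list as in the shell examples.
COUNTRIES = ["United States", "France", "Italy"]

# re.escape guards against names containing regex metacharacters.
PATTERN = re.compile("|".join(re.escape(name) for name in COUNTRIES))


def remove_countries(text):
    """Delete every occurrence of any listed country name from text."""
    return PATTERN.sub("", text)
```

To apply it to a file, read the contents with `open('myFile.csv', newline='')`, pass them through `remove_countries`, and write the result to `output.csv`, also opened with `newline=''`.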

edited Aug 20, 2019 at 22:23

answered Aug 20, 2019 at 21:24
