
My last question was considered a duplicate, but I haven't found a question remotely similar to what I am asking, so I will rephrase:

I have a csv file, four columns, and about 26,000 rows.

The data is as follows for every row:

Firstname,, Lastname,, ID,, Address 

In the last column, the address column, the addresses are formatted as follows:

1234 Streetname Dr.
Timbuktu, AK 32456
United States

My goal is only to remove the country name from every row that contains it (not all rows do), preserve the rest of the address, and write the result back to the file. All the other data should remain as it was. Basically, I want to find any instance of, say, the substring "United States" and replace it with a blank space.

The code I presently have is as follows:

import csv


with open('file.csv', 'rt') as rf:
    reader = csv.reader(rf, delimiter=',')
    for row in reader:
#print(row[3] + "\n")    # this works
        usa = "United States"
        row1 = row[0]
        row2 = row[1]
        row3 = row[2]

        if usa in row[3]:
            newrow = row[3].replace(usa, " ")
            #print(newrow + "\n")
with open('file.csv', 'w') as wf:
    writer = csv.writer(wf)    
    writer.writerows(row1 + row2 + row3 + newrow)

It currently wipes the CSV file nearly clean. A few strange single characters are left over in some rows, only in the first column.

Can someone help point me to the snag? Thanks.

  • Does the country name always come on the 3rd line? Commented Aug 20, 2019 at 20:14
  • @ParijatBhatt Yeah. It's always at the end of the address if it is there. But sometimes it does not exist. Commented Aug 20, 2019 at 20:16
  • Is it always United States? Commented Aug 20, 2019 at 20:16
  • @ParijatBhatt Not always, no Commented Aug 20, 2019 at 20:19
  • You forgot to write back into the file at the end of the for loop. Commented Aug 20, 2019 at 21:15

3 Answers


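Try this. You will need to obtain a list of possible country names.

import pandas as pd

df = pd.read_csv('data.csv')
# Must hold lowercase names, because each candidate line is lowercased before the lookup
country_names = some_list_containing_all_country_names
# Split each address into its lines
df['address'] = df['address'].apply(lambda x: x.split('\n'))
# Drop the last line when it is a country name, then join the lines back together
df['address'] = df['address'].apply(lambda x: "\n".join(x[:-1]) if x[-1].lower() in country_names else "\n".join(x))
df.to_csv('data.csv', index=False)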
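
If installing pandas is not an option, roughly the same idea can be sketched with only the standard csv module. This is a minimal sketch, not a drop-in solution: the names COUNTRY_NAMES, strip_country and clean_file are made up for illustration, the country set is a placeholder you would have to fill in, and it assumes the country (when present) is the last line of the address in the fourth column.

```python
import csv

# Hypothetical set of lowercase country names; extend as needed.
COUNTRY_NAMES = {"united states", "france", "italy"}


def strip_country(address, country_names=COUNTRY_NAMES):
    """Drop the last line of a multi-line address if it is a known country."""
    lines = address.rstrip().splitlines()
    if lines and lines[-1].strip().lower() in country_names:
        return "\n".join(lines[:-1])
    # No trailing country found: return the address untouched.
    return address


def clean_file(src, dst):
    """Copy src to dst, removing a trailing country line from column 4."""
    # newline='' lets the csv module handle line breaks inside quoted fields.
    with open(src, newline="") as rf, open(dst, "w", newline="") as wf:
        writer = csv.writer(wf)
        for row in csv.reader(rf):
            if len(row) > 3:
                row[3] = strip_country(row[3])
            writer.writerow(row)
```

Calling clean_file('file.csv', 'file_clean.csv') writes the cleaned rows to a second file. Writing to a separate file and renaming it afterwards (for example with os.replace) avoids truncating the input while it is still being read.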


2 Comments

Is this for use with pandas?
Yes. You could install it. It's easy to use.

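The snag is that each pass through the loop overwrites row1, row2, row3, and newrow, so once the loop ends only the most recently assigned values are left, and those are what get written. On top of that, row1 + row2 + row3 + newrow concatenates the fields into a single string, and writerows treats each character of that string as a separate row, which is where the stray single characters come from. You need to bring the writing operation into the loop.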

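import csv

usa = 'United States'

# newline='' lets the csv module handle line breaks inside quoted fields
with open('a.csv', 'rt', newline='') as rf:
    reader = csv.reader(rf, delimiter=',')
    with open('b.csv', 'w', newline='') as wf:
        writer = csv.writer(wf)
        for row in reader:
            if usa in row[3]:
                row[3] = row[3].replace(usa, ' ')
            # write every row, modified or not, while still inside the loop
            writer.writerow(row)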

Edit: cleaned up slightly
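
To see where the stray single characters described in the question come from, here is a small sketch that passes a concatenated string to writerows. The values "Ann" and "Lee" are invented for the demonstration, and an in-memory buffer stands in for the file.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# writerows() expects an iterable of rows. A plain string is an iterable
# of characters, so each character is written as its own one-field row.
writer.writerows("Ann" + "Lee")

rows = buf.getvalue().splitlines()
print(rows)  # ['A', 'n', 'n', 'L', 'e', 'e']
```

Passing a list of fields to writerow, one call per row, avoids this.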

2 Comments

The output will be the same in both files. You are writing row without modification.
No. I am writing row WITH modification (if usa is in row[3], it is replaced with " "). And yes, I checked on a sample file. The output is not the same.

Python is not the best tool for this job. You can do it more easily with shell commands:

Windows (PowerShell): (cat myFile.csv) -replace "United States" > output.csv
Linux: sed 's/United States//' myFile.csv > output.csv

---------------------------------------------------

Edit: If you have a (long) list of countries that you want to delete:

Windows (PowerShell):

$countries="United States","France","Italy";
cp myFile.csv output.csv; foreach($country in $countries){(cat output.csv) -replace $country > tmp; cp tmp output.csv; rm tmp}

Linux:

declare -a countries=("United States" "France" "Italy");
cp myFile.csv output.csv; for country in "${countries[@]}"; do sed -i "s/$country//" output.csv; done
