
I have a text file named university_towns.txt. Reading it with readlines() gives a list like this:

     ['Alabama[edit]\n',
        'Auburn (Auburn University)[1]\n',
        'Florence (University of North Alabama)\n',
        'Jacksonville (Jacksonville State University)[2]\n',
        'Livingston (University of West Alabama)[2]\n',
        'Montevallo (University of Montevallo)[2]\n',
        'Troy (Troy University)[2]\n',
        'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3]      [4]\n',
        'Tuskegee (Tuskegee University)[5]\n']

I want to clean these lines so that the '[edit]' markers and everything in parentheses are removed, leaving only the state or town name. So, I want the result to look like:

['Alabama',
 'Auburn',
 'Florence',
 'Jacksonville',
 'Livingston',
 'Montevallo',
 'Troy',
 'Tuscaloosa',
 'Tuskegee',
 'Alaska',
 'Fairbanks',
 'Arizona',
 'Flagstaff',
 'Tempe',
 'Tucson']

I am trying to do this as follows:

import pandas as pd
import numpy as np
file = open('university_towns.txt','r')
lines = file.readlines()
for i in range(0,len(lines)):
    lines[i] = lines[i].replace('[edit]','')
    lines[i] = lines[i].replace(r' \(.*\)','')

With this, I am able to remove '[edit]' but I am not able to remove the string in '( )'.

  • Saurav Agarwal, it looks like your last edit rolled back some good edits from someone else. Please re-apply your own edits, but ensure you refresh your screen first, so that the prior edits are preserved. I have rolled back. Thanks. Commented Dec 22, 2016 at 18:55

4 Answers


You can use a regex inside a list comprehension:

import re

new_list = [re.match(r'\w+', i).group(0) for i in my_list]
# r'\w+' matches the leading run of word characters; group(0) returns that first word

where my_list is the original list from the question. The final value held by new_list will be:

['Alabama', 
 'Auburn', 
 'Florence', 
 'Jacksonville', 
 'Livingston', 
 'Montevallo', 
 'Troy', 
 'Tuscaloosa', 
 'Tuskegee']

2 Comments

This just gives [ ] as output.
Note this would fail for a city like Grand Rapids due to the whitespace.
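One way to address the whitespace concern is to match everything up to the first '(' or '[' instead of a single word. This is a minimal sketch, not part of the original answer; the 'Grand Rapids' line and its college name are made up for illustration.

```python
import re

# Sample lines; the 'Grand Rapids' entry is a hypothetical multi-word example.
my_list = [
    'Alabama[edit]\n',
    'Auburn (Auburn University)[1]\n',
    'Grand Rapids (Hypothetical College)[2]\n',
]

# [^\[(]+ matches every character up to (but not including) the first '[' or '('.
# strip() then removes the trailing space or newline left before the bracket.
new_list = [re.match(r'[^\[(]+', s).group(0).strip() for s in my_list]

print(new_list)  # ['Alabama', 'Auburn', 'Grand Rapids']
```

Note that re.match returns None for an empty string or a line that starts with '(' or '[', so such lines would need to be filtered out first.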

The replace method on a string replaces an actual substring. You need to use regex:

import re
#...
lines[i] = re.sub(r' \(.*\)', '', lines[i])
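The difference can be seen on a single line from the question. The sketch below is illustrative only; the variable names are made up, and the last step (dropping the bracketed markers) is an optional extra that goes beyond what this answer proposes.

```python
import re

line = 'Auburn (Auburn University)[1]\n'

# str.replace searches for the literal characters ' \(.*\)', which never occur in the line.
unchanged = line.replace(r' \(.*\)', '')

# re.sub treats the same text as a pattern: a space, '(', any characters, then ')'.
without_parens = re.sub(r' \(.*\)', '', line)

# Optionally drop bracketed markers such as '[1]' or '[edit]' and trim whitespace.
cleaned = re.sub(r'\[.*?\]', '', without_parens).strip()

print(repr(unchanged))       # 'Auburn (Auburn University)[1]\n'
print(repr(without_parens))  # 'Auburn[1]\n'
print(repr(cleaned))         # 'Auburn'
```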


A simple regex should do the trick.

import re
output = [re.split(r'[\[(]', s)[0].strip() for s in your_list]

1 Comment

This is not showing the expected output. It gives [ ], which is an empty list. @mVChr
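An empty result from this comprehension means the input list was itself empty. One possible cause (an assumption, since the full code was not posted) is calling readlines() on a file object that has already been read to the end, which returns []. A minimal sketch with a hypothetical helper clean_lines:

```python
import re

def clean_lines(lines):
    """Keep only the text before the first '[' or '(' on each line."""
    return [re.split(r'[\[(]', s)[0].strip() for s in lines]

# Hypothetical usage with the file from the question:
# with open('university_towns.txt') as f:
#     your_list = f.readlines()   # read once; a second readlines() returns []
# output = clean_lines(your_list)

sample = [
    'Alabama[edit]\n',
    'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3]      [4]\n',
]
print(clean_lines(sample))  # ['Alabama', 'Tuscaloosa']
print(clean_lines([]))      # []
```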

You can use re.sub instead of replace:

import re
# your code here
lines[i] = re.sub(r' \(.*\)','', lines[i])

2 Comments

That wasn't all the code... just pointing out replace takes a substring but you need to use regex. What error are you getting?
@SauravAgarwal, as doctorlove already wrote, it's not all the code; it's only an example of how to change your line 'lines[i] = lines[i].replace(r' \(.*\)','')'.
