Regular expression to remove non-alphanumeric characters is not working

Question

I converted a column of a pandas DataFrame to a list and lowercased all of its elements. Now I want to keep only alphabetic characters in each element. I wrote a regular expression for that, but it does not seem to work.

df_smer_orig = pd.read_csv('sample.csv', engine='python')
df_smer = df_smer_orig['Item'].tolist()
df_smer = [x.lower() for x in df_smer] 

for x in df_smer:
    print(x)
    regex = re.compile('[^a-zA-Z]')
    regex.sub('', x)
    print(x)

print(df_smer)

Partial output of the code which shows the regex did not work:

agarbathi / incense sticks
agarbathi / incense sticks
worcestershire sauce- 295ml
worcestershire sauce- 295ml

Ashu Grover · Accepted Answer · 2019-03-16 13:25:06Z

1

Your regex is correct, but regex.sub() returns a new string instead of modifying x in place, so you have to assign the result back to the variable to get the desired output.

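import re
import pandas as pd

df_smer_orig = pd.read_csv('sample.csv', engine='python')
df_smer = df_smer_orig['Item'].tolist()
df_smer = [x.lower() for x in df_smer]

# Compile the pattern once, outside the loop.
regex = re.compile('[^a-zA-Z]')

for x in df_smer:
    print(x)
    # sub() returns a new string; strings are immutable, so assign the result.
    x = regex.sub('', x)
    print(x)

# Rebinding x inside the loop does not change the list, so this still
# prints the original (lowercased) strings.
print(df_smer)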
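
One caveat: assigning to x inside the for loop only rebinds the loop variable, so the list itself keeps its original strings. If the goal is a cleaned list, one option is to build a new list with a comprehension. The following is a minimal sketch, not the asker's exact setup: the items list is a hypothetical stand-in for df_smer_orig['Item'].tolist(), filled with the two strings shown in the question's output, and the names non_letters and cleaned are made up for illustration.

```python
import re

# Hypothetical stand-in for df_smer_orig['Item'].tolist(); the two strings
# are taken from the output shown in the question.
items = ["agarbathi / incense sticks", "worcestershire sauce- 295ml"]

# Compile once; the pattern matches every character that is not a letter.
non_letters = re.compile(r'[^a-zA-Z]')

# Build a new list: lowercase each item, then strip the non-letters.
cleaned = [non_letters.sub('', item.lower()) for item in items]

print(cleaned)
```

Because the pattern removes everything except letters, spaces and digits are dropped as well, which is why the words in each cleaned string run together.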

Kien Pham · Accepted Answer · 2019-03-16 13:24:04Z

1

Is this what you are after? re.sub returns the cleaned string, so assign it back:

text = re.sub(r'[^a-zA-Z]', '', text)

demo: http://tpcg.io/ZADE7f
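
The question title mentions non-alphanumeric characters while the pattern above keeps letters only, so it may help to compare a few character classes side by side. This is a small sketch under that assumption; the sample string comes from the question's output, and the variable names are chosen here for illustration.

```python
import re

# Sample string taken from the output shown in the question.
sample = "worcestershire sauce- 295ml"

# Keep letters only (the pattern used in the answers).
letters_only = re.sub(r'[^a-zA-Z]', '', sample)

# Keep letters and digits, i.e. remove non-alphanumeric characters.
letters_digits = re.sub(r'[^a-zA-Z0-9]', '', sample)

# Keep letters and spaces so the words stay separated.
letters_spaces = re.sub(r'[^a-zA-Z ]', '', sample)

print(letters_only)    # worcestershiresauceml
print(letters_digits)  # worcestershiresauce295ml
print(letters_spaces)  # worcestershire sauce ml
```

Which class fits depends on whether digits and word boundaries matter for the later processing of the column.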
