I have a list of strings and need to remove all special characters (, - ' " .) from them.

My code is:

import glob
import re

files = []
for text in glob.glob("*.txt.txt"):
    with open(text) as f:
        fileRead = [line.lower() for line in f]
    files.append(fileRead)

files1 = []

for item in files:
    files1.append(''.join(item))

I have tried several options, including "replace", "strip" and "re".

When I use strip (shown below), the code runs but the output is unchanged.

files1 = [line.strip("'") for line in files1]

When I use re, I get TypeError: expected string or bytes-like object. I converted the list of lists into a list of strings so that I could use re. This approach is suggested in many places, but it did not solve the problem for me.

files1 = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", files1)

I cannot use replace either, as it raises an AttributeError because lists have no replace method.

Please suggest how I can get rid of all the special characters.

  • Are you using Python 3 or Python 2.7? Commented Sep 11, 2018 at 16:06
  • files1 is a list, not a string. You need to pass a string to re.sub, so apply it element-wise. Commented Sep 11, 2018 at 16:16
  • @machetazo I am using Python 3. Commented Sep 11, 2018 at 16:17
  • @KotaMori I have tried that too. Is there anything wrong with this? files1 = [re.sub('[-()\"#/@;:<>{}`+=~|.!?,]', '', files1) for y in files1] Commented Sep 11, 2018 at 16:20
  • Pass y, not files1. If you still get an error, provide the result of type(files1[0]). Commented Sep 11, 2018 at 16:22

3 Answers

re.sub works on a single string, not on a list, so apply it to each element of files1 (your list of strings; files itself is still a list of lists):

files_cleaned = [re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", text) for text in files1]

If you only want to keep alphanumeric characters, you can do this instead (note that it also removes spaces and newlines):

files_cleaned = [re.sub(r"[^a-zA-Z0-9]", "", text) for text in files1]
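
As a minimal sketch of the element-wise approach, assuming files1 is a list of strings as built in the question. The sample strings and the names SPECIAL_CHARS and remove_special are made up for illustration. The apostrophe is added to the character class because the question lists it, and whitespace is left untouched:

```python
import re

# Characters to delete; the apostrophe is added because the question lists it.
# The hyphen comes first in the class so it is treated as a literal character.
SPECIAL_CHARS = re.compile(r"""[-()"'#/@;:<>{}`+=~|.!?,]""")

def remove_special(texts):
    """Return a new list with the special characters removed from each string."""
    return [SPECIAL_CHARS.sub("", text) for text in texts]

# Hypothetical sample standing in for the joined file contents.
files1 = ["it's a test-file, ok.\n", 'she said "hello" (twice)!\n']
cleaned = remove_special(files1)
print(cleaned)
```

Passing the whole list to the pattern (SPECIAL_CHARS.sub("", files1)) would raise the same TypeError as in the question, which is why the substitution sits inside the list comprehension.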

You can use str.isalnum, which returns True if all the characters in the string are alphanumeric (and the string is not empty).
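
A short sketch of how that might be applied: called on a whole string, str.isalnum only gives a yes/no answer, so the check has to be made per character to actually remove anything. The helper name keep_alnum and the sample strings are assumptions for illustration, and keeping whitespace is a choice made here rather than part of this answer:

```python
def keep_alnum(text):
    """Keep letters, digits and whitespace; drop every other character."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

print("abc123".isalnum())    # True: every character is a letter or digit
print("abc-123".isalnum())   # False: the hyphen is not alphanumeric

# Hypothetical sample standing in for the joined file contents.
files1 = ["it's a test-file, ok.\n", "GTD@JJ 42\n"]
print([keep_alnum(text) for text in files1])
```

Unlike the [^a-zA-Z0-9] pattern, str.isalnum also accepts non-ASCII letters and digits, which may or may not be what you want.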


Try the example below:

files = ["Hello%", "&*hhf", "ddh", "GTD@JJ"]    # input data in a list

# Go through each element of the list,
# keep only the alphanumeric characters of each string,
# then join the characters back together and collect the results in a list.
result = ["".join(filter(str.isalnum, line)) for line in files]

print(result)    # print the result

Output:

['Hello', 'hhf', 'ddh', 'GTDJJ']
