2

I am writing some code in Python to clean a string: convert it to lower case and remove the special characters.

string_salada_russa = '    !!   LeTRas PeqUEnAS &    GraNdeS'

clean_string = string_salada_russa.lower().strip()


print(clean_string)

i = 0

for c in clean_string:
  if(c.isalpha() == False and c != " "):
    clean_string = clean_string.replace(c, "").strip()


print(clean_string)

for c in clean_string:
  if(i >= 1 and i <= len(clean_string)-1):
    if(clean_string[i] == " " and clean_string[i-1] == " " and clean_string[i+1] == " "):
      clean_string = clean_string.replace(clean_string[i], "")
  i += 1


  
print(clean_string)

Expected outcome would be:

#original string
'    !!   LeTRas PeqUEnAS &    GraNdeS'

#expected
'letras pequenas grandes'

#actual outcome
'letraspequenasgrandes'

I am trying to remove the extra spaces, but without success: I end up removing ALL spaces.

Could anyone help me figure it out? What is wrong with my code?

    This is probably because you shorten clean_string while iterating over it. A better strategy might be to iterate over clean_string and copy the letters you want to keep to another string (or even better, to a list, which you then join together when you're done). Commented Nov 1, 2021 at 0:45
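The strategy suggested in this comment could look roughly like the sketch below. The function name `clean_by_copying` is made up for illustration, and the rule for collapsing spaces is one possible reading of the comment, not the commenter's own code.

```python
def clean_by_copying(text):
    """Copy the characters worth keeping into a list, then join them."""
    kept = []
    for ch in text.lower():
        if ch.isalpha():
            kept.append(ch)
        elif ch == " " and kept and kept[-1] != " ":
            # Keep a space only right after a letter, so runs of spaces
            # collapse to one and leading spaces are skipped.
            kept.append(ch)
    # A single trailing space can remain; strip() removes it.
    return "".join(kept).strip()


print(clean_by_copying('    !!   LeTRas PeqUEnAS &    GraNdeS'))
# letras pequenas grandes
```

Because the input string is never modified, there is no index bookkeeping and nothing shifts underneath the loop.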

3 Answers

2

How about using re?

import re

s = '    !!   LeTRas PeqUEnAS &    GraNdeS'
s = re.sub(r"[^a-zA-Z]+", " ", s.lower()).strip()
print(s) # letras pequenas grandes

This first converts the letters to lower case (lower), then replaces each run of non-alphabetical characters with a single blank (re.sub), and finally removes the blanks at both ends of the string (strip).
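To make the three steps easier to follow, here is the same one-liner split into intermediate values. The variable names `lowered`, `collapsed` and `stripped` are only for illustration.

```python
import re

s = '    !!   LeTRas PeqUEnAS &    GraNdeS'

lowered = s.lower()
# Every run of non-letters (spaces, '!', '&') becomes exactly one space.
collapsed = re.sub(r"[^a-zA-Z]+", " ", lowered)
# Only the single leading space is left to remove here.
stripped = collapsed.strip()

print(repr(lowered))    # '    !!   letras pequenas &    grandes'
print(repr(collapsed))  # ' letras pequenas grandes'
print(repr(stripped))   # 'letras pequenas grandes'
```

Note that `[^a-zA-Z]` only treats ASCII letters as letters, so an accented character such as `ñ` would be replaced by a blank, whereas `str.isalpha()` accepts it.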

Btw, your code does not output 'letraspequenasgrandes'. Instead, it outputs 'letrasZpequenasZZZZZgrandes'.


2

You could get away with a combination of str.lower(), str.split(), str.join() and str.isalpha():

def clean(s):
    return ' '.join(x for x in s.lower().split(' ') if x.isalpha())


s = '    !!   LeTRas PeqUEnAS &    GraNdeS'
print(clean(s))
# letras pequenas grandes

Basically, you first convert to lower case and then split by ' '. After that you filter out the non-alpha tokens and join the remaining ones back together.
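A quick look at the intermediate tokens may help explain why the filter works. This is only an illustrative sketch; the second input string is an invented example, not from the question.

```python
s = '    !!   LeTRas PeqUEnAS &    GraNdeS'

# split(' ') yields an empty string for every extra space in a run.
tokens = s.lower().split(' ')
print('' in tokens)   # True

# ''.isalpha() is False, so the empty tokens are filtered out
# together with '!!' and '&'.
kept = [t for t in tokens if t.isalpha()]
print(kept)           # ['letras', 'pequenas', 'grandes']

# Caveat: a token that mixes letters and punctuation is dropped whole.
mixed = ' '.join(t for t in 'hello, big world'.split(' ') if t.isalpha())
print(mixed)          # big world
```

So this approach fits inputs where the special characters stand apart from the words, as in the question; if punctuation can be attached to a word, a per-character filter is the safer choice.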


1

There's no need to strip your string at each iteration of the first for loop; but, other than that, you could keep the first piece of your code:

for c in clean_string:
    if (c.isalpha() == False and c != " "):
        clean_string = clean_string.replace(c, "")

Then split your string, which discards all the whitespace, and re-join the words into a single string, with a single space between each word:

clean_string = " ".join(clean_string.split())
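Put together, the two pieces above might run as follows. This is a sketch that assumes the string from the question; `not c.isalpha()` is used as an equivalent spelling of `c.isalpha() == False`.

```python
clean_string = '    !!   LeTRas PeqUEnAS &    GraNdeS'.lower()

# The for loop keeps walking the string object it started with,
# even though the name clean_string is rebound inside the body.
for c in clean_string:
    if not c.isalpha() and c != " ":
        clean_string = clean_string.replace(c, "")

# split() with no argument splits on runs of whitespace and drops
# leading/trailing whitespace, so no strip() is needed.
clean_string = " ".join(clean_string.split())

print(clean_string)  # letras pequenas grandes
```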

1 Comment

Do note that this will be somewhat inefficient, since the whole string will be copied at each iteration when there is an invalid char.
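If the repeated copying is a concern, one possible alternative is to collect the unwanted characters once and delete them all in a single pass with `str.translate`. The function name `clean_with_translate` is hypothetical, and this is a swapped-in technique rather than what the answer above does.

```python
def clean_with_translate(text):
    """Delete every non-letter, non-whitespace character in one pass."""
    lowered = text.lower()
    # Each unwanted character is collected only once thanks to the set.
    unwanted = {c for c in lowered if not c.isalpha() and not c.isspace()}
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", "".join(unwanted))
    # split() + join collapses the remaining runs of whitespace.
    return " ".join(lowered.translate(table).split())


print(clean_with_translate('    !!   LeTRas PeqUEnAS &    GraNdeS'))
# letras pequenas grandes
```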
