I have some text in Python that consists of numbers and letters, something like this:

s = "12 word word2"

From the string s, I want to remove all words that consist only of digits.

So I want the result to be:

s = "word word2"

This is the regex I have, but it works on individual characters rather than whole words: it replaces every run of digits and spaces with a single space, so it also strips the 2 from word2.

re.sub('[\ 0-9\ ]+', ' ', line)

Can someone tell me what is wrong? Also, is there a more time-efficient way to do this than a regex?

Thanks!

3 Answers

You can use this regex:

>>> import re
>>> s = "12 word word2"
>>> print(re.sub(r'\b[0-9]+\b\s*', '', s))
word word2

\b matches a word boundary, so the digit run must be a whole word, and \s* removes zero or more whitespace characters following that numeric word.
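To see what was wrong with the pattern in the question, the sketch below runs it side by side with the word-boundary pattern on the same input (a raw string is used for the question's pattern only to avoid Python's invalid-escape warning; the regex itself is unchanged).

```python
import re

s = "12 word word2"

# The question's pattern is a character class of spaces and digits, so it
# replaces ANY run of those characters, including the "2" inside "word2".
question_result = re.sub(r'[\ 0-9\ ]+', ' ', s)

# \b ties the digit run to a whole word; \s* then eats the whitespace after it.
answer_result = re.sub(r'\b[0-9]+\b\s*', '', s)

print(repr(question_result))  # ' word word '
print(repr(answer_result))    # 'word word2'
```

One caveat: a numeric word at the end of the string leaves the space in front of it, so "word 12" becomes "word "; calling .strip() on the result handles that case.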

2 Comments

Thanks for the answer. Can the regex be modified to also remove punctuation/special symbols, for example like this? re.sub(r'\b[!~`.,/<>]+\b\s*', '', s)
Sure, you can use re.sub(r'\b[0-9]+\b\W*', '', s), since \W matches whitespace or any other non-word character. Note that this only removes punctuation that directly follows a numeric word.
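A small sketch of the \W* variant from the comment above; the two input strings are made up for illustration. It shows that punctuation right after a numeric word is removed, while punctuation elsewhere is kept.

```python
import re

# \W matches any non-word character (whitespace or punctuation), so \W*
# removes whatever non-word characters directly follow a numeric word.
pattern = r'\b[0-9]+\b\W*'

leading = re.sub(pattern, '', "12, word word2")
middle = re.sub(pattern, '', "hello, 12! world")

print(leading)  # word word2
print(middle)   # hello, world   (the comma after "hello" is kept)
```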

Using a regex is probably overkill here, depending on whether you need to preserve the original whitespace (this approach collapses every run of whitespace into a single space):

s = "12 word word2"
s2 = ' '.join(word for word in s.split() if not word.isdigit())
# 'word word2'
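The whitespace trade-off mentioned above shows up with an input containing a double space and a tab (a made-up string for illustration):

```python
import re

s = "12  word\tword2"  # two spaces after "12", a tab between the words

# split()/join() normalises every whitespace run to a single space ...
joined = ' '.join(w for w in s.split() if not w.isdigit())

# ... whereas the regex only deletes the numeric word and the whitespace after it.
regexed = re.sub(r'\b[0-9]+\b\s*', '', s)

print(repr(joined))   # 'word word2'
print(repr(regexed))  # 'word\tword2'
```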

Without using the re module, you could do:

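string_to_format = "12 word word2"
kept = []
for word in string_to_format.split():
    try:
        int(word)  # raises ValueError unless the word parses as an integer
    except ValueError:
        kept.append(word)  # not a number, so keep it
print(" ".join(kept))  # word word2 (no trailing space)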
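The question also asks whether anything is faster than a regex. A rough way to check on your own machine is to time the three answers with timeit; the function names below are hypothetical wrappers around those answers, and no timings are claimed here because they depend on the Python version and the input.

```python
import re
import timeit

# Hypothetical wrapper names, one per answer in this thread.
NUMERIC_WORD = re.compile(r'\b[0-9]+\b\s*')


def drop_numbers_regex(s):
    # Regex answer: delete whole-word digit runs plus trailing whitespace.
    return NUMERIC_WORD.sub('', s)


def drop_numbers_isdigit(s):
    # split()/isdigit() answer: keep words that are not all digits.
    return ' '.join(w for w in s.split() if not w.isdigit())


def drop_numbers_int(s):
    # try/int() answer: keep words that int() refuses to parse.
    kept = []
    for w in s.split():
        try:
            int(w)
        except ValueError:
            kept.append(w)
    return ' '.join(kept)


sample = "12 word word2"
for fn in (drop_numbers_regex, drop_numbers_isdigit, drop_numbers_int):
    # Seconds for 100,000 calls; results vary by machine and Python version.
    seconds = timeit.timeit(lambda: fn(sample), number=100_000)
    print(f"{fn.__name__}: {fn(sample)!r} in {seconds:.3f}s")
```

The three approaches agree on the sample but not on signed numbers: for "-12 word" the regex leaves "-word", isdigit() keeps "-12 word" unchanged, and int() drops "-12" to give "word".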
