40

I'm trying to delete all standalone numbers from a string. However, the code below also deletes digits contained inside words, which I obviously don't want. I've tried many regular expressions with no success.

Thanks!


import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)

Result:

This must not b deletd, but the number at the end yes


11 Answers

56

Add a space before the \d+.

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: After looking at the comments, I decided to form a more complete answer. I think this accounts for all the cases.

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)
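
The comments on this answer report inputs where the alternation still leaves numbers behind. A minimal sketch of why, assuming Python 3: every branch consumes the whitespace around the number, so two adjacent numbers cannot both match. The lookaround variant (`standalone`, a name introduced here for illustration) is one possible alternative and is not part of the original answer.

```python
import re

# Pattern from the answer: every branch consumes the whitespace around the number.
consuming = re.compile(r"^\d+\s|\s\d+\s|\s\d+$")

# Illustrative alternative: lookarounds check for whitespace (or the string
# edge) on both sides without consuming it.
standalone = re.compile(r"(?<!\S)\d+(?!\S)")

text = "1 2 3 fails for me"

# "1 " matches first and uses up the space that " 2 " would need, so "2" survives.
print(repr(consuming.sub(" ", text)))

# All three numbers are removed; the spaces between them remain.
print(repr(standalone.sub("", text)))

# Digits inside words are left alone by the lookaround version.
print(repr(standalone.sub("", "You have been 0wn3d")))
```

The lookaround version only treats whitespace as a separator, so a number followed by punctuation (for example "42,") would be kept.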

3 Comments

What about strings such as " 3at"?
Here's another 2 cases for your unit tests: '123 should be deleted.' and 'You have been 0wn3d'
Another one re.sub("^\d+\s|\s\d+\s|\s\d+$", " ", "1 2 3 fails for me")
21

Try this:

"\b\d+\b"

That'll match only those digits that are not part of another word.

4 Comments

This doesn't delete the first or last numbers for s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"
I just tested it with your string and I got the expected result. \b matches either the beginning of the string, the end, or anything that isn't a word character ([A-Za-z0-9_]). I tested it in IronPython though, don't know if there's something wrong with Python's handling of word boundaries
I haven't tried this, but could you do something like: [^\b]\d+[$\b]
sharth: that's essentially the same. \b will match at the beginning or end of the string already. It's a "null pattern" that matches "between" a word and a non-word. So re.sub(r'\b', '!', 'one two') gives "!one! !two!"
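
One likely explanation for the conflicting reports in these comments is string escaping rather than the regex engine. In a normal Python string literal, "\b" is a backspace character, so the pattern never reaches `re` as a word boundary. A small sketch, assuming Python 3 (the variable names are illustrative):

```python
import re

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

# Without the r prefix, Python turns "\b" into a backspace character (\x08)
# before the regex engine ever sees it, so the pattern matches nothing here.
not_raw = "\b" + r"\d+" + "\b"
print(re.sub(not_raw, "", s) == s)

# With a raw string, \b reaches the regex engine as a word boundary.
raw = r"\b\d+\b"
print(repr(re.sub(raw, "", s)))
```

With the raw string, the first, middle, and last numbers are all removed, while "b3" and "delet3d" are untouched.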
7

Matching on whitespace isn't very good: a literal space misses tabs and other whitespace, and even \s misses numbers next to punctuation. A first cut at a better solution is:

re.sub(r"\b\d+\b", "", s)

Note that the pattern is a raw string because \b is normally the backspace escape for strings, and we want the special word boundary regex escape instead. A slightly fancier version is:

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

That tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end then you still have some spaces.
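
If the leftover spaces are the main concern, one alternative is to remove the numbers first and normalize whitespace afterwards. This is a sketch of a different technique than the single regex above, and `strip_numbers` is a hypothetical helper name:

```python
import re

def strip_numbers(text):
    """Remove standalone digit runs, then collapse the whitespace left behind."""
    without_numbers = re.sub(r"\b\d+\b", "", text)
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " normalizes the spacing.
    return " ".join(without_numbers.split())

print(strip_numbers("12 34 This must not b3 delet3d, but the number at the end yes 134411 55"))
```

Note that this collapses every whitespace run in the string, including tabs and newlines that were not next to a number, which may or may not be acceptable.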

Comments

6

To handle digit strings at the beginning of a line as well:

s = re.sub(r"(^|\W)\d+", "", s)
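
A quick check of this pattern, assuming Python 3, shows two side effects worth knowing about: the separator in front of the number is deleted along with it, and nothing requires the number to end at a word boundary.

```python
import re

pattern = r"(^|\W)\d+"

# Works for a leading number (the following space is kept).
print(repr(re.sub(pattern, "", "123 should be deleted.")))

# The separator before the number is part of the match, so it is removed too.
print(repr(re.sub(pattern, "", "yes 134411")))

# Nothing anchors the end of the number, so a digit prefix of a word is stripped.
print(repr(re.sub(pattern, "", " 3at")))
```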

Comments

4

You could try this

s = "This must not b3 delet3d, but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

result:

'This must not b3 delet3d, but the number at the end yes'

the same rule also applies to

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

result:

'This must not b3 delet3d, but the number at the end yes'

Comments

4

To match only pure integers in a string:

\b(?<![0-9-])(\d+)(?![0-9-])\b

It does the right thing with this, matching only everything after million:

max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 
8m9n o0p2     million     0 22 333  4444

All of the other 8 regex answers on this page fail in various ways with that input.

The dash inside the first character class, [0-9-], preserves -007, and the dash in the second class preserves 8-.

Or \d in place of 0-9 if you prefer


Can it be simplified?
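
A short script, assuming Python 3, that runs the pattern on the sample input above. It also tries one possible simplification: since \b already rules out a digit on either side of the match, only the dash should need checking. That equivalence is checked here only on this sample, so treat it as an assumption to verify on your own data.

```python
import re

sample = (
    "max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 \n"
    "8m9n o0p2     million     0 22 333  4444"
)

original = re.compile(r"\b(?<![0-9-])(\d+)(?![0-9-])\b")

# Possible simplification (an assumption to verify on your own data):
# \b already rules out a digit on either side, so only the dash needs checking.
simplified = re.compile(r"\b(?<!-)\d+(?!-)\b")

print(original.findall(sample))
print(simplified.findall(sample))
```

Both calls list only the four numbers after "million".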

1 Comment

The parens around \d+ can be dropped but could be used to capture just the pure digits
2

I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals:

re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", "", s)

The above should also handle things like:

"This must not b3 delet3d, but the number at the end yes -134.411"

But this is still incomplete - you probably need a more complete definition of what you can expect to find in the files you need to parse.
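
As an illustration of what a narrower definition could look like, here is a sketch that assumes numbers are always separated from other text by whitespace. The pattern and the `remove_numbers` helper are hypothetical and not taken from the answer above.

```python
import re

# Illustrative pattern (not from the answer): an optional minus sign, digits,
# and an optional fractional part, bounded by whitespace or the string edges.
NUMBER = re.compile(r"(?<!\S)-?\d+(?:\.\d+)?(?!\S)")

def remove_numbers(text):
    """Delete whitespace-delimited integers and decimals, leaving words intact."""
    return NUMBER.sub("", text)

print(repr(remove_numbers("This must not b3 delet3d, but the number at the end yes -134.411")))
print(repr(remove_numbers("v1.2 costs 3.50 or -7")))
```

Under this definition a number followed directly by punctuation, such as "42,", is kept, which is exactly the kind of case a fuller specification of the input would need to settle.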

Edit: it's also worth noting that '\b' changes depending on the locale/character set you are using so you need to be a little careful with that.

Comments

2

If your number is always at the end of your strings, try:

re.sub(r"\d+$", "", s)

otherwise, you may try

re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the back-references to keep one or both of the spaces (\s matches any whitespace character).
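
A small demonstration of the back-reference adjustment, assuming Python 3. It also shows why the replacement should be a raw string: in a normal string literal, "\1\2" is a pair of control characters, not a pair of back-references.

```python
import re

s = "keep 123 this"

# Raw strings: \1 and \2 reach re.sub as back-references to the two spaces.
print(repr(re.sub(r"(\s)\d+(\s)", r"\1\2", s)))

# Keeping only the first group leaves a single space.
print(repr(re.sub(r"(\s)\d+(\s)", r"\1", s)))

# In a normal string literal, "\1\2" is the two control characters \x01\x02,
# which are inserted literally instead of the captured spaces.
print(repr(re.sub(r"(\s)\d+(\s)", "\1\2", s)))
```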

1 Comment

\W is probably better than \s for this. Also, a better variation would be "\b\d+\b" except that it fails to work for me!
1

Non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

This splits on " ", checks whether each chunk is a number with str.isdigit(), then joins the remaining chunks back together. More verbosely (not using a list comprehension):

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)

Comments
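
A variant of the split-and-filter approach above, sketched under the assumption that any whitespace (not only a single space) separates the words. It also shows a limit of str.isdigit(): a number with attached punctuation is not recognized as a number.

```python
s = "This must\tnot b3 delet3d, 42, but the number at the end yes 134411"

# split() with no argument splits on any run of whitespace (spaces, tabs, newlines).
kept = [word for word in s.split() if not word.isdigit()]
print(" ".join(kept))
```

Here "134411" is dropped, but "42," survives because the trailing comma makes isdigit() return False. The tab is also replaced by a single space when the words are rejoined.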

1

I had a light-bulb moment, I tried and it works:

sol = re.sub(r'[~^0-9]', '', 'aas30dsa20')

output:

aasdsa
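
Applied to the string from the question (assuming Python 3), this character class behaves like the original \d+ attempt: it removes every digit, including the ones inside words, and it also removes any "~" or "^" characters.

```python
import re

s = "This must not b3 delet3d, but the number at the end yes 134411"

# Inside brackets, ~ and ^ (when not first) are literal characters, so this
# class matches any of: "~", "^", or a digit.
print(repr(re.sub(r"[~^0-9]", "", s)))
print(repr(re.sub(r"[~^0-9]", "", "a^b~c1")))
```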

Comments

-1
>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r"\d*$", "", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

This removes any digits at the end of the string.

Comments
