Delete digits in Python (Regex)

Question

I'm trying to delete all digits from a string. However, the code below also deletes digits contained within words, and I don't want that. I've tried many regular expressions with no success.

Thanks!

import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)

Result:

This must not b deletd, but the number at the end yes

oneporter · Accepted Answer · 2009-05-03 14:41:34Z

56

Add a space before the \d+.

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: After looking at the comments, I decided to form a more complete answer. I think this accounts for all the cases.

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)

edited May 3, 2009 at 14:41

answered May 3, 2009 at 14:04

oneporter

3 Comments

moinudin Over a year ago

What about strings such as " 3at"?

user97370 Over a year ago

Here's another 2 cases for your unit tests: '123 should be deleted.' and 'You have been 0wn3d'

Marigold Over a year ago

Another one re.sub("^\d+\s|\s\d+\s|\s\d+$", " ", "1 2 3 fails for me")
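
The failure in the last comment comes from each alternative consuming the whitespace on both sides of a number, so two numbers separated by a single space cannot both match. A minimal sketch that reproduces it (the helper name `strip_numbers_v1` is made up for illustration):

```python
import re

# The three-alternative pattern from the answer, written as a raw string.
PATTERN = re.compile(r"^\d+\s|\s\d+\s|\s\d+$")


def strip_numbers_v1(text):
    """Apply the answer's substitution (hypothetical helper name)."""
    return PATTERN.sub(" ", text)


# "1 " is consumed first, so "2" has no leading whitespace left to match.
print(repr(strip_numbers_v1("1 2 3 fails for me")))   # ' 2 fails for me'
# Digits inside a word are left alone.
print(repr(strip_numbers_v1("You have been 0wn3d")))  # unchanged
```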

jrcalzada · Accepted Answer · 2009-05-03 14:12:44Z

21

Try this:

"\b\d+\b"

That'll match only those digits that are not part of another word.
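
A short sketch of the pattern in use. It assumes the pattern is written as a raw string (r"..."), so that Python passes \b through to the regex engine instead of converting it to a backspace. The test string is the one from the first comment on this answer.

```python
import re

s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

# r"..." keeps \b as the regex word-boundary token.
result = re.sub(r"\b\d+\b", "", s)

# Standalone numbers are removed, digits inside words survive,
# and the whitespace around each removed number is left behind.
print(repr(result))
```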

answered May 3, 2009 at 14:12

jrcalzada

4 Comments

oneporter Over a year ago

This doesn't delete the first or last numbers for s = "1234 This must not b3 delet3d, 123 but the number at the end yes 134411"

jrcalzada Over a year ago

I just tested it with your string and got the expected result. \b matches at a boundary between a word character ([A-Za-z0-9_]) and a non-word character, which includes the beginning and end of the string when a word character sits there. I tested it in IronPython, though, so I don't know whether something is wrong with Python's handling of word boundaries.

Bill Lynch Over a year ago

I haven't tried this, but could you do something like: [^\b]\d+[$\b]

dwc Over a year ago

sharth: that's essentially the same. \b will match at the beginning or end of the string already. It's a "null pattern" that matches "between" a word and a non-word. So re.sub(r'\b', '!', 'one two') gives "!one! !two!"

dwc · Accepted Answer · 2009-05-03 15:05:28Z

7

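Relying on whitespace to delimit the number isn't very robust: a literal space misses tabs and the like, and \s still misses numbers that sit next to punctuation. A first cut at a better solution is: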

re.sub(r"\b\d+\b", "", s)

Note that the pattern is a raw string because \b is normally the backspace escape for strings, and we want the special word boundary regex escape instead. A slightly fancier version is:

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

That tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end then you still have some spaces.
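
Both points can be checked in a few lines. This sketch assumes the first alternative of the fancier pattern is anchored with ^ (start of string), and the sample string is invented for illustration.

```python
import re

# In a normal string literal "\b" is a single backspace character (0x08).
assert "\b" == "\x08"
assert len(r"\b") == 2  # raw string: a backslash followed by "b"

s = "12 This must not b3 delet3d 34 56"  # made-up sample input

# Non-raw "\b" makes the pattern look for backspaces, so nothing is removed.
unchanged = re.sub("\b\\d+\b", "", s)
print(repr(unchanged))

# Fancier version: also eats separators next to leading/trailing numbers.
# With two numbers at the end, one space is still left over.
fancy = re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)
print(repr(fancy))
```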

answered May 3, 2009 at 15:05

dwc

Lance Richardson · Accepted Answer · 2009-05-03 14:23:58Z

6

To handle digit strings at the beginning of a line as well:

s = re.sub(r"(^|\W)\d+", "", s)

answered May 3, 2009 at 14:23

Lance Richardson

Avishay Cohen · Accepted Answer · 2019-03-06 13:33:45Z

4

You could try this:

s = "This must not b3 delet3d, but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

result:

'This must not b3 delet3d, but the number at the end yes'

The same rule also applies to:

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

result:

'This must not b3 delet3d, but the number at the end yes'

edited Mar 6, 2019 at 13:33

Avishay Cohen

answered Dec 15, 2018 at 7:45

adesst

gseattle · Accepted Answer · 2021-03-15 03:21:49Z

4

To match only pure integers in a string:

\b(?<![0-9-])(\d+)(?![0-9-])\b

It does the right thing with this input, matching only the numbers after "million":

max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 
8m9n o0p2     million     0 22 333  4444

All of the other 8 regex answers on this page fail in various ways with that input.

The dash at the end of the first character class, [0-9-], preserves -007, and the dash in the second class preserves 8-.

Use \d in place of 0-9 if you prefer.

at regex101

Can it be simplified?
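
A quick check of the pattern against the sample input, together with one possible simplification. The simplified form is a suggestion, not something from the answer: next to a digit, \b only tests whether the neighbouring character is a word character, so that test can be folded into the lookarounds.

```python
import re

# Pattern as given in the answer.
ORIGINAL = re.compile(r"\b(?<![0-9-])(\d+)(?![0-9-])\b")

# Possible simplification (an assumption, not from the answer):
# "not preceded/followed by a word character or a dash".
SIMPLER = re.compile(r"(?<![\w-])\d+(?![\w-])")

text = ("max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 \n"
        "8m9n o0p2     million     0 22 333  4444")

original_hits = ORIGINAL.findall(text)  # contents of the capture group
simpler_hits = SIMPLER.findall(text)    # whole matches (no group)

print(original_hits)  # ['0', '22', '333', '4444']
print(simpler_hits)   # same list on this input
```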

edited Mar 15, 2021 at 3:21

answered Mar 15, 2021 at 2:38

gseattle

1 Comment

gseattle Over a year ago

The parens around \d+ can be dropped but could be used to capture just the pure digits

si28719e · Accepted Answer · 2009-05-04 02:05:01Z

2

I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals:

re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", "", s)

The above should also handle things like,

"This must not b3 delet3d, but the number at the end yes -134.411"

But this is still incomplete - you probably need a more complete definition of what you can expect to find in the files you need to parse.

Edit: it's also worth noting that '\b' changes depending on the locale/character set you are using so you need to be a little careful with that.
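
A minimal sketch of the pattern applied to the example string from this answer, with the subject string passed as the third argument to re.sub. It only demonstrates this one input and is not a general number parser.

```python
import re

s = "This must not b3 delet3d, but the number at the end yes -134.411"

# Pattern from the answer: optional leading sign, then an integer or decimal.
pattern = r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b"

cleaned = re.sub(pattern, "", s)

# The trailing negative decimal is removed; b3 and delet3d are untouched.
print(repr(cleaned))
```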

edited May 4, 2009 at 2:05

answered May 3, 2009 at 15:37

si28719e

Areza · Accepted Answer · 2022-06-23 10:04:20Z

2

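If your number is always at the end of your strings, try:

re.sub(r"\d+$", "", s)

Otherwise, you may try:

re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the back-references to keep only one or both of the spaces (\s matches any whitespace character). The replacement must be a raw string; otherwise Python reads "\1\2" as octal escapes rather than back-references.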
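
A sketch of the back-reference variant, written with raw strings so that \1 and \2 reach the regex engine intact. The input string is made up to include both spaces and tabs.

```python
import re

s = "keep 42 these\t7\twords"  # made-up input with spaces and tabs

# Without the r prefix, "\1\2" would be the control characters \x01\x02.
both = re.sub(r"(\s)\d+(\s)", r"\1\2", s)  # keep both separators
first = re.sub(r"(\s)\d+(\s)", r"\1", s)   # keep only the leading one

print(repr(both))
print(repr(first))
```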

edited Jun 23, 2022 at 10:04

Areza

answered May 3, 2009 at 14:06

Raoul Supercopter

1 Comment

dwc Over a year ago

\W is probably better than \s for this. Also, a better variation would be "\b\d+\b" except that it fails to work for me!

dbr · Accepted Answer · 2009-05-03 15:21:27Z

1

Non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

Splits by " ", and checks if the chunk is a number by doing str().isdigit(), then joins them back together. More verbosely (not using a list comprehension):

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)

answered May 3, 2009 at 15:21

dbr
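
A variant of the non-regex answer above, wrapped in a function. It differs from the answer in one respect, which is an assumption rather than the author's code: it calls split() with no argument, so tabs and newlines also separate words, at the cost of collapsing every run of whitespace into a single space. The function name is hypothetical.

```python
def drop_number_words(text):
    """Remove whitespace-separated chunks made up only of digits.

    Hypothetical helper: unlike text.split(" "), split() with no
    argument also splits on tabs/newlines and collapses repeated
    whitespace.
    """
    return " ".join(word for word in text.split() if not word.isdigit())


s = "This must not b3 delet3d,\tbut the number at the end yes 134411"
print(drop_number_words(s))
```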

ryhn · Accepted Answer · 2021-11-28 16:49:57Z

1

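I had a light-bulb moment; I tried this and it removes every digit from the string, including digits inside words (the character class also matches ~ and ^):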

sol = re.sub(r'[~^0-9]', '', 'aas30dsa20')

output:

aasdsa

answered Nov 28, 2021 at 16:49

ryhn

Prabakar Subramaniam · Accepted Answer · 2017-11-20 12:54:20Z

-1

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r"\d*$", "", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

This will remove the digits at the end of the string.

answered Nov 20, 2017 at 12:54

Prabakar Subramaniam
