Using regEx to remove digits from string

Question

I am trying to remove all digits from a string that are not attached to a word. Examples:

 "python 3" => "python"
 "python3" => "python3"
 "1something" => "1something"
 "2" => ""
 "434" => ""
 "python 35" => "python"
 "1 " => ""
 " 232" => ""

So far I have been using the following regular expression:

((?<=[ ])[0-9]+(?=[ ])|(?<=[ ])[0-9]+|^[0-9]$)

which handles some of the examples above correctly, but not all. Any help, with some explanation?
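A quick way to see where the pattern above goes wrong is to run it over the examples. In this sketch (the names `OP_PATTERN` and `examples` are just for illustration), three problems show up: `^[0-9]$` only matches a single digit, so "434" and "1 " survive; the lookbehind alternatives never consume the space before the number, so "python 3" becomes "python "; and the second alternative has no lookahead, so it also deletes digits that are attached to a word on the right ("python 3rd" becomes "python rd").

```python
import re

# The pattern from the question, copied verbatim.
OP_PATTERN = re.compile(r'((?<=[ ])[0-9]+(?=[ ])|(?<=[ ])[0-9]+|^[0-9]$)')

# Input -> expected output, as listed in the question.
examples = {
    'python 3': 'python',
    'python3': 'python3',
    '1something': '1something',
    '2': '',
    '434': '',
    'python 35': 'python',
    '1 ': '',
    ' 232': '',
}

for raw, expected in examples.items():
    got = OP_PATTERN.sub('', raw)
    status = 'ok' if got == expected else 'MISMATCH'
    print(f"{raw!r:14} => {got!r:12} expected {expected!r:10} {status}")

# Digits glued to a word on the right are removed by the second alternative.
print(repr(OP_PATTERN.sub('', 'python 3rd')))  # 'python rd'
```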

I managed a solution without using regex, but I would like to see the regex solution. – Mpizos Dimitris, commented Oct 21, 2016 at 13:55

brianpck · Accepted Answer · 2016-10-21 14:08:42Z

5

Why not just use word boundaries?

\b\d+\b

Here is an example:

>>> import re
>>> words = ['python 3', 'python3', '1something', '2', '434', 'python 35', '1 ', ' 232']
>>> for word in words:
...     print("'{}' => '{}'".format(word, re.sub(r'\b\d+\b', '', word)))
...
'python 3' => 'python '
'python3' => 'python3'
'1something' => '1something'
'2' => ''
'434' => ''
'python 35' => 'python '
'1 ' => ' '
' 232' => ' '

Note that this will not remove the spaces before and after the number. I would advise calling strip() on the result; otherwise you can probably use \b\d+\b\s* (to also remove the space after the number) or something similar.
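A minimal sketch of both clean-ups suggested here, checked against the examples from the question. The function names are hypothetical; each one applies the regex and then strips whitespace from the ends of the result.

```python
import re

# Digit runs that stand alone as a "word" (word boundary on both sides).
STANDALONE_DIGITS = re.compile(r'\b\d+\b')
# Same pattern, but it also consumes any whitespace that follows the digits.
STANDALONE_DIGITS_AND_SPACE = re.compile(r'\b\d+\b\s*')


def remove_standalone_digits(text):
    """Delete standalone digit runs, then trim whitespace at both ends."""
    return STANDALONE_DIGITS.sub('', text).strip()


def remove_standalone_digits_trailing(text):
    """Delete standalone digit runs plus the whitespace after them, then trim."""
    return STANDALONE_DIGITS_AND_SPACE.sub('', text).strip()


examples = {
    'python 3': 'python',
    'python3': 'python3',
    '1something': '1something',
    '2': '',
    '434': '',
    'python 35': 'python',
    '1 ': '',
    ' 232': '',
}

for raw, expected in examples.items():
    assert remove_standalone_digits(raw) == expected
    assert remove_standalone_digits_trailing(raw) == expected
    print(f"{raw!r} => {remove_standalone_digits(raw)!r}")
```

Both variants agree on these single-word inputs. The difference appears in the middle of a sentence: `remove_standalone_digits('a 1 b')` leaves two spaces ('a  b'), while the `\s*` variant gives 'a b'.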

edited Oct 21, 2016 at 14:08

answered Oct 21, 2016 at 13:54


2 Comments

milo.farrell Over a year ago

Just a word of warning: I think \b\d+\b will also match the digits in something like "python-3" or "python_3". This may be what you want, but it's worth noting.

brianpck Over a year ago

@milo.farrell Yes for - but not for an underscore.
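The exchange above comes down to which characters count as "word" characters for `\b`. A small check with Python's `re` module: `-` is not a word character, so there is a boundary before the `3`; `_` is a word character, so there is none.

```python
import re

pattern = re.compile(r'\b\d+\b')

# '-' is not a word character, so \b matches between '-' and '3':
# the digit is treated as standalone and removed.
hyphen = pattern.sub('', 'python-3')

# '_' is a word character (like letters and digits), so there is no
# boundary between '_' and '3': the string is left unchanged.
underscore = pattern.sub('', 'python_3')

print(repr(hyphen))      # 'python-'
print(repr(underscore))  # 'python_3'
```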

Padraic Cunningham · Accepted Answer · 2016-10-21 14:38:35Z

3

You could just split the string into words and remove any word that consists only of digits, which is a lot easier to read:

new = " ".join([w for w in s.split() if not w.isdigit()])
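Wrapped in a function (the name `remove_standalone_numbers` is hypothetical) and run against the question's examples, the one-liner produces exactly the expected output, because `str.split()` with no argument splits on runs of whitespace and discards leading and trailing whitespace.

```python
def remove_standalone_numbers(text):
    """Drop every whitespace-separated token that consists only of digits."""
    # split() with no argument ignores leading/trailing whitespace and
    # collapses runs of spaces, so no stray spaces survive the join.
    return " ".join([w for w in text.split() if not w.isdigit()])


examples = {
    'python 3': 'python',
    'python3': 'python3',
    '1something': '1something',
    '2': '',
    '434': '',
    'python 35': 'python',
    '1 ': '',
    ' 232': '',
}

for raw, expected in examples.items():
    result = remove_standalone_numbers(raw)
    assert result == expected, (raw, result)
    print(f"{raw!r} => {result!r}")
```

Since whole tokens are tested, "python-3" is kept intact here, unlike with \b\d+\b. Note also that `str.isdigit()` accepts characters such as the superscript '²'; `str.isdecimal()` is the stricter check if that matters.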

It also seems to be faster:

In [27]: p = re.compile(r'\b\d+\b')

In [28]: s = " ".join(['python 3', 'python3', '1something', '2', '434', 'python 35', '1 ', ' 232'])

In [29]: timeit " ".join([w for w in s.split() if not w.isdigit()])

100000 loops, best of 3: 1.54 µs per loop

In [30]: timeit p.sub('', s)

100000 loops, best of 3: 3.34 µs per loop
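To repeat the comparison outside IPython, a sketch using the standard `timeit` module on the same string is below. The function names are illustrative, and the timings it prints will depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Same test string as in the IPython session above.
s = " ".join(['python 3', 'python3', '1something', '2', '434', 'python 35', '1 ', ' 232'])
p = re.compile(r'\b\d+\b')


def split_join(text):
    """Keep only the whitespace-separated words that are not all digits."""
    return " ".join([w for w in text.split() if not w.isdigit()])


def regex_sub(text):
    """Delete standalone digit runs, leaving the surrounding whitespace."""
    return p.sub('', text)


if __name__ == '__main__':
    n = 10_000
    for func in (split_join, regex_sub):
        # Best of 3 repeats, converted to microseconds per call.
        best = min(timeit.repeat(lambda: func(s), number=n, repeat=3))
        print(f"{func.__name__}: {best / n * 1e6:.2f} us per call")
```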

It also removes the surrounding spaces, as in your expected output:

In [39]:  re.sub(r'\b\d+\b', '', " 2")
Out[39]: ' '

In [40]:  " ".join([w for w in " 2".split() if not w.isdigit()])
Out[40]: ''

In [41]:  re.sub(r'\b\d+\b', '', s)
Out[41]: 'python  python3 1something   python     '

In [42]:  " ".join([w for w in s.split() if not w.isdigit()])
Out[42]: 'python python3 1something python'

So the two approaches give significantly different results.

edited Oct 21, 2016 at 14:38

answered Oct 21, 2016 at 14:15


5 Comments

brianpck Over a year ago

The OP mentioned in a comment that he already has a solution without regex. (But yes, this is definitely the best way.)

Padraic Cunningham Over a year ago

@brianpck, the OP also wants "1 " to become "", which this does, and more efficiently, so I will leave the answer: it is a better overall approach for future readers, and it does what the OP seems to want.

Bahrom Over a year ago

Actually, you could just pass a generator to join instead of creating a list: " ".join(w for w in s.split() if not w.isdigit())

Padraic Cunningham Over a year ago

@Bahrom, that is slower, as Python will build a list internally if you pass a generator.

Padraic Cunningham Over a year ago

@Bahrom, there are some comparisons here: stackoverflow.com/a/9061024/2141635. join has to make two passes over the data, one to work out the size needed and one to build the new string. Because you can only iterate over a generator once, it first has to be converted to a list.
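The list-versus-generator question can be checked directly with `timeit`. This is a sketch with an arbitrary, made-up word list; in CPython, `str.join` turns a non-list, non-tuple iterable into a sequence before joining, which is why the list comprehension is usually expected to be at least as fast. The printed timings will vary between machines, so none are claimed.

```python
import timeit

# Arbitrary test data: the question's words repeated to make the timing measurable.
words = ['python', '3', 'python3', '1something', '2', '434', 'python', '35', '1', '232'] * 100


def with_list():
    # join receives a ready-made list.
    return " ".join([w for w in words if not w.isdigit()])


def with_generator():
    # join receives a generator and has to materialize it first.
    return " ".join(w for w in words if not w.isdigit())


if __name__ == '__main__':
    assert with_list() == with_generator()
    for func in (with_list, with_generator):
        best = min(timeit.repeat(func, number=2_000, repeat=3))
        print(f"{func.__name__}: {best:.4f} s for 2000 calls")
```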

milo.farrell · Accepted Answer · 2016-10-21 14:03:25Z

0

This regex, (\s|^)\d+(\s|$), could work, as shown below in JavaScript:

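const value = "1 3@bar @foo2 * 112";
// Remove each digit run preceded by whitespace or the start of the string
// and followed by whitespace or the end of the string.
const result = value.replace(/(\s|^)\d+(\s|$)/g, "");
console.log(result);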

It works in 3 parts:

First, (\s|^) matches a whitespace character or the beginning of the string: \s matches whitespace, | means "or", and ^ means the beginning of the string.
Next, \d+ matches the digits: \d matches a digit, and + repeats it one or more times, as many as possible.
Finally, (\s|$) matches a whitespace character or the end of the string: \s matches whitespace, | means "or", and $ means the end of the string.

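If you have several lines, you can replace $ with an end-of-line match (for example with the m flag, which makes ^ and $ match at line breaks), or add \n next to it like this: (\s|$|\n), although \s already matches \n. Hope this is what you're looking for.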

edited Oct 21, 2016 at 14:03

answered Oct 21, 2016 at 13:57


1 Comment

brianpck Over a year ago

This will match double spaces
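The problem brianpck points out is that the spaces on both sides are part of the match, so they are deleted along with the number. A Python port of the pattern (the JavaScript version should behave the same on these inputs) shows two symptoms: neighbouring words get glued together, and a number whose leading space was consumed by the previous match is skipped. The lookaround pattern below is an alternative that is not from the answer: `(?<!\S)` and `(?!\S)` only check that no non-space character touches the digits, without consuming anything. The helper `tidy` is hypothetical and just collapses leftover whitespace.

```python
import re

# Port of the answer's pattern: surrounding whitespace is part of the match.
consuming = re.compile(r'(\s|^)\d+(\s|$)')

# Alternative: zero-width checks that the digit run is not touched by any
# non-space character on either side (string edges count as "not non-space").
lookaround = re.compile(r'(?<!\S)\d+(?!\S)')


def tidy(text):
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


# Both spaces around "1" are removed, gluing "a" and "b" together.
print(repr(consuming.sub('', 'a 1 b')))   # 'ab'
# The space after "1" is used up, so "2" no longer has a separator to match.
print(repr(consuming.sub('', '1 2 3')))   # '2'

print(repr(tidy(lookaround.sub('', 'a 1 b'))))               # 'a b'
print(repr(tidy(lookaround.sub('', '1 2 3'))))               # ''
print(repr(tidy(lookaround.sub('', 'python-3 python 35'))))  # 'python-3 python'
```

Combined with `tidy`, the lookaround version gives the output asked for in the question, including "1 " => "" and " 232" => "".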
