2

I am trying to remove all digits from a string that are not attached to a word. Examples:

 "python 3" => "python"
 "python3" => "python3"
 "1something" => "1something"
 "2" => ""
 "434" => ""
 "python 35" => "python"
 "1 " => ""
 " 232" => ""

So far I have been using the following regular expression:

((?<=[ ])[0-9]+(?=[ ])|(?<=[ ])[0-9]+|^[0-9]$)

It handles some of the examples above correctly, but not all of them. Could someone help and explain?

  • Wait, why "1something" => "something"? Commented Oct 21, 2016 at 13:50
  • Thanks, you are right! Corrected it. Commented Oct 21, 2016 at 13:52
  • Why not just search for ( \d+ ) and remove it? Commented Oct 21, 2016 at 13:52
  • Because it will remove the attached digits. Commented Oct 21, 2016 at 13:53
  • I managed to find a solution without using a regexp, but I would like to see the regexp solution. Commented Oct 21, 2016 at 13:55

3 Answers

5

Why not just use word boundaries?

\b\d+\b

Here is an example:

>>> import re
>>> words = ['python 3', 'python3', '1something', '2', '434', 'python 35', '1 ', ' 232']
>>> for word in words:
...     print("'{}' => '{}'".format(word, re.sub(r'\b\d+\b', '', word)))
...
'python 3' => 'python '
'python3' => 'python3'
'1something' => '1something'
'2' => ''
'434' => ''
'python 35' => 'python '
'1 ' => ' '
' 232' => ' '

Note that this does not remove the spaces before or after the number. I would advise calling strip() on the result; alternatively, you can probably use something like \b\d+\b\s* to also consume the whitespace that follows.
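
As a rough sketch of the cleanup suggested above (the helper name remove_standalone_digits is made up for illustration and is not part of the answer), the word-boundary substitution can be followed by a whitespace normalisation step so that the question's examples come out as expected. This goes slightly beyond strip(): it also collapses the double space left behind when a number sits in the middle of a string, which may or may not be what you want.

```python
import re

# Digits with a word boundary on both sides, i.e. not glued to letters or "_".
STANDALONE_DIGITS = re.compile(r'\b\d+\b')

def remove_standalone_digits(text):
    """Hypothetical helper: drop standalone numbers, then tidy whitespace."""
    without_digits = STANDALONE_DIGITS.sub('', text)
    # split() with no arguments discards leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return ' '.join(without_digits.split())

for example in ['python 3', 'python3', '1something', '2', '434',
                'python 35', '1 ', ' 232']:
    print(repr(example), '=>', repr(remove_standalone_digits(example)))
```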

2 Comments

Just a word of warning: I think \b\d+\b will also match in something like "python-3" or "python_3". This may be what you want, but it's worth noting.
@milo.farrell Yes for "-", but not for an underscore: "_" is a word character, so there is no word boundary between "_" and a digit.
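
To illustrate the point in these comments: a hyphen is not a word character, so \b sees a boundary next to it, while an underscore is one. If digits that follow punctuation should be kept as well, one possible alternative (an assumption here, not something proposed in the thread) is to require whitespace or a string edge on both sides using lookarounds.

```python
import re

word_boundary = re.compile(r'\b\d+\b')
# Assumed alternative: no non-whitespace character directly before or after.
whitespace_delimited = re.compile(r'(?<!\S)\d+(?!\S)')

for text in ['python-3', 'python_3', 'python 3']:
    print(repr(text),
          repr(word_boundary.sub('', text)),
          repr(whitespace_delimited.sub('', text)))
```

With the lookaround version, "python-3" is left alone, whereas \b\d+\b turns it into "python-".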
3

You could just split the string into words and drop any word that consists only of digits, which is a lot easier to read:

new = " ".join([w for w in s.split() if not w.isdigit()])

And also seems faster:

In [27]: p = re.compile(r'\b\d+\b')

In [28]: s = " ".join(['python 3', 'python3', '1something', '2', '434',
    ...:               'python 35', '1 ', ' 232'])

In [29]: timeit " ".join([w for w in s.split() if not w.isdigit()])

100000 loops, best of 3: 1.54 µs per loop

In [30]: timeit p.sub('', s)

100000 loops, best of 3: 3.34 µs per loop

It also removes the surrounding spaces, as in your expected output:

In [39]:  re.sub(r'\b\d+\b', '', " 2")
Out[39]: ' '

In [40]:  " ".join([w for w in " 2".split() if not w.isdigit()])
Out[40]: ''

In [41]:  re.sub(r'\b\d+\b', '', s)
Out[41]: 'python  python3 1something   python     '

In [42]:  " ".join([w for w in s.split() if not w.isdigit()])
Out[42]: 'python python3 1something python'

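So the two approaches produce noticeably different output: the regex leaves the surrounding whitespace in place, while split and join normalise it.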
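
For reference, here is a small sketch that runs the split-based approach over each example from the question separately (the function name drop_digit_words is just illustrative). One thing to keep in mind is that split() followed by join also collapses every run of whitespace into a single space, so the original spacing between the remaining words is not preserved.

```python
def drop_digit_words(s):
    """Illustrative helper: keep only the words that are not all digits."""
    return " ".join([w for w in s.split() if not w.isdigit()])

examples = ['python 3', 'python3', '1something', '2', '434',
            'python 35', '1 ', ' 232']
for s in examples:
    print("'{}' => '{}'".format(s, drop_digit_words(s)))

# Whitespace between the kept words is normalised as a side effect.
print(repr(drop_digit_words("python    3   rocks")))
```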

5 Comments

The OP mentioned in a comment that he has a solution without regexp already. (But yes, this is definitely the best way)
@brianpck, the OP also wants "1 " to become "", which this does, and more efficiently. I will leave the answer up because it is a better overall approach for future readers and it does what the OP seems to want.
Actually, you could just pass the generator into join instead of creating a list: " ".join(w for w in s.split() if not w.isdigit())
@Bahrom, that is slower, as Python will build a list internally if you pass a generator.
@Bahrom, there are some comparisons here: stackoverflow.com/a/9061024/2141635. join has to make two passes over the data, one to compute the size needed and one to build the new string. Because you can only iterate over a generator once, it first has to be converted to a list.
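
If you want to check the list-versus-generator claim on your own machine, a sketch along these lines with the standard timeit module should do. The numbers depend on the interpreter and hardware, so no timings are shown here.

```python
import timeit

s = " ".join(['python 3', 'python3', '1something', '2', '434',
              'python 35', '1 ', ' 232'])

def with_list():
    return " ".join([w for w in s.split() if not w.isdigit()])

def with_generator():
    return " ".join(w for w in s.split() if not w.isdigit())

# Both variants must produce the same string; only the speed may differ.
assert with_list() == with_generator()

print("list comprehension:  ", timeit.timeit(with_list, number=100000))
print("generator expression:", timeit.timeit(with_generator, number=100000))
```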
0

This regex, (\s|^)\d+(\s|$), could work, as shown below in JavaScript:

var value = "1 3@bar @foo2 * 112";
var matches = value.replace(/(\s|^)\d+(\s|$)/g,"");
console.log(matches)

It works in 3 parts:

  1. It first matches a whitespace character or the beginning of the string using (\s|^), with \s matching whitespace, | meaning "or", and ^ meaning the beginning of the string.
  2. Next it matches one or more digits, using \d for a digit and + to match 1 to N times, as many as possible.
  3. Finally, (\s|$) matches a whitespace character or the end of the string, with \s matching whitespace, | meaning "or", and $ matching the end of the string.

If you have several lines, you can replace $ with \n, or just add it alongside, like this: (\s|$|\n). Hope this is what you're looking for.
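
The same pattern can be tried in Python with re.sub, as in the sketch below. It also shows a limitation that may be what the comment under this answer is getting at: because the whitespace on both sides is consumed by the match, neighbouring words get glued together, and in a run of numbers every second one survives.

```python
import re

pattern = re.compile(r'(\s|^)\d+(\s|$)')

# Same input as the JavaScript snippet.
print(repr(pattern.sub('', "1 3@bar @foo2 * 112")))   # '3@bar @foo2 *'

# The separating spaces are part of the match, so they disappear too.
print(repr(pattern.sub('', "foo 1 bar")))             # 'foobar'

# "1 " is consumed first, so "2" no longer has a space or ^ before it.
print(repr(pattern.sub('', "1 2 3")))                 # '2'
```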

1 Comment

This will match double spaces
