Replacing numbers with a placeholder in a string, including decimals and percentages, using re in Python

import re

def remove_numbers(text):
    remove = re.sub(r"\W\d\S*", " [DD]", text)
    return remove

The function works fine on this sample string: sample = "I can give you 10% of 100,000 to you. The thing went up by 10% so it costs 12.25 euros now." But if a string starts with a number, the first number does not get replaced by the placeholder.
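A minimal reproduction of the problem, assuming Python 3 and the question's pattern unchanged. The second test string is a hypothetical variant of the sample that starts with a number. Because \W has to consume a real character before the digit, the leading 10% survives while the later 100,000 is replaced:

```python
import re

def remove_numbers(text):
    # The question's pattern: \W must consume one real non-word character
    # (such as a space) directly before the first digit of the number.
    return re.sub(r"\W\d\S*", " [DD]", text)

print(remove_numbers("The thing went up by 10% so it costs 12.25 euros now."))
# -> The thing went up by [DD] so it costs [DD] euros now.
print(remove_numbers("10% of 100,000 is yours."))
# -> 10% of [DD] is yours.
```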

  • Where did it work perfectly? Can you add that, and also add more examples of input and output? Commented Jun 17, 2019 at 18:47
  • sample = "I can give 50% of 100,000 to you in cash. it went up by 2.3% and its costly." Commented Jun 17, 2019 at 18:50
  • Add this to the question instead, along with the expected output. Commented Jun 17, 2019 at 18:50
  • It worked perfectly on that string, but if the number is at the start of the string it doesn't seem to work. Commented Jun 17, 2019 at 18:50
  • What is the expected output for "I can give 50% of 100,000 to you in cash. it went up by 2.3% and its costly"? Commented Jun 17, 2019 at 18:53

5 Answers

So calling str.replace in a loop, once for each digit, seems to be the easiest way to do this.

def remove_numbers(text):
    # Replace each digit character, one at a time, with the placeholder
    for digit in '0123456789':
        text = text.replace(digit, '[DD]')
    return text
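A quick check of this approach, assuming the same placeholder; the input string is hypothetical. Because each call replaces a single character, every digit becomes its own placeholder and the % and . stay in place, so the result is not one [DD] per number as the question asks:

```python
def remove_numbers(text):
    # Same idea as the answer: one str.replace call per digit character.
    for digit in "0123456789":
        text = text.replace(digit, "[DD]")
    return text

print(remove_numbers("10% of 12.25"))
# -> [DD][DD]% of [DD][DD].[DD][DD]
```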

\W will not match at the start of the string. It appears you are using \W to make sure that the number you are replacing is not part of a word, which makes sense. But \W has to consume an actual character, and there is none before the first character of the string. You can use \A for that position instead. You probably don't want to add a space when replacing at the start of the string, though. This can be done in a single regex, but I think the code is easier to read if you do it in two steps.

import re

def remove_numbers(text):
    # replace internal numbers that are not a part of a word (adds a space)
    remove = re.sub(r"\W\d\S*", " [DD]", text)
    # replace number at start of string (if any) (does not add a space)
    remove = re.sub(r"\A\d\S*", "[DD]", remove)
    return remove

a = "3 foxes jumped over 3 fences"
b = remove_numbers(a)

print("before <{}>".format(a))
print("after <{}>".format(b))
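The single-regex version mentioned above could look like the sketch below, assuming Python's re module; the function name remove_numbers_one_pass is hypothetical. It swaps the consuming \W for a zero-width lookbehind, so the character before the number is kept as-is instead of being turned into a space:

```python
import re

# Hypothetical name for a one-regex variant of the two-step function above.
def remove_numbers_one_pass(text):
    # (?:\A|(?<=\W)) is zero-width: it requires either the start of the string
    # or a non-word character right before the digit, without consuming it.
    return re.sub(r"(?:\A|(?<=\W))\d\S*", "[DD]", text)

print(remove_numbers_one_pass("3 foxes jumped over 3 fences"))
# -> [DD] foxes jumped over [DD] fences
print(remove_numbers_one_pass("I can give you 10% of 100,000 to you."))
# -> I can give you [DD] of [DD] to you.
```

One difference from the two-step version: a non-space separator survives, so abc-5 becomes abc-[DD] here but abc [DD] with the two re.sub calls.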

\W requires an actual character to be there, so when the number is at the very beginning of the string there is nothing for \W to match and the pattern fails.

Use \b instead of \W to match word boundaries:

import re

def remove_numbers(text):
    remove = re.sub(r"\b\d\S*", "[DD]", text)
    return remove

Or, keeping more in the spirit of your original code:

def remove_numbers(text):
    # \1 writes back the whitespace (or empty start-of-string match) captured by group 1
    remove = re.sub(r"(\s|^)\d\S*", r"\1[DD]", text)
    return remove

You could write \d+ instead of \d, but here it makes no difference: the trailing \S* already consumes any further digits, along with the %, commas and decimal points.
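A small comparison of the two patterns in this answer; the function names and the second test string are hypothetical. They agree on plain space-separated numbers, but \b also fires after punctuation such as an opening parenthesis, while (\s|^) only fires after whitespace or at the start of the string:

```python
import re

# Hypothetical names for the two variants shown above.
def with_word_boundary(text):
    # \b is zero-width, so the character before the number is never consumed.
    return re.sub(r"\b\d\S*", "[DD]", text)

def with_whitespace_or_start(text):
    # Group 1 captures the whitespace (or empty match at ^) and \1 restores it.
    return re.sub(r"(\s|^)\d\S*", r"\1[DD]", text)

for sample in ["3 foxes jumped over 3 fences", "prices rose (10%) today"]:
    print(with_word_boundary(sample))
    print(with_whitespace_or_start(sample))
# -> [DD] foxes jumped over [DD] fences
# -> [DD] foxes jumped over [DD] fences
# -> prices rose ([DD] today
# -> prices rose (10%) today
```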

Do this:

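import re

def remove_numbers(text):
    remove = re.sub(r"\W?\d\S*", " [DD]", text)
    # strip() removes the extra space added when the string starts with a number
    return remove.strip()

print(remove_numbers("3 foxes jumped over 3 fences"))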

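The ? means "zero or one" of the preceding token, so \W? also matches when there is no character before the number, for example at the start of the string.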
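A sketch exercising this answer's function; the second input is a hypothetical string chosen to show a side effect of making \W optional: a number attached to the end of a word is also matched, and the placeholder's leading space splits the word:

```python
import re

def remove_numbers(text):
    # \W? is optional, so a digit matches even with no non-word character before it.
    return re.sub(r"\W?\d\S*", " [DD]", text).strip()

print(remove_numbers("3 foxes jumped over 3 fences"))
# -> [DD] foxes jumped over [DD] fences
print(remove_numbers("model abc123 costs 5 euros"))
# -> model abc [DD] costs [DD] euros
```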

Change your regex to:

    remove = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " [DD] ", text)

Full code:

import re

def remove_numbers(text):
    s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " [DD] ", text)
    return s

t1 = "3 foxes jumped over 3 fences"
print(remove_numbers(t1))

Output (the leading space comes from the " [DD] " replacement text):

 [DD] foxes jumped over [DD] fences
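A quick check of this pattern against part of the question's own sample sentence, assuming Python 3. It appears to match only bare integers bounded by whitespace or the ends of the string, so 10% and 100,000 are left untouched; repr() makes the leading space visible:

```python
import re

def remove_numbers(text):
    # Raw string so \d and \s reach the regex engine without escape warnings.
    return re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " [DD] ", text)

print(repr(remove_numbers("3 foxes jumped over 3 fences")))
# -> ' [DD] foxes jumped over [DD] fences'
print(repr(remove_numbers("I can give you 10% of 100,000 to you.")))
# -> 'I can give you 10% of 100,000 to you.'
```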
