Regular expression for substitution of similar pattern in a string in Python

Question

I want to use a regular expression to detect and substitute some phrases. These phrases follow the same pattern but deviate at some points. All the phrases are in the same string.

For instance I have this string:

/this/is//an example of what I want /to///do

I want to match every word attached to the slashes, together with the slashes themselves, and substitute them with "".

To solve this, I used the following code:

import re
txt = "/this/is//an example of what i want /to///do"
re.search("/.*/", txt, re.VERBOSE)
pattern1 = r"/.*?/\w+"
a = re.sub(pattern1,"",txt)

The result is:

' example of what i want '

which is what I want, that is, to substitute the phrases within // with "". But when I run the same pattern on the following sentence

"/this/is//an example of what i want to /do"

I get

' example of what i want to /do'

How can I use one regex and remove all the phrases and //, irrespective of the number of // in a phrase?

I see one, two, and three forward slashes. What does each of these mean here?
– Tim Biegeleisen, Commented Oct 27, 2021 at 8:34
This is text that I obtained by web scraping. I want to clean it of phrases and short sentences that do not add to the content. All the phrases within the slashes, and the slashes themselves, are just noise for the algorithm I want to run.
– Almosino, Commented Oct 27, 2021 at 8:38

The fourth bird · Accepted Answer · 2021-10-27 08:54:55Z

1

In your example code, you can omit the line that calls re.search, as it executes the search but you are not doing anything with the result.
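As background on why the pattern from the question leaves /do behind, here is a minimal sketch. It relies only on the question's pattern and sample text, plus one hypothetical input ("keep /a this /b") that I made up for illustration. The pattern /.*?/\w+ needs two slashes to match, and the lazy .*? is free to cross spaces to find the second one.

```python
import re

# The pattern from the question: a slash, a lazy run of any chars,
# then a second slash followed by one or more word chars.
original = re.compile(r"/.*?/\w+")

txt = "/this/is//an example of what i want to /do"

# Only phrases that contain at least two slashes can match.
matches = original.findall(txt)
print(matches)  # ['/this/is', '//an']

# "/do" has a single slash, so there is no match at all.
lone = original.search("want to /do")
print(lone)  # None

# Hypothetical input: ".*?" crosses the spaces to reach the next slash,
# so the ordinary word "this" is removed as well.
crossed = original.sub("", "keep /a this /b")
print(repr(crossed))  # 'keep '
```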

You can match 1 or more / followed by word chars:

/+\w+

Or use a slightly broader match: one or more / followed by any chars other than / or a whitespace char:

/+[^\s/]+

/+ Match 1+ occurrences of /
[^\s/]+ Match 1+ occurrences of any char except a whitespace char or /
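To illustrate how the two patterns differ, here is a small sketch on a hypothetical input containing a hyphen (the string "/foo-bar baz" is an assumption, not taken from the question). \w does not match the hyphen, so the narrower pattern stops early.

```python
import re

sample = "/foo-bar baz"  # hypothetical input, not from the question

# \w+ stops at the hyphen, so "-bar" is left behind.
narrow = re.sub(r"/+\w+", "", sample)

# [^\s/]+ runs up to the next whitespace or slash, so the hyphen is included.
broad = re.sub(r"/+[^\s/]+", "", sample)

print(repr(narrow))  # '-bar baz'
print(repr(broad))   # ' baz'
```

Which one fits depends on whether the scraped phrases can contain punctuation.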

Regex demo

import re

strings = [
    "/this/is//an example of what I want /to///do",
    "/this/is//an example of what i want to /do"
]

for txt in strings:    
    pattern1 = r"/+[^\s/]+"
    a = re.sub(pattern1, "", txt)
    print(a)

Output

 example of what I want 
 example of what i want to
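The output keeps the spaces that surrounded the removed phrases. If that matters for your cleanup step, one possible sketch is to wrap the substitution in a helper that also collapses whitespace. The function name remove_slash_phrases and the whitespace handling are assumptions for illustration, not part of the question.

```python
import re

# Same pattern as above, compiled once for reuse.
SLASH_PHRASE = re.compile(r"/+[^\s/]+")

def remove_slash_phrases(text):
    """Remove slash-prefixed phrases, then collapse runs of whitespace."""
    cleaned = SLASH_PHRASE.sub("", text)
    # str.split() with no arguments splits on any whitespace and drops
    # empty pieces, so joining with " " normalizes the spacing.
    return " ".join(cleaned.split())

first = remove_slash_phrases("/this/is//an example of what I want /to///do")
second = remove_slash_phrases("/this/is//an example of what i want to /do")
print(repr(first))   # 'example of what I want'
print(repr(second))  # 'example of what i want to'
```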

edited Oct 27, 2021 at 8:54

answered Oct 27, 2021 at 8:48


Wiktor Stribiżew · Accepted Answer · 2021-10-27 08:47:41Z

0

You can use

/(?:[^/\s]*/)*\w+

See the regex demo. Details:

/ - a slash
(?:[^/\s]*/)* - zero or more repetitions of: zero or more chars other than a slash and whitespace, followed by a slash
\w+ - one or more word chars.
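As a quick check of the breakdown above, this sketch uses re.findall to show what the pattern consumes on the two strings from the question. Each slash-delimited phrase comes back as a single match, however many slashes it contains.

```python
import re

rx = re.compile(r"/(?:[^/\s]*/)*\w+")

# The group is non-capturing, so findall returns the full matches.
first = rx.findall("/this/is//an example of what I want /to///do")
second = rx.findall("/this/is//an example of what i want to /do")

print(first)   # ['/this/is//an', '/to///do']
print(second)  # ['/this/is//an', '/do']
```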

See the Python demo:

import re
rx = re.compile(r"/(?:[^/\s]*/)*\w+")
texts = ["/this/is//an example of what I want /to///do", "/this/is//an example of what i want to /do"]
for text in texts:
    print(rx.sub('', text).strip())
# => example of what I want
#    example of what i want to

answered Oct 27, 2021 at 8:47

