0

I want to use a regular expression to detect and substitute some phrases. These phrases follow the same pattern but deviate at some points. All the phrases are in the same string.

For instance I have this string:

/this/is//an example of what I want /to///do

I want to match every phrase that is attached to slashes, including the slashes themselves, and replace each match with "".

To solve this, I used the following code:

import re
txt = "/this/is//an example of what i want /to///do"
re.search("/.*/", txt, re.VERBOSE)
pattern1 = r"/.*?/\w+"
a = re.sub(pattern1,"",txt)

The result is:

' example of what i want '

which is what I want, that is, to substitute the phrases within // with "". But when I run the same pattern on the following sentence

"/this/is//an example of what i want to /do"

I get

' example of what i want to /do'

How can I use one regex and remove all the phrases and //, irrespective of the number of // in a phrase?

3
  • I see one, two, and three forward slashes. What does each of these mean here? Commented Oct 27, 2021 at 8:34
  • This is text that I obtained from web scraping. I want to clean it of phrases and short sentences that do not add to the content. All the phrases within the slashes, and the slashes themselves, are just noise for the algorithm I want to run. Commented Oct 27, 2021 at 8:38
  • What is the exact output that you want? Commented Oct 27, 2021 at 8:53

2 Answers 2

1

In your example code, you can omit the re.search(...) call: it runs the search, but you are not doing anything with the result.

You can match 1 or more / followed by word chars:

/+\w+

Or, for a slightly broader match, match one or more / followed by any chars other than / or whitespace chars:

/+[^\s/]+
  • /+ Match 1+ occurrences of /
  • [^\s/]+ Match 1+ occurrences of any char except a whitespace char or /

Regex demo

import re

strings = [
    "/this/is//an example of what I want /to///do",
    "/this/is//an example of what i want to /do"
]

pattern1 = r"/+[^\s/]+"

for txt in strings:
    a = re.sub(pattern1, "", txt)
    print(a)

Output

 example of what I want 
 example of what i want to 
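
To illustrate how the two patterns differ, here is a minimal sketch. The sample string with a hyphenated token is a made-up input, not taken from the question:

```python
import re

# Hypothetical input: a slash-prefixed token that contains a hyphen.
sample = "/foo-bar baz"

# \w+ stops at the hyphen, so part of the token is left behind.
narrow = re.sub(r"/+\w+", "", sample)

# [^\s/]+ keeps going until the next whitespace char or slash.
broad = re.sub(r"/+[^\s/]+", "", sample)

print(repr(narrow))  # '-bar baz'
print(repr(broad))   # ' baz'
```

If the scraped tokens can contain punctuation, the broader pattern is probably the safer choice.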

0

You can use

/(?:[^/\s]*/)*\w+

See the regex demo. Details:

  • / - a slash
  • (?:[^/\s]*/)* - zero or more repetitions of: zero or more chars other than a slash and whitespace, followed by a slash
  • \w+ - one or more word chars.

See the Python demo:

import re
rx = re.compile(r"/(?:[^/\s]*/)*\w+")
texts = ["/this/is//an example of what I want /to///do", "/this/is//an example of what i want to /do"]
for text in texts:
    print(rx.sub('', text).strip())
# => example of what I want
#    example of what i want to
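
To check what the pattern actually consumes, a small sketch with re.findall can list the matched phrases before substituting anything. The variable names here are only illustrative:

```python
import re

rx = re.compile(r"/(?:[^/\s]*/)*\w+")
text = "/this/is//an example of what I want /to///do"

# The pattern has no capturing groups, so findall returns the full matches.
matches = rx.findall(text)
print(matches)  # ['/this/is//an', '/to///do']
```

Each slash-joined run is matched as a single phrase, so one substitution removes it whole.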
