
I have a Python string from which I need to remove parentheses and the text inside them. The usual way is text = re.sub(r'\([^)]*\)', '', text), which removes the content within the parentheses.

However, I just found a string that looks like (Data with in (Boo) And good luck). With my regex, the " And good luck)" part is still left over. I know I could scan through the entire string, keep a counter of ( and ), record the positions of a ( and its matching ) once the counts balance, and remove everything in between, but is there a better or cleaner way to do this? It doesn't need to be a regex; anything that works is great. Thanks.
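The counter-based scan described above could look roughly like this. It is a minimal sketch, not taken from any answer below: it assumes the parentheses are balanced, and the function name remove_balanced_groups is hypothetical. It records the index of each outermost ( when the depth goes from 0 to 1 and the matching ) when the depth returns to 0, then slices those spans out.

```python
def remove_balanced_groups(text):
    """Remove every outermost (...) group, including anything nested in it.

    Hypothetical helper; assumes the parentheses in `text` are balanced.
    """
    spans = []   # (start, end) index pairs of outermost groups
    depth = 0
    start = 0
    for i, c in enumerate(text):
        if c == '(':
            if depth == 0:
                start = i              # opening of an outermost group
            depth += 1
        elif c == ')':
            depth -= 1
            if depth == 0:
                spans.append((start, i))   # matching close found

    # Rebuild the string from the pieces between the recorded spans.
    pieces = []
    prev = 0
    for s, e in spans:
        pieces.append(text[prev:s])
        prev = e + 1
    pieces.append(text[prev:])
    return ''.join(pieces)


print(repr(remove_balanced_groups('Hi this is a test ( a b ( c d) e) sentence')))
```

The spaces around the removed group are kept, so the result has two spaces between "test" and "sentence".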

Someone asked for the expected result, so here it is. Given:

Hi this is a test ( a b ( c d) e) sentence

after the replacement I want Hi this is a test sentence, not Hi this is a test e) sentence.

  • It isn't possible to do it with the re module, but you can with the regex module, which allows recursion. pypi.python.org/pypi/regex Commented Aug 18, 2016 at 19:39
  • In the worst case you can do it with the re module: build a pattern that matches the innermost parentheses, \([^()]*\), and loop the replacement until there is nothing left to replace. But it isn't very elegant, since you need to parse the string several times. Commented Aug 18, 2016 at 19:46
  • Are you open to non-regex solutions? Commented Aug 18, 2016 at 19:49
  • Can you please share what you expect from the example you gave, to make it clearer? Commented Aug 18, 2016 at 19:51
  • I only see one space in the result between "test" and "sentence". If that's the case, are you saying we need to remove a space before "("? Or remove a space after a ")"? Commented Aug 19, 2016 at 6:12

5 Answers


With the re module (replace the innermost parentheses repeatedly until there is nothing left to replace):

import re

s = 'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

# Each pass removes the innermost (...) groups; stop once a pass replaces nothing.
nb_rep = 1
while nb_rep:
    s, nb_rep = re.subn(r'\([^()]*\)', '', s)

print(s)  # Sainte Anne - Charenton

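With the third-party regex module (pip install regex), which supports recursion:

import regex

s = 'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

# One pass is enough: nested groups are matched recursively.
print(regex.sub(r'\([^()]*+(?:(?R)[^()]*)*+\)', '', s))  # Sainte Anne - Charenton

Here (?R) refers to the whole pattern itself, so each nested group is matched by a recursive application of the same pattern.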


1 Comment

The first answer is beautiful and awesome. Thank you.

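First I split the line on the parenthesis characters, then join the remaining pieces back into a new line:

import re

line = "(Data with in (Boo) And good luck)"
# Split on every '(' or ')' and glue the pieces back together.
# Note: this drops only the parenthesis characters; the enclosed text is kept.
new_line = "".join(re.split(r'[()]', line))
print(new_line)
# Data with in Boo And good luck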
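The split above removes only the parenthesis characters. The same tokenizing idea can be pushed one step further to drop the enclosed text too. This is a sketch of an extension, not part of the answer itself; the name drop_parenthesized is hypothetical and balanced parentheses are assumed. A capturing group in re.split keeps ( and ) as separate tokens, so the nesting depth can be tracked and only text at depth 0 kept.

```python
import re


def drop_parenthesized(line):
    """Hypothetical helper: drop every (...) group, nested or not."""
    depth = 0
    kept = []
    # The capturing group makes re.split return '(' and ')' as tokens too.
    for token in re.split(r'([()])', line):
        if token == '(':
            depth += 1
        elif token == ')':
            depth -= 1
        elif depth == 0:
            kept.append(token)   # text outside every group
    return ''.join(kept)


print(repr(drop_parenthesized("(Data with in (Boo) And good luck)")))
print(repr(drop_parenthesized("Hi this is a test ( a b ( c d) e) sentence")))
```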


No regex. This removes everything from the first ( to the last ), so it assumes the string has a single outermost group:

>>> a = 'Hi this is a test ( a b ( c d) e) sentence'
>>> o = [t in '()' for t in a]   # True where the character is a parenthesis
>>> o
[False, False, False, False, False, False, False, False, False, False,
 False, False, False, False, False, False, False, False, True, False, False,
 False, False, False, True, False, False, False, False, True, False, False,
 True, False, False, False, False, False, False, False, False, False]
>>> start, end = None, None
>>> for n, i in enumerate(o):
...     if i and start is None:   # first parenthesis seen
...         start = n
...     if i:                     # keep updating until the last one
...         end = n
...
>>>
>>> start
18
>>> end
32
>>> a1 = ' '.join(''.join(i for n,i in enumerate(a) if (n<start or n>end)).split())
>>> a1
'Hi this is a test sentence'
>>>
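The loop above effectively finds the first ( and the last ). A shorter sketch of the same idea uses str.find and str.rfind instead of the boolean list (a substitution, not the answer's own code). Like the original, it assumes a single outermost group; with two separate groups, the text between them would be removed as well.

```python
a = 'Hi this is a test ( a b ( c d) e) sentence'

start = a.find('(')    # index of the first '(' (-1 if none)
end = a.rfind(')')     # index of the last ')' (-1 if none)

if start != -1 and end > start:
    # Drop a[start:end + 1], then collapse runs of whitespace to one space.
    a1 = ' '.join((a[:start] + a[end + 1:]).split())
else:
    a1 = a

print(a1)
```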

1 Comment

No regex and no Python loops: stackoverflow.com/a/77462758/5231110

Assuming (1) the parentheses always match and (2) we only remove the parentheses and everything between them (i.e., the spaces around the parentheses are left untouched), the following should work.

It's basically a state machine that maintains the current depth of nested parentheses. We keep the character if it's (1) not a parenthesis and (2) the current depth is 0.

No regexes. No recursion. A single pass through the input string without any intermediate lists.

tests = [
    "Hi this is a test ( a b ( c d) e) sentence",
    "(Data with in (Boo) And good luck)",
]

# How each character changes the nesting depth.
delta = {
    '(': 1,
    ')': -1,
}

def remove_paren_groups(text):
    """Yield the characters of text that lie outside any (...) group."""
    depth = 0
    for c in text:
        d = delta.get(c, 0)
        depth += d
        # Skip the parentheses themselves and anything nested inside them.
        if d != 0 or depth > 0:
            continue
        yield c

for text in tests:
    print(' IN: %r' % text)
    print('OUT: %r' % ''.join(remove_paren_groups(text)))

Output:

 IN: 'Hi this is a test ( a b ( c d) e) sentence'
OUT: 'Hi this is a test  sentence'
 IN: '(Data with in (Boo) And good luck)'
OUT: ''
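The output above keeps two spaces between "test" and "sentence", while the question's expected result has one. If single spacing is what you want (an assumption about the desired spacing, as the comment under the question points out), one option is to collapse whitespace after removing the groups. This self-contained sketch reuses the depth-counting idea; the name strip_groups is hypothetical and balanced parentheses are assumed.

```python
def strip_groups(text):
    """Hypothetical helper: remove (...) groups, then normalize spaces."""
    depth = 0
    out = []
    for c in text:
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
        elif depth == 0:
            out.append(c)   # character outside every group
    # str.split() with no argument splits on any run of whitespace and
    # drops leading/trailing whitespace, so the join leaves single spaces.
    return ' '.join(''.join(out).split())


print(strip_groups("Hi this is a test ( a b ( c d) e) sentence"))
```

Note that this also collapses any other whitespace in the string, including tabs and newlines.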


Referenced from here

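import re

item = "example (.com) w3resource github (.com) stackoverflow (.com)"

# Optional: drop non-ASCII characters if they cause problems.
item = item.encode('ascii', errors='ignore').decode()

# Remove each "(...)" group together with one preceding space.
# Note: [^)]+ stops at the first ')', so nested groups are not handled.
print(re.sub(r" ?\([^)]+\)", "", item))
# example w3resource github stackoverflow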
