How can I remove text within multi layer of parentheses python [duplicate]

Question

I have a Python string from which I need to remove parenthesized text. The standard way is to use text = re.sub(r'\([^)]*\)', '', text), which removes the parentheses and the content within them.

However, I just found a string that looks like (Data with in (Boo) And good luck). With the regex I use, the And good luck) part is left behind, because the match stops at the first closing parenthesis. I know I can scan through the entire string, keep a counter of the number of ( and ), and remove the content in between once the counts balance, but is there a better/cleaner way of doing that? It doesn't need to be a regex; whatever works is great, thanks.

Someone asked for expected result so here's what I am expecting:

Hi this is a test ( a b ( c d) e) sentence

After the replacement I want it to be Hi this is a test sentence, instead of Hi this is a test e) sentence

It isn't possible to do it with the re module, but you can do it with the regex module that allows recursion. pypi.python.org/pypi/regex
– Casimir et Hippolyte, Commented Aug 18, 2016 at 19:39
In the worst case you can do it with the re module if you build a pattern to match the innermost parenthesis \([^()]*\) and if you loop the replacement until there is nothing to replace. But it isn't a very elegant way since you need to parse the string several times.
– Casimir et Hippolyte, Commented Aug 18, 2016 at 19:46
Can you please share what you expect with the example you gave to make it more clear?
– Heval, Commented Aug 18, 2016 at 19:51
I only see one space in the result between "test" and "sentence". If that's the case, are you saying we need to remove a space before "("? Or remove a space after a ")"?
– beetea, Commented Aug 19, 2016 at 6:12

Casimir et Hippolyte · Accepted Answer · 2016-08-19 16:52:40Z

6

With the re module (replace the innermost parenthesis until there's no more replacement to do):

import re

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

nb_rep = 1

# Each pass removes the innermost groups; stop when a pass replaces nothing
while nb_rep:
    s, nb_rep = re.subn(r'\([^()]*\)', '', s)

print(s)

With the regex module that allows recursion:

import regex

s = r'Sainte Anne -(Data with in (Boo) And good luck) Charenton'

print(regex.sub(r'\([^()]*+(?:(?R)[^()]*)*+\)', '', s))

Where (?R) refers to the whole pattern itself.
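
The re-based loop can also be wrapped in a small helper. The sketch below is only an illustration: the name strip_nested_parens and the collapse_spaces option are assumptions, not part of the answer. The whitespace step is there because the question's expected output has a single space where the group used to be.

```python
import re

# Hypothetical helper; the name and the collapse_spaces flag are illustrative.
def strip_nested_parens(text, collapse_spaces=True):
    innermost = re.compile(r'\([^()]*\)')  # a group with no nested parentheses
    count = 1
    # Remove innermost groups repeatedly until a pass replaces nothing
    while count:
        text, count = innermost.subn('', text)
    if collapse_spaces:
        # Collapse whitespace runs (also trims both ends of the string)
        text = ' '.join(text.split())
    return text

print(strip_nested_parens('Hi this is a test ( a b ( c d) e) sentence'))
```

With collapse_spaces=False the two spaces that surrounded the removed group are kept as they are.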

edited Aug 19, 2016 at 16:52

answered Aug 18, 2016 at 20:06

Casimir et Hippolyte


1 Comment

JLTChiu Over a year ago

The first answer is beautiful and awesome. Thank you.

Arturo Ruiz Mañas · Accepted Answer · 2016-08-18 19:55:41Z

2

First I split the line at every parenthesis character, then join the pieces back into a new line. Note that this removes only the parentheses themselves and keeps the text that was between them:

import re

line = "(Data with in (Boo) And good luck)"
new_line = "".join(re.split(r'[()]', line))
print(new_line)
# Data with in Boo And good luck

answered Aug 18, 2016 at 19:55

Arturo Ruiz Mañas


yourstruly · Accepted Answer · 2016-08-18 20:51:51Z

2

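No regex. Note that this removes everything from the first parenthesis to the last one, so it assumes the string contains a single outermost group: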

>>> a = 'Hi this is a test ( a b ( c d) e) sentence'
>>> o = ['(' == t or t == ')' for t in a]
>>> o
[False, False, False, False, False, False, False, False, False, False,
 False, False, False, False, False, False, False, False, True, False, False, 
 False, False, False, True, False, False, False, False, True, False, False,
 True, False, False, False, False, False, False, False, False, False]
>>> start, end = None, None
>>> for n, i in enumerate(o):
...  if i and start is None:
...   start = n
...  if i:
...   end = n
...
>>>
>>> start
18
>>> end
32
>>> a1 = ' '.join(''.join(i for n,i in enumerate(a) if (n<start or n>end)).split())
>>> a1
'Hi this is a test sentence'
>>>
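
The same first-to-last removal can be sketched more compactly with str.find and str.rfind. The function name remove_outer_span is hypothetical, and the sketch assumes, as above, that the string holds at most one outermost group.

```python
# Hypothetical helper; assumes at most one outermost parenthesized group.
def remove_outer_span(text):
    start = text.find('(')   # index of the first '(' or -1
    end = text.rfind(')')    # index of the last ')' or -1
    if start == -1 or end < start:
        return text          # nothing to remove
    # Drop the span, then collapse the leftover whitespace
    return ' '.join((text[:start] + text[end + 1:]).split())

print(remove_outer_span('Hi this is a test ( a b ( c d) e) sentence'))
```

With two separate groups, such as 'a (b) c (d) e', this would also drop the text between them, so it is not a general solution.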

answered Aug 18, 2016 at 20:51

yourstruly


1 Comment

root Over a year ago

No regex and no Python loops: stackoverflow.com/a/77462758/5231110

beetea · Accepted Answer · 2016-08-19 07:05:20Z

Assuming (1) the parentheses are always balanced and (2) we only remove the parentheses and everything in between them (i.e., spaces surrounding the parentheses are left untouched), the following should work.

It's basically a state machine that maintains the current depth of nested parentheses. We keep a character only if (1) it is not a parenthesis and (2) the current depth is 0.

No regexes. No recursion. A single pass through the input string without any intermediate lists.

tests = [
    "Hi this is a test ( a b ( c d) e) sentence",
    "(Data with in (Boo) And good luck)",
]

delta = {
    '(': 1,
    ')': -1,
}

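def remove_paren_groups(text):
    depth = 0

    for c in text:
        # d is +1 for '(', -1 for ')', and 0 for any other character
        d = delta.get(c, 0)
        depth += d
        # Skip the parentheses themselves and anything nested inside them
        if d != 0 or depth > 0:
            continue
        yield c

for text in tests:
    print(' IN: %s' % repr(text))
    print('OUT: %s' % repr(''.join(remove_paren_groups(text))))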

Output:

 IN: 'Hi this is a test ( a b ( c d) e) sentence'
OUT: 'Hi this is a test  sentence'
 IN: '(Data with in (Boo) And good luck)'
OUT: ''
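
If assumption (1) does not hold, the depth counter can go negative after a stray closing parenthesis. The variant below is a sketch of one way to tolerate that; the name remove_paren_groups_tolerant and the choice to keep a stray ")" in the output are assumptions, not part of the answer above.

```python
# Hypothetical variant; keeping stray ')' characters is an arbitrary choice.
def remove_paren_groups_tolerant(text):
    depth = 0
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth > 0:
                depth -= 1        # closes an open group
            else:
                kept.append(ch)   # stray ')': keep it, depth stays at 0
        elif depth == 0:
            kept.append(ch)       # ordinary character outside any group
    return ''.join(kept)

print(repr(remove_paren_groups_tolerant('a ) b (c (d) e) f')))
```

An unclosed "(" still causes everything after it to be dropped, since the depth never returns to 0.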

Mark K · Accepted Answer · 2019-10-11 22:33:36Z

0

Referenced from here

import re

item = "example (.com) w3resource github (.com) stackoverflow (.com)"

# Optional: drop non-ASCII characters if they cause problems
item = item.encode('ascii', errors='ignore').decode()

# Removes each non-nested group together with one preceding space
print(re.sub(r" ?\([^)]+\)", "", item))

answered Oct 11, 2019 at 22:33

Mark K
