
I have a text file like this:

{[a] abc (b(c)d)}

I want to remove the content inside the square brackets [] and the (nested) parentheses (()), so the output should be:

 abc

I managed to remove the content inside the parentheses, but I could not remove the content inside the []. I have tried the code below:

import re

with open('data.txt') as f:
    text = f.read()
    line = text.replace("{", "")            # drop the curly braces
    line = line.replace("}", "")
    output = re.sub(r'\(.*\)', "", line)    # remove everything from the first ( to the last )
    print(output)

The output is:

[a] abc

In my code, I first remove the {} and then remove the content inside (). I want to add \[.*\] to the line output = re.sub(r'\(.*\)', "", line), but I could not find a way to do this. I am still learning Python, so I am stuck on this. Please help.

  • You did \(.*\). You could do \[.*\] too. Commented Apr 19, 2018 at 8:15
  • Just a remark not directly related to this: Python regexes are not really good at processing balanced bracketed expressions... Commented Apr 19, 2018 at 8:17
  • @khelwood I have edited my question. Commented Apr 19, 2018 at 8:21
  • @Jan: I know about it, but AFAIK the standard library only contains the old re module, and the OP has an import re line... Commented Apr 19, 2018 at 8:26
  • All your replacements could be shortened to re.sub(r'[{}]|\(.*\)|\[.*\]', "", line) Commented Apr 19, 2018 at 8:48
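A quick sketch of the one-liner from the last comment, applied to the sample line from the question (the variable name sample is only for illustration). Because .* is greedy, each alternative spans from an opening bracket to the last matching closing one on the line, which happens to give the desired result for this input:

```python
import re

sample = "{[a] abc (b(c)d)}"

# One pass: remove { and }, plus greedy (...) and [...] spans.
cleaned = re.sub(r'[{}]|\(.*\)|\[.*\]', "", sample)

print(repr(cleaned))    # ' abc '
print(cleaned.strip())  # abc
```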

3 Answers


In my opinion this is not as easy as it might first look: you would very likely need a balanced (recursive) approach, which can be achieved with the newer regex module (a third-party package, installed separately from the standard library):

import regex as re

string = "some lorem ipsum {[a] abc (b(c)d)} some other lorem ipsum {defg}"

rx_part = re.compile(r'{(.*?)}')
rx_nested_parentheses = re.compile(r'\((?:[^()]*|(?R))*\)')
rx_nested_brackets = re.compile(r'\[(?:[^\[\]]*|(?R))*\]')

for match in rx_part.finditer(string):
    part = rx_nested_brackets.sub('', 
        rx_nested_parentheses.sub('', 
            match.group(1))).strip()
    print(part)

This yields:

abc
defg


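The parentheses pattern, annotated (the bracket pattern works the same way with \[ and \]):

\(         # opening parenthesis
(?:        # non-capturing group
    [^()]* # anything except ( or )
    |      # or
    (?R)   # recurse into the whole pattern (a nested group)
)*         # zero or more times
\)         # closing parenthesis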
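If installing the regex module is not an option, a small character scanner with a depth counter can do a similar balanced removal using only the standard library. This is a different technique from the recursive pattern above, and strip_balanced is a hypothetical helper name. It assumes the brackets are balanced and not interleaved like "([)]", because one counter is shared by all bracket types:

```python
def strip_balanced(text, pairs="()[]"):
    """Remove balanced bracketed groups (including nested ones) from text.

    Hypothetical helper: it tracks a single depth counter, so it assumes the
    brackets are balanced and not interleaved like "([)]".
    """
    openers = pairs[0::2]  # "(["
    closers = pairs[1::2]  # ")]"
    depth = 0
    kept = []
    for ch in text:
        if ch in openers:
            depth += 1                 # entering a group
        elif ch in closers:
            depth = max(depth - 1, 0)  # leaving a group; a stray closer is dropped
        elif depth == 0:
            kept.append(ch)            # keep only text outside every group
    return "".join(kept)


line = "{[a] abc (b(c)d)}"
inner = line.strip("{}")               # drop the outer braces, keep their content
result = strip_balanced(inner).strip()
print(result)  # abc
```

Like the regex version, this removes the {} first and keeps their content, since the braces wrap the whole line.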

3 Comments

Your answer is indeed correct and informative. I simply wonder whether it is required here: the OP did not say what should be done with unbalanced expressions. +1 anyway for the recursive regex example...
@SergeBallesta: Thanks, let's wait and see what OP really wants.
@Jan Thank you :D

You may check whether the string contains {, }, (<no_parentheses_here>) or [<no_brackets_here>] substrings, and keep removing them while there is a match.

import re                                    # Use standard re
s='{[a] abc (b(c)d)}'
rx = re.compile(r'\([^()]*\)|\[[^][]*]|[{}]')
while rx.search(s):                          # While regex matches the string
    s = rx.sub('', s)                        # Remove the matches
print(s.strip())                             # Strip whitespace and show the result
# => abc

See the Python demo

It also works with balanced, nested (...) and [...] groups.

Pattern details

  • \([^()]*\) - (, then any 0+ chars other than ( and ), and then )
  • | - or
  • \[[^][]*] - [, then any 0+ chars other than [ and ], and then ]
  • | - or
  • [{}] - a character class matching { or }.
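To see why the while loop is needed, here is a small sketch that records the string after each pass. Each pass can only remove groups that contain no inner brackets, so nested groups are peeled from the inside out. The steps list is only for illustration:

```python
import re

rx = re.compile(r'\([^()]*\)|\[[^][]*]|[{}]')

s = '{[a] abc (b(c)d)}'
steps = [s]
while rx.search(s):    # keep going while something removable is left
    s = rx.sub('', s)  # one pass removes all innermost groups and the braces
    steps.append(s)

for step in steps:
    print(repr(step))
# '{[a] abc (b(c)d)}'
# ' abc (bd)'
# ' abc '
```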

4 Comments

Thanks for the explanation :) @Wiktor Stribiżew
@jahan You may also reduce whitespace if you prepend the pattern with \s*: re.compile(r'\s*(?:\([^()]*\)|\[[^][]*]|[{}])')
I have another question. In my text file the first line is [abcdef] and the second line is {[a] abc (b(c)d)}. When I use this regex, it removes the content of the first line but leaves an empty line in its place. The output looks like this: line 1 is empty and line 2 is abc. I used strip() and lstrip() but could not remove the blank line that is left where the first line was. How can I solve this problem? @WiktorStribiżew
@jahan Can you please create a code demo? I am not quite sure I understand what you mean. If you read the contents from a file into a string, that should not be a problem. If strip does not work, the leftover may not be whitespace at all, but LTR or RTL marks or other unusual Unicode characters. Also, try re.sub(r'^\W+|\W+$', '', s)
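Regarding the follow-up about the blank first line: one possible approach, assuming the file is processed line by line, is to clean each line with the same loop, strip it, and skip lines that end up empty. Here io.StringIO stands in for the real file, and clean_line is a hypothetical helper name:

```python
import io
import re

rx = re.compile(r'\([^()]*\)|\[[^][]*]|[{}]')


def clean_line(line):
    """Repeatedly remove innermost (...)/[...] groups and braces, then trim."""
    while rx.search(line):
        line = rx.sub('', line)
    return line.strip()


# Stand-in for open('data.txt'); the contents mirror the comment above.
f = io.StringIO("[abcdef]\n{[a] abc (b(c)d)}\n")

results = []
for raw in f:
    cleaned = clean_line(raw)
    if cleaned:  # skip lines that became empty, e.g. "[abcdef]"
        results.append(cleaned)

print("\n".join(results))  # abc
```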

I tried this and got your desired output. I hope I understood you correctly:

import re

with open('aa.txt') as f:
    input = f.read()
    line = input.replace("{","")
    line = line.replace("}","")
    output = re.sub(r'\[.*\]', "", line)
    output = re.sub(r'\(.*\)', "", output)
    print(output)
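One caveat with \[.*\] and \(.*\): .* is greedy, so on a line with two separate groups the match runs from the first opening bracket to the last closing one and swallows the text in between. A negated character class, as in the previous answer, stops at the first closing bracket instead. A quick comparison on a made-up line (sample is only for illustration):

```python
import re

sample = "[a] abc [b] def"

greedy = re.sub(r'\[.*\]', "", sample)      # spans from the first '[' to the last ']'
negated = re.sub(r'\[[^][]*]', "", sample)  # each match stops at the first ']'

print(repr(greedy))   # ' def'
print(repr(negated))  # ' abc  def'
```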
