How to remove text within parentheses from Python string?

Question

I am trying to remove parentheses and the text that resides in these parentheses, as well as hyphen characters. Some string examples look like the following:
example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases' ##there are two hyphens

I would like the results to be:

example = 'Year 1.2 Q4.1'  
example2 = 'Year 2-7 Q4.8'

How can I remove text residing within or following parentheses and special characters? The only method I could find was str.strip(). I am new to Python, so any feedback is greatly appreciated!

There are many ways. You should have a look at doing it with regex. I tagged it with regex and soon the regex sharks will be here.
– Anton vBR, Commented Dec 27, 2017 at 19:33
Possible duplicate of Python: Split string by list of separators
– splash58, Commented Dec 27, 2017 at 19:34

Moinuddin Quadri · Accepted Answer · 2018-01-01 14:21:58Z

6

You can use the regex below to get the desired result:

r"\s*\(.*\)|\s-\s.*"
# Pattern 1, \s*\(.*\): optional whitespace, then everything from the first '(' to the last ')'
# Pattern 2, \s-\s.*: a hyphen surrounded by whitespace, plus everything that follows it

Sample run:

>>> import re
>>> my_regex = r"\s*\(.*\)|\s-\s.*"

>>> example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
>>> example2 = 'Year 2-7 Q4.8 - Data markets and phases'

>>> re.sub(my_regex, "", example)
'Year 1.2 Q4.1'
>>> re.sub(my_regex, "", example2)
'Year 2-7 Q4.8'
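
One detail worth seeing in isolation is why the parenthesis part of the pattern uses a greedy .* rather than .*?. The sketch below is only an illustration on the question's first string; the variable names greedy and lazy are made up for it. Note that a bare \(.*\) does not consume the space in front of the parenthesis, so that space stays in the result unless the pattern also matches it or the result is stripped.

```python
import re

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

# Greedy: .* extends to the LAST ')', so the nested "(#222)" goes with it.
greedy = re.sub(r"\(.*\)", "", example)

# Non-greedy: .*? stops at the FIRST ')', so the outer ')' is left behind.
lazy = re.sub(r"\(.*?\)", "", example)

# repr() makes the leftover whitespace and ')' visible.
print(repr(greedy))
print(repr(lazy))
```

With nested parentheses like these, the non-greedy form leaves a stray ')' in the output, which is why the greedy form fits this input better.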

Here I am using re.sub(pattern, repl, string, ...), which the documentation describes as follows:

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
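
As a small sketch of two points from that description (the input is returned unchanged when nothing matches, and repl may be a function), consider the following. The pattern is written in the spirit of the one in this answer, and the helper name mark_removed is hypothetical.

```python
import re

# A pattern in the spirit of the answer's regex, compiled once for reuse.
pattern = re.compile(r"\(.*\)|\s-\s.*")

# 1. No match: the input string comes back unchanged.
unchanged = pattern.sub("", "Year 3.0 Q1.1")

# 2. repl as a function: it is called with each match object and
#    returns the text that should replace that match.
def mark_removed(match):
    return "<removed %d chars>" % len(match.group())

marked = pattern.sub(mark_removed, "Year 2-7 Q4.8 - Data markets and phases")

print(unchanged)
print(marked)
```

A function replacement like this can be handy for checking what a pattern actually removes before switching to an empty string.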

edited Jan 1, 2018 at 14:21

answered Dec 27, 2017 at 19:41

Moinuddin Quadri


theSekyi · Answer · 2017-12-27 20:09:32Z

1

We can do this without regex by splitting the string and using a starred target (*_) as a throwaway variable for the parts we do not need.

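example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
display, *_ = example.split('(')  # keep only the text before the first '('
print(display.strip())            # strip() drops the trailing space

example2 = 'Year 2-7 Q4.8 - Data markets and phases'  # there are two hyphens
part_1, part_2, *_ = example2.split('-')  # keep the first two hyphen-separated parts
display = (part_1 + '-' + part_2).strip()
print(display)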
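
The two cases above can also be folded into one helper. This is a minimal sketch, assuming the unwanted text always starts at the first '(' or at the first hyphen surrounded by spaces; the function name strip_suffix is hypothetical, and it uses str.split with maxsplit rather than starred unpacking.

```python
def strip_suffix(text):
    """Drop everything from the first '(' or the first ' - ' onward."""
    head = text.split('(', 1)[0]    # text before the first '(' (or all of it)
    head = head.split(' - ', 1)[0]  # text before the first spaced hyphen
    return head.strip()             # remove leftover surrounding whitespace


example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases'

print(strip_suffix(example))
print(strip_suffix(example2))
```

Splitting on ' - ' instead of '-' leaves the hyphen inside "2-7" alone, so the parts do not need to be glued back together.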

answered Dec 27, 2017 at 20:09

theSekyi


Aaditya Ura · Answer · 2017-12-28 14:21:18Z

1

You can try something like this. You may need a little data cleaning after you fetch the result to get exactly your desired output:

import re

data = []
pattern = r'\(.+\)|\s-.+'
with open('file.txt', 'r') as f:
    for line in f:
        match = re.search(pattern, line)
        if match:  # lines without a match are kept as they are
            line = line.replace(match.group(), '')
        data.append(line.strip())

print(data)
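
To try the same idea without creating file.txt, here is a hedged sketch that feeds the lines from an in-memory io.StringIO object instead. The helper name clean_lines and the third sample line are assumptions added for illustration, and re.sub replaces the search-then-replace step.

```python
import io
import re

# Same pattern as in the answer, compiled once for reuse.
pattern = re.compile(r'\(.+\)|\s-.+')


def clean_lines(lines):
    """Remove the matched text from each line and trim whitespace."""
    # re.sub returns the line unchanged when the pattern does not match.
    return [pattern.sub('', line).strip() for line in lines]


# io.StringIO stands in for the open file; iterating it yields lines.
fake_file = io.StringIO(
    "Year 1.2 Q4.1 (Section 1.5 Report (#222))\n"
    "Year 2-7 Q4.8 - Data markets and phases\n"
    "Year 3.0 Q1.1\n"
)

cleaned = clean_lines(fake_file)
print(cleaned)
```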

answered Dec 28, 2017 at 14:21

Aaditya Ura


Anton vBR · Answer · 2017-12-27 19:45:29Z

0

Here is an example without regex (just to show you how good regex can be):

The code yields words one by one and stops after the first word that starts with Q:

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

def clean_string(s):
    for item in s.split():
        yield item
        if item.startswith('Q'):
            break

print(' '.join(clean_string(example)))
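
As a quick check of how far this approach carries, the sketch below runs the same generator on the question's second string and on a made-up string with no word starting with Q (the no_quarter input is an assumption, not from the question).

```python
def clean_string(s):
    """Yield words up to and including the first one starting with 'Q'."""
    for item in s.split():
        yield item
        if item.startswith('Q'):
            break


example2 = 'Year 2-7 Q4.8 - Data markets and phases'
no_quarter = 'Year 5 summary (draft)'  # hypothetical input without a Q word

result2 = ' '.join(clean_string(example2))
result3 = ' '.join(clean_string(no_quarter))

print(result2)
print(result3)
```

When no word starts with Q, the loop never breaks and the whole string comes back, so this only works for inputs that follow the "Year ... Q..." layout.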

answered Dec 27, 2017 at 19:45

Anton vBR
