0

I am trying to remove parentheses and the text inside them, as well as a spaced hyphen and everything that follows it. Some example strings look like the following:
example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
example2 = 'Year 2-7 Q4.8 - Data markets and phases' ##there are two hyphens

I would like the results to be:

example = 'Year 1.2 Q4.1'  
example2 = 'Year 2-7 Q4.8'  

How can I remove text residing within or following parentheses and special characters? I could only find the str.strip() method. I am new to Python, so any feedback is greatly appreciated!

3
  • 2
    There are many ways. You should have a look at doing it with regex. I tagged it with regex and soon the regex sharks will be here. Commented Dec 27, 2017 at 19:33
  • 1
    Possible duplicate of Python: Split string by list of separators Commented Dec 27, 2017 at 19:34
  • 1
    @AntonvBR lol. The regex sharks are circling the waters Commented Dec 27, 2017 at 19:39

4 Answers

6

You can use the regex below to get the desired result:

r"\(.*\)|\s-\s.*"
# ^      ^  Pattern 2: whitespace, hyphen, whitespace, then everything up to the end
# ^   Pattern 1: everything from the first '(' to the last ')'

Sample run:

>>> import re
>>> my_regex = r"\(.*\)|\s-\s.*"

>>> example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
>>> example2 = 'Year 2-7 Q4.8 - Data markets and phases'

>>> re.sub(my_regex, "", example).strip()  # strip() drops the space left before '('
'Year 1.2 Q4.1'
>>> re.sub(my_regex, "", example2)
'Year 2-7 Q4.8'

Here I am using re.sub(pattern, repl, string, ...) which, as the documentation says:

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed.
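As a minimal sketch of how the substitution above could be wrapped for reuse: the helper name strip_notes is hypothetical, and the pattern is a slightly extended variant of the one in this answer (it also consumes the whitespace before the opening parenthesis), not the answer's exact regex.

```python
import re

# Hypothetical helper; the pattern is an assumed variant of the answer's regex.
#   \s*\(.*\)  -> optional whitespace, then the first "(" through the last ")"
#   \s+-\s+.*  -> a hyphen surrounded by whitespace, through the end of the string
_NOTE_PATTERN = re.compile(r"\s*\(.*\)|\s+-\s+.*")


def strip_notes(text):
    """Remove a parenthesized part or a ' - ...' suffix from text."""
    return _NOTE_PATTERN.sub("", text).strip()


print(strip_notes('Year 1.2 Q4.1 (Section 1.5 Report (#222))'))  # Year 1.2 Q4.1
print(strip_notes('Year 2-7 Q4.8 - Data markets and phases'))    # Year 2-7 Q4.8
```

Note that .* is greedy, so everything between the first '(' and the last ')' is removed, including any text sitting between two separate parenthesized groups. The hyphen in 'Year 2-7' survives because it has no whitespace around it.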


1

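We can do this using str.split() with starred unpacking (*) into a throwaway variable.

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'
# Keep the text before the first '('; the remaining pieces go into the throwaway list _
display, *_ = example.split('(')
print(display.strip())  # strip() removes the space left before '('

example2 = 'Year 2-7 Q4.8 - Data markets and phases' ##there are two hyphens
# Keep the pieces around the first hyphen; discard everything after the second one
part_1, part_2, *_ = example2.split('-')
display = part_1 + '-' + part_2
print(display.strip())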
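The split-based approach depends on how many hyphens each string contains. As a hedged alternative sketch, str.partition can cut the string at the first occurrence of each marker without counting separators. The function name cut_at_markers and its default markers are assumptions made for illustration.

```python
def cut_at_markers(text, markers=("(", " - ")):
    """Keep only the text before the first occurrence of any marker."""
    for marker in markers:
        # partition() returns (before, marker, after); if the marker is
        # absent, "before" is the whole string, so nothing is lost.
        text = text.partition(marker)[0]
    return text.strip()


print(cut_at_markers('Year 1.2 Q4.1 (Section 1.5 Report (#222))'))  # Year 1.2 Q4.1
print(cut_at_markers('Year 2-7 Q4.8 - Data markets and phases'))    # Year 2-7 Q4.8
```

Using ' - ' (with spaces) as the marker leaves the hyphen in 'Year 2-7' untouched.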


1

You can try something like this. It reads the strings from a file, removes the first match from each line, and strips the leftover whitespace:

import re

data = []
pattern = r'\(.+\)|\s\-.+'
with open('file.txt', 'r') as f:
    for line in f:
        match = re.search(pattern, line)
        # re.search returns None for lines with no match; keep those unchanged
        if match:
            line = line.replace(match.group(), '')
        data.append(line.strip())

print(data)


0

Here is an example without regex (just to show you how good regex can be):

The code yields words one at a time and stops after the first word that starts with Q:

example = 'Year 1.2 Q4.1 (Section 1.5 Report (#222))'

def clean_string(s):
    for item in s.split():
        yield item
        if item.startswith('Q'):
            break

print(' '.join(clean_string(example)))
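Another regex-free sketch, for strings where the parenthesized text may be nested or may appear mid-string: track the nesting depth character by character and keep only what lies outside all parentheses. The function name remove_parenthesized is hypothetical, and collapsing repeated whitespace at the end is a design choice assumed here.

```python
def remove_parenthesized(text):
    """Drop every (possibly nested) parenthesized span from text."""
    depth = 0   # how many "(" are currently open
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
        elif depth == 0:
            # Outside all parentheses (an unmatched ")" also lands here)
            kept.append(ch)
    # Collapse the whitespace left behind where spans were removed
    return ' '.join(''.join(kept).split())


print(remove_parenthesized('Year 1.2 Q4.1 (Section 1.5 Report (#222))'))  # Year 1.2 Q4.1
print(remove_parenthesized('a (b (c) d) e'))                              # a e
```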
