0

After iterating over a list with a for loop, in order to extract only a few values, I get this:

['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']
['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']

What I need to do is extract the information between the parentheses in each line and put it into another list, but I'm struggling to find the right code.

I have tried the method described in "How do I find the string between two special characters?", but I get an error because the string is inside a list.

I have also looked at the documentation for the re module, but I'm not sure how to apply it in this case.

3
  • 1
    Use the pattern \((.+)\) to find everything between parentheses in a given string. To actually apply it, use re.findall, or compile the pattern and call pattern.findall. Commented Oct 15, 2017 at 14:00
  • Great, this worked. Thanks! Commented Oct 15, 2017 at 14:07
  • 2
    Rather, use \((.+?)\) to get the shortest match. Commented Oct 15, 2017 at 14:12
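
To illustrate the difference between the two patterns suggested in the comments, here is a minimal sketch. The input string is a made-up example with two parenthesised groups (the strings in the question only have one, so both patterns give the same result there).

```python
import re

# Hypothetical string with two parenthesised groups, to show the difference.
s = 'Dell (3.73 GHz) Precision (Pentium 965)'

greedy = re.findall(r'\((.+)\)', s)   # spans from the first '(' to the last ')'
lazy = re.findall(r'\((.+?)\)', s)    # stops at the nearest ')'

print(greedy)  # ['3.73 GHz) Precision (Pentium 965']
print(lazy)    # ['3.73 GHz', 'Pentium 965']
```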

4 Answers

4

Considering that this is a standard structure, you can avoid the regex part entirely and simply do something like this:

Let us assume you have already extracted the string you want to work on:

s = 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'

You can split on (, take the part after it, and then use slicing to remove the trailing ):

>>> s.split('(')[1][:-1]
'3.73 GHz, Pentium Exteme Edition 965'

The above depends on the string always containing a (, and raises an IndexError when it does not. To avoid that, you can use partition, which gives an empty string in that case:

s.partition('(')[2][:-1]

As provided in the comments by @JonClements.
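
As a small sketch of the difference, here is what each approach does when the parentheses are missing. The input string is a hypothetical one, not taken from the question.

```python
s = 'Dell Precision 380'  # hypothetical input with no parentheses

# split() returns a single-element list here, so indexing [1] fails.
try:
    s.split('(')[1][:-1]
    split_failed = False
except IndexError:
    split_failed = True

# partition() always returns a 3-tuple; the last item is '' when '(' is absent.
from_partition = s.partition('(')[2][:-1]

print(split_failed)          # True
print(repr(from_partition))  # ''
```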

11 Comments

Thank you, that's also a nice one. I have actually used Regex and it does the whole job in one line, like this for x in y: r = re.findall(r'\((.+)\)', str(y[0])), so I don't need to extract the string first :)
@EdvardHaugland If there is a pure Python way, I would suggest using it instead of regex, as it can be considerably faster, though it may not look as clean.
@EdvardHaugland Do keep in mind however, that the extraction will be O(1) because you know exactly where it is.
What about when there aren't any ()s?
idjaw, for a larger data set, with 83 values, these are the results: With Regex: 11283 ms, with yours and @JonClements method: 8086 ms. I benchmarked them using PyCharm's built-in profiling utility. Thanks for the help :)
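
For anyone who wants to repeat a comparison like this, here is a rough sketch using the standard timeit module instead of a profiler. The function names and the repetition count are arbitrary choices for illustration, and the timings will vary by machine, so no figures are shown.

```python
import re
import timeit

s = 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'
pattern = re.compile(r'\((.+?)\)')

def with_regex():
    # Regex approach: first parenthesised group found in the string.
    return pattern.findall(s)[0]

def with_partition():
    # Pure string approach: everything after '(' minus the trailing ')'.
    return s.partition('(')[2][:-1]

# Both approaches should agree before their timings are compared.
assert with_regex() == with_partition()

print('regex:    ', timeit.timeit(with_regex, number=100000))
print('partition:', timeit.timeit(with_partition, number=100000))
```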
0
a = ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']
b = a[0] # Get 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'
c = b[b.find('(') + 1: b.find(')')] # Get '3.73 GHz, Pentium Exteme Edition 965'
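
The same idea can be wrapped in a helper and applied to every row. This is only a sketch: the names between_parens, rows and specs are made up here, and returning None for a missing parenthesis is one possible choice.

```python
def between_parens(text):
    """Return the text between the first '(' and the next ')', or None."""
    start = text.find('(')
    end = text.find(')', start + 1)
    if start == -1 or end == -1:
        return None  # no complete pair of parentheses found
    return text[start + 1:end]

rows = [
    ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
    ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5'],
]
specs = [between_parens(row[0]) for row in rows]
print(specs)
# ['3.73 GHz, Pentium Exteme Edition 965', '3.8 GHz, Pentium 4 processor 670']
```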

0

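A "more powerful" way to achieve this is to use a regex, like this:

import re
regex = re.compile(r"\((.*)\)")  # raw string; the group captures the text between the parentheses
# origin_list is the list of strings to search; strings without parentheses are skipped
details = [regex.findall(text)[0] for text in origin_list if regex.search(text)]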

0

You can use r'\((.*)\)' to get the data inside the parentheses. This is simple.

import re
data=[['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']]
# re.search is needed here: re.match only matches at the start of the string
result=[re.search(r'\((.*)\)',x[0]).group(1) for x in data]
print(result)
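
One detail worth checking with a pattern that starts with \( is which function it is passed to. As a minimal sketch on one of the strings above: re.match only tries to match at the very start of the string, while re.search scans the whole string.

```python
import re

s = 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'

m = re.match(r'\((.*)\)', s)    # None: the string does not start with '('
f = re.search(r'\((.*)\)', s)   # finds the parenthesised part anywhere

print(m)           # None
print(f.group(1))  # 3.73 GHz, Pentium Exteme Edition 965
```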

But a bare wildcard may sometimes give you junk results, so it is usually better to apply more restrictions to get a tighter match. For example, if you use \w.*\((\d+.\d+\s\w.*,.*\d+)\) as your pattern, only strings that follow this layout will match. In that case the same code becomes:

import re
data=[['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']]
result=[re.match(r'\w.*\((\d+.\d+\s\w.*,.*\d+)\)',x[0]).group(1) for x in data]
print(result)
