Extract text between characters in a list in Python

Question

After iterating over a list with a for loop, in order to extract only a few values, I get this:

['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']
['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']

What I need to do is extract the information between the parentheses in each line and put it into another list, but I'm struggling to find the right code.

I have tried the method described here: How do I find the string between two special characters?, but I get an error because the string is in a list.

I have also looked at the documentation for the re module, but I'm not sure how to apply it in this case.
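The error described above usually comes from passing the whole row (a list) to a function that expects a string. A minimal sketch, assuming each row looks like the two shown in the question, with the product name as its first element:

```python
import re

row = ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']

# Passing the list itself fails: re functions expect a str (or bytes).
try:
    re.search(r'\((.+)\)', row)
except TypeError as exc:
    print('TypeError:', exc)

# Index into the row first, so the pattern is applied to the string.
match = re.search(r'\((.+)\)', row[0])
print(match.group(1))
```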

Use the pattern \((.+)\) to find everything between parentheses in a given string. To actually use it, call re.findall, or compile the pattern and call pattern.findall.
– uspectaculum, Commented Oct 15, 2017 at 14:00
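Following that suggestion, here is a sketch that compiles the pattern once and collects the parenthesised text from every row into a new list. The name `rows` is illustrative and stands for the list of rows printed in the question:

```python
import re

# Illustrative data: the rows printed in the question.
rows = [
    ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
    ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5'],
]

pattern = re.compile(r'\((.+)\)')

details = []
for row in rows:
    # findall returns a list of captured groups (empty if nothing matches),
    # so extend() adds nothing for rows without parentheses.
    details.extend(pattern.findall(row[0]))

print(details)
```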

idjaw · Accepted Answer · 2017-10-15 14:17:32Z

4

Considering that this is a standard structure, you can avoid regex entirely and simply do something like this:

Let us assume you have already extracted the string you want to work on:

s = 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'

You can split on (, take the piece after the first occurrence, and then use slicing to drop the closing ):

>>> s.split('(')[1][:-1]
'3.73 GHz, Pentium Exteme Edition 965'
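Applied to every row rather than to a single string, the same split-and-slice idea fits in a list comprehension. A sketch, assuming (as the answer does) that every name ends with the closing parenthesis; `rows` is an illustrative name for the question's data:

```python
# Illustrative data: the rows printed in the question.
rows = [
    ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
    ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5'],
]

# Take the first element of each row, keep the text after the first '(',
# and drop the final character, which is assumed to be ')'.
details = [row[0].split('(')[1][:-1] for row in rows]
print(details)
```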

The above depends on the text always being wrapped in parentheses: if the string contains no (, split('(')[1] raises an IndexError. To avoid that, you can do:

s.partition('(')[2][:-1]

As provided in the comments by @JonClements.
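To see the difference, here is a sketch comparing the two calls on a hypothetical name that has no parentheses at all:

```python
s = 'Dell Precision 380'  # hypothetical input without parentheses

# str.split returns a one-element list here, so index [1] raises.
try:
    s.split('(')[1][:-1]
except IndexError:
    print('split: IndexError')

# str.partition always returns a 3-tuple; when the separator is missing,
# the last two items are empty strings, so the result is just ''.
print(repr(s.partition('(')[2][:-1]))
```

Note that `[:-1]` still assumes the string ends with `)`; if anything followed the closing parenthesis, the slice would return the wrong text rather than raise.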

edited Oct 15, 2017 at 14:17

answered Oct 15, 2017 at 14:07

idjaw



Comments

Edvard Haugland Over a year ago

Thank you, that's also a nice one. I have actually used Regex and it does the whole job in one line, like this for x in y: r = re.findall(r'\((.+)\)', str(y[0])), so I don't need to extract the string first :)

uspectaculum Over a year ago

@Edvard Haugland If there is a pure Python way, I would suggest using it instead of a regex, as it is often much faster, although it may not look as clean.

idjaw Over a year ago

@EdvardHaugland Do keep in mind, however, that the extraction is cheap because you know exactly where the text is.

Jon Clements Over a year ago

What about strings that don't contain any parentheses?

Edvard Haugland Over a year ago

idjaw, for a larger data set with 83 values, these are the results: with regex, 11283 ms; with yours and @JonClements's method, 8086 ms. I benchmarked them using PyCharm's built-in profiling utility. Thanks for the help :)
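A rough way to run a comparison like this outside PyCharm is the standard library's `timeit` module. The sketch below times both approaches on the two sample names only; absolute numbers depend on the machine and Python version, so no figures are given here:

```python
import re
import timeit

names = [
    'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)',
    'Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)',
]
pattern = re.compile(r'\((.+)\)')


def with_regex():
    return [pattern.findall(name)[0] for name in names]


def with_partition():
    return [name.partition('(')[2][:-1] for name in names]


# Both approaches should agree before their speed is compared.
assert with_regex() == with_partition()

for func in (with_regex, with_partition):
    seconds = timeit.timeit(func, number=10_000)
    print(f'{func.__name__}: {seconds:.3f} s for 10,000 runs')
```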


Daniel Trugman · Accepted Answer · 2017-10-15 14:18:38Z

0

a = ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6']
b = a[0] # Get 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'
c = b[b.find('(') + 1: b.find(')')] # Get '3.73 GHz, Pentium Exteme Edition 965'
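`str.find` returns `-1` when the character is missing, so on a name without parentheses the slice above silently returns the wrong text (`b[0:-1]`, the name minus its last character) instead of failing. A sketch of a guarded helper; `between_parens` is a hypothetical name:

```python
def between_parens(text):
    """Return the text between the first '(' and the next ')', or None."""
    start = text.find('(')
    if start == -1:
        return None
    end = text.find(')', start + 1)
    if end == -1:
        return None
    return text[start + 1:end]


rows = [
    ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
    ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5'],
]
print([between_parens(row[0]) for row in rows])
print(between_parens('Dell Precision 380'))  # None
```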

answered Oct 15, 2017 at 14:18

Daniel Trugman



Troy Liu · Accepted Answer · 2017-10-15 14:18:43Z

0

A more powerful way to achieve this is to use a regex, like this:

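import re
origin_list = ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)',
               'Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)']
# raw string so the backslashes reach the regex engine unchanged
regex = re.compile(r"\((.*)\)")
details = [regex.findall(text)[0] for text in origin_list if regex.search(text)]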
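One caveat with `.*` is that it is greedy: if a name ever contained two parenthesised parts, the capture would run from the first `(` to the last `)`. A sketch with a hypothetical string, comparing the greedy pattern with a non-greedy `.*?` and a `[^)]*` character class:

```python
import re

text = 'Dell Precision 380 (3.73 GHz) (refurbished)'  # hypothetical input

greedy = re.findall(r'\((.*)\)', text)         # one match spanning both groups
non_greedy = re.findall(r'\((.*?)\)', text)    # stops at the first ')'
char_class = re.findall(r'\(([^)]*)\)', text)  # any run of non-')' characters

print(greedy)
print(non_greedy)
print(char_class)
```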

answered Oct 15, 2017 at 14:18

Troy Liu



Mani · Accepted Answer · 2017-10-15 14:36:24Z

You can use r'\((.*)\)' to get the data inside the parentheses. This is simple.

import re
data = [['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
        ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']]
# re.search scans the whole string; re.match would only try position 0 and return None
result = [re.search(r'\((.*)\)', x[0]).group(1) for x in data]
print(result)
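The pattern has to be applied with `re.search` rather than `re.match`: `re.match` only tries to match at the very beginning of the string, and these names start with `Dell`, not `(`. A short sketch of the difference:

```python
import re

name = 'Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)'

# re.match anchors at position 0, so a pattern starting with '(' finds nothing.
anchored = re.match(r'\((.*)\)', name)
print(anchored)  # None

# re.search scans the whole string for the first place the pattern matches.
found = re.search(r'\((.*)\)', name)
print(found.group(1))
```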

But a bare wildcard can sometimes yield junk results, so it is better to add restrictions that describe the expected text more precisely. For example, with \w.*\((\d+\.\d+\s\w.*,.*\d+)\) as your match pattern, only text shaped like "3.73 GHz, ... 965" is captured. In that case the same code becomes:

import re
data = [['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
        ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5']]
# re.match works here because the pattern itself starts at the beginning (\w.*)
result = [re.match(r'\w.*\((\d+\.\d+\s\w.*,.*\d+)\)', x[0]).group(1) for x in data]
print(result)
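Taking the same idea further, named groups can split the captured text into its parts. A sketch that assumes every name follows the `(<number> GHz, <processor>)` layout of the two sample rows; names that do not fit produce `None`. The group names `ghz` and `cpu` are illustrative:

```python
import re

data = [
    ['Dell Precision 380 (3.73 GHz, Pentium Exteme Edition 965)', '11.6'],
    ['Dell Precision 380 (3.8 GHz, Pentium 4 processor 670)', '11.5'],
]

# Assumed layout: "(<digits>.<digits> GHz, <processor>)".
pattern = re.compile(r'\((?P<ghz>\d+\.\d+)\s*GHz,\s*(?P<cpu>[^)]+)\)')

result = []
for name, _value in data:
    match = pattern.search(name)
    # Keep None for names that do not follow the assumed layout.
    result.append(match.groupdict() if match else None)

print(result)
```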
