Python regex extracting substrings containing numbers and letters

Question

I am attempting to extract a substring that contains numbers and letters:

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"

I only want 11m56.95s and 13m31.14s

I have tried doing this:

re.findall('\d+', string)

That doesn't give me what I want. I also tried this:

re.findall('\d{2}[m]+\d[.]+\d|\+', string)

That did not work either. Any other suggestions?

heemayl · Accepted Answer · 2015-01-20 18:39:44Z

4

Try this:

re.findall(r"[0-9]{2}[m][0-9]{2}\.[0-9]{2}[s]", string)

Output:

['11m56.95s', '13m31.14s']
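
Note that `{2}` fixes every field at exactly two digits. The sketch below shows what that implies; the one-digit and three-digit sample lines are made up for illustration and do not come from the question, and the `+` variant is included only for comparison.

```python
import re

# Pattern from the answer above: exactly two digits in every field.
FIXED = re.compile(r"[0-9]{2}[m][0-9]{2}\.[0-9]{2}[s]")
# Looser variant using + (one or more digits), for comparison.
FLEXIBLE = re.compile(r"[0-9]+m[0-9]+\.[0-9]+s")

original = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
# Hypothetical lines with a one-digit and a three-digit minute count.
short_minutes = "LINE     : 5m03.20s CPU"
long_minutes = "LINE     : 123m03.20s CPU"

fixed_original = FIXED.findall(original)    # both values are found
fixed_short = FIXED.findall(short_minutes)  # nothing: only one digit before m
fixed_long = FIXED.findall(long_minutes)    # partial match, leading digit lost
flexible_long = FLEXIBLE.findall(long_minutes)

print(fixed_original, fixed_short, fixed_long, flexible_long)
```

If the input always has two digits per field, the fixed-width pattern is fine; otherwise `+` may be the safer choice.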

answered Jan 20, 2015 at 18:39

heemayl

hwnd · Accepted Answer · 2015-01-21 01:03:16Z

3

Your current regular expression does not match what you expect it to.

You could use the following regular expression to extract those substrings.

re.findall(r'\d+m\d+\.\d+s', string)

Example:

>>> import re
>>> s = 'LINE     : 11m56.95s CPU    13m31.14s TODAY'
>>> for x in re.findall(r'\d+m\d+\.\d+s', s):
...     print(x)

11m56.95s
13m31.14s

edited Jan 21, 2015 at 1:03

answered Jan 20, 2015 at 18:33

hwnd

user2555451 · Accepted Answer · 2015-01-20 18:40:17Z

2

Your Regex pattern is not formed correctly. It is currently matching:

\d{2}  # Two digits
[m]+   # One or more m characters
\d     # A digit
[.]+   # One or more . characters
\d|\+  # A digit or +
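
A quick way to see what the original attempt does is to run it. In this sketch the second input string is made up for illustration. Note that `|` has the lowest precedence in a regex, so it separates the entire left-hand side from `\+` rather than just the final `\d`.

```python
import re

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
attempt = re.compile(r"\d{2}[m]+\d[.]+\d|\+")

# After "11m" the pattern wants a single digit followed by a dot,
# but the input has two digits ("56") before the dot, so nothing matches.
no_match = attempt.findall(string)
print(no_match)

# Hypothetical input shaped the way the pattern expects, plus a stray "+".
# Because | splits the whole pattern, the "+" matches on its own.
sample = "12m3.4 and +"
sample_match = attempt.findall(sample)
print(sample_match)
```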

Instead, you should use:

>>> import re
>>> string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
>>> re.findall(r'\d+m\d+\.\d+s', string)
['11m56.95s', '13m31.14s']
>>>

Below is an explanation of what the new pattern matches:

\d+  # One or more digits
m    # m
\d+  # One or more digits
\.   # .
\d+  # One or more digits
s    # s
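
The same breakdown can be kept inside the pattern itself with `re.VERBOSE`, which ignores whitespace and `#` comments in the pattern. This is only a sketch; the name `TIME_PATTERN` and the reading of the fields as minutes and seconds are assumptions, not something stated in the question.

```python
import re

# re.VERBOSE lets the pattern carry its own per-piece comments.
TIME_PATTERN = re.compile(r"""
    \d+   # one or more digits (assumed: minutes)
    m     # literal m
    \d+   # one or more digits (assumed: whole seconds)
    \.    # literal .
    \d+   # one or more digits (assumed: fractional seconds)
    s     # literal s
""", re.VERBOSE)

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
matches = TIME_PATTERN.findall(string)
print(matches)
```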

edited Jan 20, 2015 at 18:40

answered Jan 20, 2015 at 18:33

user2555451

6 Comments

octain Over a year ago

Thanks for the explanation, that makes much more sense. How do I print the results without the brackets around the substrings I am extracting?

user2555451 Over a year ago

@octain - The brackets mean that the output is a list. You could remove them by using str.join: print(', '.join(re.findall(r'\d+m\d+\.\d+s', string))). Of course, that is just an example; it depends on what output you want.

octain Over a year ago

I want to write the output to a file. I did something like this: l = re.sub(r'\d+m\d+\.\d+s', line) and wd0.write(str(l))

user2555451 Over a year ago

Well, re.sub is used to replace patterns in a string. You want to do something like: for match in re.findall(r'\d+m\d+\.\d+s', string): wd0.write(match + '\n')

octain Over a year ago

I tried playing around with the other regex tools. I will give that a try, thanks again!
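
The loop suggested in this thread could be fleshed out roughly as follows. It is a sketch under assumptions: `write_matches` is a hypothetical helper, and `io.StringIO` stands in for the real file handle `wd0` so the example runs without touching the disk. As a side note, `re.sub` takes a replacement argument between the pattern and the string, so the two-argument call in the comment above would raise a `TypeError`.

```python
import io
import re

TIME_PATTERN = re.compile(r"\d+m\d+\.\d+s")

def write_matches(lines, out):
    """Write every match found in lines to out, one match per line."""
    for line in lines:
        for match in TIME_PATTERN.findall(line):
            out.write(match + "\n")

lines = ["LINE     : 11m56.95s CPU    13m31.14s TODAY"]

# io.StringIO stands in for a real file opened with open(path, "w").
wd0 = io.StringIO()
write_matches(lines, wd0)
written = wd0.getvalue()
print(written, end="")
```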

Joran Beasley · Accepted Answer · 2015-01-20 19:08:03Z

2

\b   # word boundary
\d+  # starts with one or more digits
.*?  # anything (non-greedy, so it's the smallest possible match)
s    # ends with s
\b   # word boundary
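
Assembled into a single pattern, the pieces above could be used as in this sketch. The name `LOOSE` and the second input string are made up for illustration. Because `.` also matches spaces, this pattern is more permissive than the digit-based ones and can run across several words.

```python
import re

# The pieces above joined into one pattern.
LOOSE = re.compile(r"\b\d+.*?s\b")

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
matches = LOOSE.findall(string)
print(matches)

# Hypothetical input: the match starts at the digits and keeps going,
# spaces included, until it reaches an s at a word boundary.
loose_match = LOOSE.findall("11 apples")
print(loose_match)
```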

edited Jan 20, 2015 at 19:08

answered Jan 20, 2015 at 18:39

Joran Beasley

Padraic Cunningham · Accepted Answer · 2015-01-20 18:48:43Z

1

If your lines are all like your example, split will work:

s = "LINE     : 11m56.95s CPU    13m31.14s TODAY"

spl = s.split()

a,b = spl[2],spl[4]
print(a,b)
('11m56.95s', '13m31.14s')
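
If the column positions are not guaranteed, one possible variation is to split on whitespace and keep only the tokens that look like time values. This is a sketch, not part of the answer above: `extract_times` is a hypothetical helper, the pattern is borrowed from the regex answers, and the second input line is invented to show a shifted layout.

```python
import re

TIME_TOKEN = re.compile(r"\d+m\d+\.\d+s")

def extract_times(line):
    """Return the whitespace-separated tokens that are entirely a time value."""
    return [tok for tok in line.split() if TIME_TOKEN.fullmatch(tok)]

s = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
times = extract_times(s)
print(times)

# Hypothetical line with an extra column: fixed indexes 2 and 4
# would pick the wrong fields here, but filtering still works.
shifted = extract_times("LINE  A   : 11m56.95s CPU    13m31.14s TODAY")
print(shifted)
```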

answered Jan 20, 2015 at 18:48

Padraic Cunningham
