I am attempting to extract a substring that contains numbers and letters:

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"

I only want 11m56.95s and 13m31.14s.

I have tried doing this:

re.findall(r'\d+', string)

That doesn't give me what I want. I also tried this:

re.findall(r'\d{2}[m]+\d[.]+\d|\+', string)

That did not work either. Any other suggestions?

5 Answers

4

Try this:

re.findall(r"[0-9]{2}[m][0-9]{2}\.[0-9]{2}[s]", string)

Output:

['11m56.95s', '13m31.14s']
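One caveat worth checking: the {2} quantifiers assume every number is exactly two digits wide. The sketch below compares the fixed-width pattern with a \d+ version on a made-up line (the `other` string is hypothetical, not from the question) whose minute counts have one and three digits.

```python
import re

# Hypothetical input: minute counts with one digit and with three digits.
other = "LINE     : 5m07.10s CPU    123m31.14s TODAY"

# Fixed-width pattern: exactly two digits in every numeric field.
fixed = re.findall(r"[0-9]{2}m[0-9]{2}\.[0-9]{2}s", other)

# Variable-width pattern: one or more digits in every numeric field.
flexible = re.findall(r"\d+m\d+\.\d+s", other)

print(fixed)     # ['23m31.14s']
print(flexible)  # ['5m07.10s', '123m31.14s']
```

On this input the fixed-width pattern skips the first time and clips the leading digit off the second, so it is only safe if the field widths never change.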

3

Your current regular expression does not match what you expect it to.

You could use the following regular expression to extract those substrings.

re.findall(r'\d+m\d+\.\d+s', string)

Example:

>>> import re
>>> s = 'LINE     : 11m56.95s CPU    13m31.14s TODAY'
>>> for x in re.findall(r'\d+m\d+\.\d+s', s):
...     print(x)
...
11m56.95s
13m31.14s

2

Your Regex pattern is not formed correctly. It is currently matching:

\d{2}  # Two digits
[m]+   # One or more m characters
\d     # A single digit
[.]+   # One or more . characters
\d     # A single digit
|\+    # OR, as an alternative to everything above, a literal +
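To see the effect, the pattern from the question can be run against the sample line as-is. Because | has the lowest precedence, the regex is read as two alternatives: the whole sequence before the |, or a literal +. Neither occurs in the sample string; the first alternative fails because only one digit is allowed between the m and the dot.

```python
import re

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"

# The pattern from the question, unchanged apart from the raw-string prefix.
original = r'\d{2}[m]+\d[.]+\d|\+'

result = re.findall(original, string)
print(result)  # [] because neither alternative matches anywhere
```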

Instead, you should use:

>>> import re
>>> string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
>>> re.findall(r'\d+m\d+\.\d+s', string)
['11m56.95s', '13m31.14s']
>>>

Below is an explanation of what the new pattern matches:

\d+  # One or more digits
m    # m
\d+  # One or more digits
\.   # .
\d+  # One or more digits
s    # s
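If you want to keep that explanation next to the pattern itself, one option is the re.VERBOSE flag, which lets whitespace and # comments live inside the regex. This is only a sketch; the "minutes" and "seconds" labels are my reading of the sample data, not something the question states.

```python
import re

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"

# re.VERBOSE ignores unescaped whitespace and # comments inside the pattern.
pattern = re.compile(r"""
    \d+   # one or more digits (assumed: minutes)
    m     # literal m
    \d+   # one or more digits (assumed: whole seconds)
    \.    # literal .
    \d+   # one or more digits (assumed: fractional seconds)
    s     # literal s
""", re.VERBOSE)

matches = pattern.findall(string)
print(matches)  # ['11m56.95s', '13m31.14s']
```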

6 Comments

Thanks for the explanation, that makes much more sense. How do I print the results without the brackets around the matches I am extracting?
@octain - The brackets mean that the output is a list. You could remove them by using str.join: print(', '.join(re.findall(r'\d+m\d+\.\d+s', string))). Of course, that is just an example; it depends on what output you want.
I want to write the output to a file. I did something like this: l = re.sub(r'\d+m\d+\.\d+s', line) and wd0.write(str(l))
Well, re.sub is used to replace patterns in a string. You want to do something like: for match in re.findall(r'\d+m\d+\.\d+s', string): wd0.write(match + '\n')
I tried playing around with the other regex tools. I will give that a try, thanks again!
2
\b   # word boundary
\d+  # starts with one or more digits
.*?  # anything (non-greedy, so it is the smallest possible match)
s    # ends with s
\b   # word boundary
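Put together, those pieces form the pattern \b\d+.*?s\b. Here is a minimal sketch of it on the sample line, plus a made-up second string (`loose_input` is hypothetical) showing that this pattern is looser than the m/./s one and will also accept other text that starts with digits and ends in s.

```python
import re

string = "LINE     : 11m56.95s CPU    13m31.14s TODAY"

# Assembled from the pieces listed above.
pattern = r'\b\d+.*?s\b'

matches = re.findall(pattern, string)
print(matches)  # ['11m56.95s', '13m31.14s']

# Hypothetical input: digits followed later by a word ending in s also match.
loose_input = "3 apples"
loose = re.findall(pattern, loose_input)
print(loose)  # ['3 apples']
```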

1

If your lines are all like your example, split will work:

s = "LINE     : 11m56.95s CPU    13m31.14s TODAY"

spl = s.split()

a, b = spl[2], spl[4]
print(a, b)
11m56.95s 13m31.14s
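Since this relies on the fields always sitting at positions 2 and 4, it may be worth guarding against shorter lines. A small sketch, assuming the same layout as the sample; `extract_times` is a hypothetical helper name, not part of any library.

```python
def extract_times(line):
    """Return the 3rd and 5th whitespace-separated fields, or None if the line is too short."""
    fields = line.split()
    if len(fields) < 5:
        # The line does not have the assumed layout, so there is nothing to extract.
        return None
    return fields[2], fields[4]


s = "LINE     : 11m56.95s CPU    13m31.14s TODAY"
times = extract_times(s)
short = extract_times("LINE     :")
print(times)  # ('11m56.95s', '13m31.14s')
print(short)  # None
```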
