
I have a string like this:

data='WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'

I need a regex that removes everything up to and including the first underscore.

I've tried this:

re.sub(r"(^.*\_)", "", data)

but this removes everything up to the last underscore, leaving:

ProcessCpuUsage

I need it to be:

jvmRuntimeModule_ProcessCpuUsage
  • You really don't even need to use regex for this. Commented Jan 6, 2015 at 20:29
  • Definitely don't use regex. It's much slower. Commented Jan 6, 2015 at 20:40
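Whether regex is actually slower here is easy to check on your own machine. A minimal sketch using the standard-library timeit module; the helper names with_regex and with_split are illustrative, and timings vary with Python version and hardware, so no numbers are claimed here.

```python
import re
import timeit

data = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
pattern = re.compile(r"^.*?_")  # compiled once so the timing excludes pattern compilation

def with_regex(s):
    # Remove everything up to and including the first underscore via regex.
    return pattern.sub("", s)

def with_split(s):
    # Same result using plain string methods.
    return s.split("_", 1)[1]

assert with_regex(data) == with_split(data)

for fn in (with_regex, with_split):
    # fn is looked up when the lambda runs, which happens inside this iteration.
    seconds = timeit.timeit(lambda: fn(data), number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 100,000 calls")
```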

7 Answers


Use this instead:

data = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
# str.find gives the index of the first "_"; slice from the character after it
result = data[data.find("_") + 1:]
print(result)
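One property of the find approach worth knowing: str.find returns -1 when the separator is absent, so the slice starts at index 0 and the string comes back unchanged instead of raising. A small sketch; the function name strip_through_first_underscore is hypothetical.

```python
def strip_through_first_underscore(s):
    # str.find returns -1 when "_" is missing, so the slice becomes s[0:],
    # i.e. the original string is returned unchanged.
    return s[s.find("_") + 1:]

print(strip_through_first_underscore('WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'))
# jvmRuntimeModule_ProcessCpuUsage
print(strip_through_first_underscore('NoUnderscoreHere'))
# NoUnderscoreHere
```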

re.sub(r"(^.*\_)", "", data)

This makes . match every character in the line. Once it reaches the end and can't match any more characters, the engine moves on to the next token, _. There's nothing left to match it against, so the engine backtracks one character at a time until it lands on the underscore before ProcessCpuUsage (the last one in the string), where _ matches and the overall match completes.

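You should make the * quantifier lazy (non-greedy) by appending ?, so it stops at the first underscore. You also do not need to capture the contents, so drop the parens. The backslash does nothing either (\_ matches the same thing as _), so drop it. Keep the leading ^ anchor, though: with a lazy pattern and no anchor, re.sub replaces every underscore-terminated chunk and you are left with ProcessCpuUsage again.

re.sub(r"^.*?_", "", data)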
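A quick way to see the difference between greedy and lazy matching, and the role of the ^ anchor, is to run the variants side by side. A minimal sketch using Python's re module:

```python
import re

data = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'

# Greedy: .* runs to the end, then backtracks to the LAST underscore.
print(re.sub(r"^.*_", "", data))           # ProcessCpuUsage

# Lazy and anchored: .*? stops at the FIRST underscore; ^ allows only one match.
print(re.sub(r"^.*?_", "", data))          # jvmRuntimeModule_ProcessCpuUsage

# Lazy but unanchored: re.sub replaces every match, one per underscore.
print(re.sub(r".*?_", "", data))           # ProcessCpuUsage

# Lazy and unanchored, but limited to a single replacement.
print(re.sub(r".*?_", "", data, count=1))  # jvmRuntimeModule_ProcessCpuUsage
```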


You have become a victim of greedy matching. The expression matches the longest sequence that it possibly can.

I know there's a way to turn off greedy matching, but I never remember it. Instead, there's a trick I use when there's a specific character I want to stop at: instead of matching every character with ., I match every character except the one I want to stop at.

re.sub(r"^[^_]*_", "", data)
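The negated class also behaves differently from . around line breaks: [^_] matches a newline, while . does not unless re.DOTALL is set. A small sketch with a made-up two-line input:

```python
import re

text = "first line\nsecond_rest"  # hypothetical input with a newline before the underscore

# . does not match "\n", so the lazy pattern cannot reach the underscore; nothing is removed.
print(repr(re.sub(r"^.*?_", "", text)))                   # 'first line\nsecond_rest'

# [^_] matches any character except "_", including "\n".
print(repr(re.sub(r"^[^_]*_", "", text)))                 # 'rest'

# re.DOTALL lets . match "\n" as well.
print(repr(re.sub(r"^.*?_", "", text, flags=re.DOTALL)))  # 'rest'
```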


This should do:

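import re

def get_last_part(d):
    # [^_]* consumes everything before the first underscore; (.*) captures the rest
    m = re.match(r'[^_]*_(.*)', d)
    if m:
        return m.group(1)
    else:
        return None  # no underscore in the input

print(get_last_part('WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'))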
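For comparison, the same contract for single-line input (return what follows the first underscore, or None when there is none) can be written without regex using str.partition. A sketch; get_last_part_no_regex is a hypothetical name.

```python
def get_last_part_no_regex(d):
    # partition splits at the first "_" into (head, separator, tail);
    # sep is "" when no underscore is present.
    head, sep, tail = d.partition("_")
    return tail if sep else None

print(get_last_part_no_regex('WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'))
# jvmRuntimeModule_ProcessCpuUsage
print(get_last_part_no_regex('NoUnderscoreHere'))
# None
```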


You can use str.index:

>>> data = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
>>> data[data.index('_')+1:]
'jvmRuntimeModule_ProcessCpuUsage'

Using str.split:

>>> data.split('_',1)[1]
'jvmRuntimeModule_ProcessCpuUsage'

Using str.find:

>>> data[data.find('_')+1:]
'jvmRuntimeModule_ProcessCpuUsage'
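The three methods agree when the string contains an underscore but differ when it does not. A small sketch showing what each does on a hypothetical input without one:

```python
s = 'NoUnderscoreHere'  # hypothetical input with no "_"

# str.find returns -1, so the slice starts at 0 and returns the whole string.
print(s[s.find('_') + 1:])   # NoUnderscoreHere

# str.index raises ValueError when the substring is missing.
try:
    s[s.index('_') + 1:]
except ValueError as exc:
    print('index:', exc)     # index: substring not found

# str.split returns a one-element list, so [1] raises IndexError.
try:
    s.split('_', 1)[1]
except IndexError as exc:
    print('split:', exc)     # split: list index out of range
```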

See the string methods section of the Python documentation for details.


Try this regex:

result = re.sub(r"^.*?_", "", data)

What the regex ^.*?_ does:

  • ^ .. Assert that the position is at the beginning of the string.
  • .*? .. Match any character except a line break, between zero and unlimited times, as few times as possible.
  • _ .. Match the character _
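The same token-by-token breakdown can live next to the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. A sketch; the name FIRST_CHUNK is illustrative.

```python
import re

FIRST_CHUNK = re.compile(r"""
    ^      # start of the string
    .*?    # any characters except a line break, as few as possible
    _      # the first underscore
""", re.VERBOSE)

data = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
print(FIRST_CHUNK.sub("", data))  # jvmRuntimeModule_ProcessCpuUsage
```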


Try using split():

s = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'
print(s.split('_',1)[1])

Result:

jvmRuntimeModule_ProcessCpuUsage
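The second argument to split (maxsplit) is what keeps the rest of the string intact: without it, every underscore becomes a split point. A short sketch:

```python
s = 'WebSpherePMI_jvmRuntimeModule_ProcessCpuUsage'

print(s.split('_'))     # ['WebSpherePMI', 'jvmRuntimeModule', 'ProcessCpuUsage']
print(s.split('_', 1))  # ['WebSpherePMI', 'jvmRuntimeModule_ProcessCpuUsage']

# Without maxsplit, the pieces after the first would have to be re-joined:
print('_'.join(s.split('_')[1:]))  # jvmRuntimeModule_ProcessCpuUsage
```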
