0

I ran into a problem when using a regex to match part of a string in Python.

Example string:

ln[1] --This is a string--

ln[2] Match the line below.

ln[3] --This is a string--

ln[4] Match this line start from here.

ln[5] -This is the end-

I want to extract abc in the string above.

code:

import re

pattern = re.compile('%s(.*?)%s' % ('--This is a string--', '-This is the end-'))
re.findall(pattern, string)

How can I get only line 4, rather than lines 2 through 4?

Thank you very much.

4
  • Your regex says .*? -- what is it that you intend? Commented Jul 10, 2013 at 10:08
  • To be fair, abc would work. Commented Jul 10, 2013 at 10:08
  • Regex engines work left-to-right, so your regex starts the match at the first a it encounters, and then keeps matching until the c is reached. If you don't want to allow more than one a, you need to tell the regex engine that. Commented Jul 10, 2013 at 10:13
  • I'd like to match only one a. How can I do that? Commented Jul 10, 2013 at 10:57
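
To illustrate the left-to-right behaviour described in the comments, here is a minimal sketch using the sample text from the question. It assumes the ln[N] labels are only line markers and not part of the string, and that the string spans several lines, so re.DOTALL is needed for '.' to cross newlines. The variable names are made up for illustration.

```python
import re

# Sample text from the question, without the ln[N] labels (assumption).
text = "\n".join([
    "--This is a string--",
    "Match the line below.",
    "--This is a string--",
    "Match this line start from here.",
    "-This is the end-",
])

start = re.escape("--This is a string--")
end = re.escape("-This is the end-")

# The lazy quantifier only limits how far the match extends to the right.
# The match still begins at the FIRST opening delimiter the engine finds.
lazy = re.compile(start + r"(.*?)" + end, re.DOTALL)
matches = lazy.findall(text)

print(len(matches))
print(repr(matches[0]))
```

The single match runs from the first delimiter to the end marker, so it contains both "Match the line below." and the second delimiter line, which is the behaviour the question describes.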

3 Answers

2
>>> re.findall('a[^a]*c', 'aaaaaaaaabc')
['abc']
>>> re.findall('a[^a]*c', 'aaaaaaaaa c')
['a c']
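
The same idea can probably be carried over to the multi-line example in the question. A character class such as [^a] only excludes a single character, so for a multi-character delimiter one option is a negative lookahead that refuses to step over another opening delimiter. This is a sketch under the assumption that the ln[N] labels are not part of the string; the variable names are illustrative only.

```python
import re

# Sample text from the question, without the ln[N] labels (assumption).
text = "\n".join([
    "--This is a string--",
    "Match the line below.",
    "--This is a string--",
    "Match this line start from here.",
    "-This is the end-",
])

start = re.escape("--This is a string--")
end = re.escape("-This is the end-")

# (?:(?!start).)*? consumes one character at a time, but only while no new
# opening delimiter begins at the current position. This is the
# multi-character counterpart of [^a]*.
tempered = re.compile(
    start + r"((?:(?!" + start + r").)*?)" + end,
    re.DOTALL,  # let '.' match newlines as well
)

result = [m.strip() for m in tempered.findall(text)]
print(result)  # ['Match this line start from here.']
```

The attempt that starts at the first delimiter fails as soon as it reaches the second one, so the engine moves on and the only successful match starts at the last delimiter before the end marker.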

2

You could probably do it with a greedy prefix that consumes everything before the last a:

pattern = re.compile('.*(a.*?c)')
re.findall(pattern, string)  # yields ["abc"]
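
Applied to the multi-line example in the question, the greedy-prefix approach might look like the sketch below. It assumes the ln[N] labels are not part of the string and uses re.DOTALL so that '.' also matches newlines; the variable names are hypothetical.

```python
import re

# Sample text from the question, without the ln[N] labels (assumption).
text = "\n".join([
    "--This is a string--",
    "Match the line below.",
    "--This is a string--",
    "Match this line start from here.",
    "-This is the end-",
])

start = re.escape("--This is a string--")
end = re.escape("-This is the end-")

# The leading .* is greedy: it first swallows the whole text and then
# backtracks, so the opening delimiter it settles on is the LAST one that
# is still followed by the end marker.
greedy_prefix = re.compile(r".*" + start + r"(.*?)" + end, re.DOTALL)

result = [m.strip() for m in greedy_prefix.findall(text)]
print(result)  # ['Match this line start from here.']
```

One caveat: because the prefix consumes everything before the last delimiter, this returns at most one block per text. If the input may contain several start/end pairs and you want all of them, a lookahead-based pattern is likely a better fit.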


1

If you want to collapse every run of a repeated character into a single character, you could use a backreference to a numbered or named group.

Example:

With a numbered group:

>>> re.sub('(.)(\\1)+', '\\1', 'abcAAAAabcBBBBabcCCCCabc')
'abcAabcBabcCabc'

With a named group:

>>> re.sub('(?P<n>.)(?P=n)+', '\\1', 'abcAAAAabcBBBBabcCCCCabc')
'abcAabcBabcCabc'
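
For reuse, the backreference substitution could be wrapped in a small helper. This is only a sketch: the function name collapse_repeats and the group name ch are made up for illustration, and the replacement uses the \g<name> form instead of \1.

```python
import re

# (?P<ch>.) captures one character; (?P=ch)+ requires that same character
# to repeat at least once more. '.' does not match newlines by default.
_REPEATED = re.compile(r"(?P<ch>.)(?P=ch)+")


def collapse_repeats(s: str) -> str:
    """Replace each run of a repeated character with a single copy."""
    return _REPEATED.sub(r"\g<ch>", s)


print(collapse_repeats("abcAAAAabcBBBBabcCCCCabc"))  # abcAabcBabcCabc
print(collapse_repeats("aaaaaaaaabc"))               # abc
```

Note that this rewrites the input rather than searching it, so it suits cases where modifying the string is acceptable.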

