2

The content variable contains a multiline string:

content = """
/blog/1:text:Lorem ipsum dolor sit amet, consectetur ### don't need this
<break>
text:Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
<break>
text:Excepteur sint occaecat cupidatat non proident.

/blog/16:text:Other Lorem ipsum dolor ### SEEKING THIS!!!
<break>
text:Other, really other
<break>
text:Blah blah.
"""

I'm trying to find the desired occurrence with the pattern /blog/16:

re.findall('^(?ism)%s?:(.*?)(\n\n)' % '/blog/16', content)

and I expect to get this:

[(u'/blog/16:text:Other Lorem ipsum dolor ### SEEKING THIS!!!
<break>
text:Other, really other
<break>
text:Blah blah.', u'\n\n')]

but I get the wrong result (/blog/1):

[(u'/blog/1:text:Lorem ipsum dolor sit amet, consectetur ### don't need this
<break>
text:Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
<break>
text:Excepteur sint occaecat cupidatat non proident.', u'\n\n')]

What is my mistake?

4
  • It is not clear. What is the pattern you are looking for and what is the problem? Commented Apr 26, 2014 at 7:00
  • "What is my mistake?" Answer: your mistake is that you didn't post a sample of the pattern you want to match. Commented Apr 26, 2014 at 7:02
  • Sorry for that, I'm looking for /blog/16, but it finds /blog/1. Updated the question. Commented Apr 26, 2014 at 7:03
  • I can't understand what output you are expecting. Commented Apr 26, 2014 at 7:04

3 Answers

2

Once you insert the blog text, this part of your regex:

/blog/16?:

means "match /blog/1 literally; then 6 literally, zero or one times; then : literally", so /blog/1: satisfies it as well. Instead, try:

(?ism)^/blog/16:(.*?)$

This matches /blog/16: literally at the start of a line, then does a non-greedy match of any characters up to the end of that line (i.e. it captures the rest of the text on the line).

You might find regex101 useful for developing and testing regular expressions.
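To see the effect of the stray ? in isolation, here is a minimal sketch that runs both prefixes against the content string from the question. The variable names optional_six and exact are only illustrative, and the flag is passed as an argument rather than inline.

```python
import re

content = """
/blog/1:text:Lorem ipsum dolor sit amet, consectetur ### don't need this
<break>
text:Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
<break>
text:Excepteur sint occaecat cupidatat non proident.

/blog/16:text:Other Lorem ipsum dolor ### SEEKING THIS!!!
<break>
text:Other, really other
<break>
text:Blah blah.
"""

# "6?" makes only the 6 optional, so "/blog/1:" is the first thing that matches.
optional_six = re.search(r'^/blog/16?:', content, re.MULTILINE)

# Without the "?", the 6 is required and only "/blog/16:" can match.
exact = re.search(r'^/blog/16:', content, re.MULTILINE)

print(optional_six.group(0))  # /blog/1:
print(exact.group(0))         # /blog/16:
```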


2

I think you meant the ?: to be a non-capturing group, but that syntax only works inside parentheses, as in (?:...). Right now, your ? says "0 or 1 of the previous element," which makes the 6 optional.
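A small sketch of the difference, assuming Python 3.4+ for re.fullmatch. The (\d+) capture is only there to show that the (?:...) group does not appear in the result.

```python
import re

# A bare "?" is a quantifier on the single preceding element (here, the 6).
assert re.fullmatch(r'/blog/16?', '/blog/1')
assert re.fullmatch(r'/blog/16?', '/blog/16')

# "(?:...)" is group syntax: it groups without creating a capture,
# so only the (\d+) group shows up in groups().
m = re.fullmatch(r'(?:/blog/)(\d+):', '/blog/16:')
print(m.groups())  # ('16',)
```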

1 Comment

Thank you, I thought it applied to the whole pattern, not just to the last character.
2

After the string substitution is done, your pattern looks like this:

^(?ism)/blog/16?:(.*?)(\n\n)

Here, ? means "match the previous element 0 or 1 times", and the previous element is only the 6. So when the input contains /blog/1:, the 6 matches zero times and the pattern still matches.

The regex you are actually looking for is:

import re
# With the s flag, the greedy .* captures from /blog/16: to the end of content.
print(re.findall(r'(?ims)(/blog/16:.*)(?:/blog|$)', content))

Output

['/blog/16:text:Other Lorem ipsum dolor ### SEEKING THIS!!!\n<break>\ntext:Other, really other\n<break>\ntext:Blah blah.\n']
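Because .* is greedy here, the capture would also run on past any later /blog/... entries. If each entry should stop at the next blank line, as the (\n\n) in the question suggests, one possible variant is sketched below. The helper name find_block and the shortened sample string are assumptions made for illustration. re.escape is used so that a key containing regex metacharacters is treated literally.

```python
import re

def find_block(content, key):
    """Return the block starting with `key:` up to the next blank line, or None."""
    # Lazy .*? stops at the first blank line, or at trailing whitespace
    # before the end of the string if this is the last block.
    pattern = r'^%s:.*?(?=\n\n|\s*\Z)' % re.escape(key)
    match = re.search(pattern, content, re.MULTILINE | re.DOTALL)
    return match.group(0) if match else None

# Shortened stand-in for the question's content, with the same layout.
sample = (
    "\n"
    "/blog/1:text:first post\n<break>\ntext:more\n"
    "\n"
    "/blog/16:text:second post\n<break>\ntext:Blah blah.\n"
)

print(find_block(sample, '/blog/16'))
print(find_block(sample, '/blog/1'))
```

Both keys return only their own block, since the literal : after the key keeps /blog/1 from matching the /blog/16 line.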
