
Suppose I'm searching for anchor links in a web page. A regex that works is:

 "\<a\s+.*?\>"

However, let's add a complication. Suppose that I only want links which surround specific text, for instance, the word 'next'. Normally, I would think all I had to do is:

 "\<a\s+.*?\>next"

But I find that if there are 3 anchor tags in a page, and only the third one has 'next' after it, the regex search returns a huge string extending from the start of the first anchor tag to the end of the third. This makes sense if the .*? is consuming all characters until it comes across ">next". But that is not what I want. I want it to consume characters only until it comes across the first ">", with the additional constraint that "next" must come right after that ">".

How do I get this to work?

1 Answer


You can fix your regex by prohibiting it from matching > inside the tag, i.e. by replacing . with [^>]:

"\<a\s+[^>]*?\>next"

.*? matches any number of characters, including >. Making it reluctant does not make it stop at the first >: the engine starts at the first <a, and when that > is not followed by next, .*? keeps expanding past it until it finds >next. The quantifier is still reluctant, not greedy, because it matched as little as possible to obtain a match from that starting position. It's just that no shorter match was available there.
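To see the difference concretely, here is a minimal sketch in Python. The question does not name a regex flavor, so the use of Python's re module is an assumption, and the sample HTML string and variable names are made up for illustration. The angle brackets are written without backslashes, since they need no escaping in Python.

```python
import re

# Hypothetical sample page: three anchors, only the third is followed by "next".
html = '<a href="/one">first</a> <a href="/two">second</a> <a href="/three">next</a>'

# Original pattern: the reluctant dot may still cross ">" to reach ">next".
lazy_dot = re.compile(r'<a\s+.*?>next')

# Fixed pattern: [^>] cannot cross the ">" that closes the opening tag.
negated = re.compile(r'<a\s+[^>]*?>next')

lazy_match = lazy_dot.search(html).group(0)
negated_match = negated.search(html).group(0)

print(lazy_match)     # starts at the first <a ...> and runs through ">next"
print(negated_match)  # <a href="/three">next
```

With the negated class, the attempt starting at the first two anchors fails outright, so the engine moves on and the match begins at the third anchor.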

