I wrote a regex to extract the href from an anchor tag.
My regex is:

<a.*?href="(.*?)">blah<\/a> //dot is matching all

As I understand it, this starts matching at <a and continues until it finds the first href. It then captures the URL in the href up to the first ", and finally matches blah.
But it matches across multiple anchor tags when the last one ends in blah, for example:

<a href="some_url">abc</a>
<a href="some_url1">def</a>
<a href="get_this">blah</a>

I expected it to capture only the last URL, since the regex fits that tag perfectly.

  • What do you mean by "perfectly"? Your regexp matches the whole code because of the .*? part. Commented Nov 28, 2014 at 0:30
  • Well, that "perfectly" part is only my understanding, and I'm definitely wrong. But shouldn't .*? stop before it matches the next character, since it is a non-greedy match? Commented Nov 28, 2014 at 0:32
  • It'll match the whole tag, but the href is in the first captured group, $1. Commented Nov 28, 2014 at 0:32
  • Your regex works fine for me. Try putting your regex in at regex101.com. Put those three <a> tags in and you will see that it only matches the last line and captures "get_this". Commented Nov 28, 2014 at 0:37

1 Answer

To answer the question, you can swap your dot operator for a negated character class, so that the match cannot run past the closing > of the tag (or the closing quote of the attribute):

<a[^>]*href="([^"]*)">blah<\/a>

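This ensures (in theory) that the pattern can only match within a single tag: [^>]* cannot cross the > that ends the opening tag, and [^"]* cannot cross the quote that ends the href value.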
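To see the difference concretely, here is a minimal sketch in Python (chosen only for illustration; the question does not say which regex engine is used). It assumes the "dot is matching all" note in the question corresponds to Python's re.DOTALL flag, and the variable names are hypothetical.

```python
import re

# The three anchor tags from the question, one per line.
html = (
    '<a href="some_url">abc</a>\n'
    '<a href="some_url1">def</a>\n'
    '<a href="get_this">blah</a>\n'
)

# Pattern from the question; re.DOTALL lets "." match newlines too
# (assumed equivalent of the "dot is matching all" setting).
lazy = re.compile(r'<a.*?href="(.*?)">blah</a>', re.DOTALL)

# Pattern from the answer, using negated character classes.
strict = re.compile(r'<a[^>]*href="([^"]*)">blah</a>')

lazy_match = lazy.search(html)
strict_match = strict.search(html)

# The lazy pattern starts at the first "<a"; because nothing stops "."
# from crossing tag boundaries, the capture group keeps expanding
# until '">blah</a>' finally follows.
print(repr(lazy_match.group(1)))

# The strict pattern can only succeed inside the third tag.
print(repr(strict_match.group(1)))
```

Non-greedy only means "expand as little as necessary for the whole pattern to succeed from the current starting point". The engine tries the leftmost starting position first, so the lazy pattern begins at the first <a and its capture group grows across the other tags. Without the dot-matches-all setting, and with one tag per line, the dot cannot cross the newlines, which would explain why the pattern appeared to work for the commenter on regex101.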

As an aside that goes beyond the question: parsing HTML with regex is often not a great idea unless you can be very sure of exactly how the markup is formatted. You might want to look into the PHP DOM instead.
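The parser-based approach could look roughly like the following. This is only a sketch of the idea using Python's standard-library html.parser rather than the PHP DOM itself; the AnchorCollector class and its attributes are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, text) pairs
        self._href = None        # href of the anchor currently open
        self._text = []          # text chunks seen inside that anchor
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.links.append((self._href, "".join(self._text)))
            self._in_anchor = False


html = (
    '<a href="some_url">abc</a>\n'
    '<a href="some_url1">def</a>\n'
    '<a href="get_this">blah</a>\n'
)

parser = AnchorCollector()
parser.feed(html)
parser.close()

# Keep only the anchors whose link text is exactly "blah".
hrefs = [href for href, text in parser.links if text == "blah"]
print(hrefs)
```

Because the parser tracks tag boundaries itself, attribute order, extra attributes after href, and line breaks inside a tag no longer affect the result.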
