
Suppose I have this string:

some striinnngggg <a href="something/some_number">linkk</a> soooo <a href="someotherthing/not_number">asdfsadf</a>

I want to strip every tag of the form <a href="something/some_number"></a> from this string while keeping the tag's content, where some_number can be any number.

Hence, in the example above, the desired end result is:

some striinnngggg linkk soooo <a href="someotherthing/not_number">asdfsadf</a>

Notice that the second tag is not stripped, since the second part of its link is not a number.

How would I accomplish this using regex/PHP's preg functions?


2 Answers


Detecting such tags with a regex is quite complicated, since the order of the attributes can change and values can be delimited with double quotes, single quotes, or no quotes at all.

I think an easier way to do this is to use DOMDocument to find the matching tags:

$dom = new DOMDocument;
$dom->loadHTML($html); // $html holds the markup to inspect

$links = $dom->getElementsByTagName('a');

foreach ($links as $link) {
  // Anchored with ^ and $ so that only hrefs of the exact form "something/number" match
  if (preg_match('~^[a-zA-Z0-9]+/[0-9]+$~', $link->getAttribute('href'))) {
    echo $link->nodeValue; // do whatever you need to do with the string here
  }
}
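
To go from finding the matching links to actually producing the stripped string, something along these lines might work. It is only a sketch under a few assumptions: the function name unwrapNumericLinks and the wrapper id fragment-root are made up for illustration, the input is an HTML fragment rather than a full page, and the text is plain ASCII (loadHTML needs extra care with other encodings).

```php
<?php
// Hypothetical helper: removes <a> tags whose href looks like "something/123",
// keeping whatever was inside them.
function unwrapNumericLinks(string $html): string
{
    $dom = new DOMDocument();

    // Wrap the fragment in a known root element and silence parser warnings
    // that libxml raises on sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first; it changes as links are removed.
    $links = iterator_to_array($dom->getElementsByTagName('a'));

    foreach ($links as $link) {
        if (preg_match('~^[^/]+/\d+$~', $link->getAttribute('href'))) {
            $parent = $link->parentNode;
            // Move the link's children in front of it, then drop the empty link.
            while ($link->firstChild) {
                $parent->insertBefore($link->firstChild, $link);
            }
            $parent->removeChild($link);
        }
    }

    // Serialize only the children of the wrapper, not the html/body
    // elements that loadHTML adds around it.
    $root = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$input = 'some striinnngggg <a href="something/123">linkk</a> soooo '
       . '<a href="someotherthing/not_number">asdfsadf</a>';
echo unwrapNumericLinks($input), PHP_EOL;
```

Moving the child nodes instead of copying nodeValue means nested markup inside a stripped link (for example a <b> element) should survive.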

Expression:

(<a\s[^>]*href="[^"/]+/\d+"[^>]*>)(.+?)(</a>)

Find that and replace it with the second token (depending on your language it might be $2 or \2), which is just the link text. Note that this assumes the href value is double-quoted and ends in a purely numeric segment.
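
In PHP, a find-and-replace like this could be done with preg_replace. The following is a rough sketch rather than a robust solution: it uses a tightened variant of the expression that assumes a double-quoted href of the exact form segment/digits, and the sample input swaps the placeholder some_number for 123.

```php
<?php
// Assumption: href is double-quoted and looks like "segment/digits".
// Group 1 = opening tag, group 2 = link text, group 3 = closing tag.
// ~ is used as the delimiter so that / needs no escaping;
// i = case-insensitive, s = let . match newlines inside the link text.
$pattern = '~(<a\s[^>]*href="[^"/]+/\d+"[^>]*>)(.+?)(</a>)~is';

$input = 'some striinnngggg <a href="something/123">linkk</a> soooo '
       . '<a href="someotherthing/not_number">asdfsadf</a>';

// Keep only the second group, i.e. the text between the tags.
$result = preg_replace($pattern, '$2', $input);

echo $result, PHP_EOL;
```

The second link is left alone because the part of its href after the slash is not made up of digits only.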
