0

I want to fetch only the 'cleaner' version of the URL, without any parameters. In other words, if there is a question mark in the URL, remove it and everything after it.

Here is my current line:

preg_match_all('/<a(.*?)href=("|\'|)(.*?)("|\'| )(.*?)>/s',$content,$ahref);

Just to be clearer, I'm expecting that this URL (for example):

/go/page/mobile_download_apps.html?&who=r,6GDewh28SCW3/fUSqmWqR_E9ljkcH1DheIMqgbiHjlX3OBDbskcuCZ22iDvk0zeZR7BEthcEaXGFWaQ4Burmd4eKuhMpqojjDE6BrCiUtLClkT32CejpMIdnqVOUmWBD

would become:

/go/page/mobile_download_apps.html
  • Wouldn't this do the trick? /(<a href=")(.*)(\?.*)/s (I'm missing some info to give a more detailed answer...) Commented Jan 26, 2015 at 0:30
  • @Benoît Yes, it would, but only in this example, not in other cases, e.g. when the ? is absent. (And OP will just keep re-asking those regex questions without trying to understand what they do.) Commented Jan 26, 2015 at 0:32
  • The best way to go is to get the URL using a DOM parser, then use a regex to remove the trailing part, i.e. keep everything up to the first ? => regex101.com/r/mD3sB1/1 Commented Jan 26, 2015 at 0:36
  • It's very easy to remove everything after the ?, but I was asking whether it's possible to do it on the fly, directly in the regex. Commented Jan 26, 2015 at 0:39
  • @Enissay Completely remove it... (no need to capture) Commented Jan 26, 2015 at 0:51

4 Answers

5

With DOMDocument, strpos, substr:

$dom = new DOMDocument;
$dom->loadHTML($content);

$linkNodeList = $dom->getElementsByTagName('a');

foreach($linkNodeList as $linkNode) {
    $href = $linkNode->getAttribute('href');

    if ( false !== ($offset = strpos($href, '?')) )
        $linkNode->setAttribute('href', substr($href, 0, $offset));
}

$newContent = $dom->saveHTML();

or with explode:

$linkNode->setAttribute('href', explode('?', $href)[0]);
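If you only need the cleaned URLs rather than rewritten HTML, the same idea can be wrapped in a small function. This is a minimal sketch, not part of the answer above: the name extractCleanHrefs is made up for illustration, and it assumes you are happy to silence libxml's warnings about imperfect markup while parsing.

```php
<?php
// Hypothetical helper: returns every <a href> value with the query string removed.
function extractCleanHrefs(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $linkNode) {
        if (!$linkNode->hasAttribute('href')) {
            continue; // e.g. <a name="..."> anchors
        }
        // Limit of 2: we only care about the part before the first "?".
        $urls[] = explode('?', $linkNode->getAttribute('href'), 2)[0];
    }

    return $urls;
}

// Sample markup is an assumption, built around the URL from the question.
$sample = '<p><a href="/go/page/mobile_download_apps.html?&who=r,6G">Apps</a> '
        . "<a class=\"b\" href='/plain.html'>Plain</a> <a name=\"n\">Anchor</a></p>";
print_r(extractCleanHrefs($sample));
```

Because the parser deals with quoting and attribute order, the only string work left is splitting on the first ?.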

0

Do you mean this behavior?

<a\s+href\s*=\s*"\K[^"?]+


preg_match_all('/<a\s+href\s*=\s*"\K[^"?]+/i', $text, $result); // \K drops the prefix, so $result[0] holds each URL up to the first "?"

0

As mentioned in the comments, you shouldn't extract the tag with a regex; you should use a parser. Nevertheless, here you go:

<a[^>]+href=("|')([^"'?]*)[^"']*\1[^>]*>

Demo: https://regex101.com/r/tV5pP8/3
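For what it's worth, here is roughly how that pattern could be used from PHP. Treat it as a sketch under assumptions: the function name cleanHrefsByRegex and the sample markup are invented for the example, and the i flag is my addition so that upper-case tags match too.

```php
<?php
// Hypothetical helper wrapping the pattern above.
function cleanHrefsByRegex(string $content): array
{
    // Group 1: the opening quote, reused via \1 so the closing quote must match.
    // Group 2: the URL up to (not including) the first "?".
    $pattern = '/<a[^>]+href=("|\')([^"\'?]*)[^"\']*\1[^>]*>/i';
    preg_match_all($pattern, $content, $matches);

    return $matches[2];
}

// Sample markup is an assumption, built around the URL from the question.
$content = '<a href="/go/page/mobile_download_apps.html?&who=r,6G">Apps</a>'
         . "<a class=\"b\" href='/plain.html'>Plain</a>";
print_r(cleanHrefsByRegex($content));
```

Note that the pattern requires a quoted href value, so unquoted attributes are simply not matched.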

1 Comment

Backreferences [^\1] don't work in character classes.
-1

Oops... a lack of concentration on my side :)

Solved it myself... (it was super easy)

Here is the final line:

preg_match_all('/<a(.*?)href=("|\'|)(.*?)(\?|"|\'| )(.*?)>/s',$content,$ahref);
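
With that pattern the cleaned URL ends up in the third capture group. A quick sketch to show where to read it from (the sample markup is made up for illustration):

```php
<?php
// Hypothetical sample markup, built around the URL from the question.
$content = '<a class="dl" href="/go/page/mobile_download_apps.html?&who=r,6G">Apps</a>'
         . "<a href='/plain.html'>Plain</a>";

preg_match_all('/<a(.*?)href=("|\'|)(.*?)(\?|"|\'| )(.*?)>/s', $content, $ahref);

// Group 3 is the lazy (.*?) after the optional opening quote. It stops at the
// first "?", quote, or space, so it holds the URL without its query string.
$cleanUrls = $ahref[3];
print_r($cleanUrls);
```

Since the URL group also stops at a space or at either kind of quote, values containing those characters would be cut short.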
