I'm struggling to find an answer to the following. I suspect I don't really know what I'm asking for or how to ask it, so let me describe the problem:

I would like to grab some links from a page. I only want the links that contain the word "advertid" as part of the URL. For example, the URL would be something like http://thisisanadvertis.com/questions/ask.

I've got this far:

<?php
// This is our starting point. Change this to whatever URL you want.
$start = "https://example.com";

function follow_links($url) {
    // Create a new instance of PHP's DOMDocument class.
    $doc = new DOMDocument();
    // Use file_get_contents() to download the page, pass the output of file_get_contents()
    // to PHP's DOMDocument class.
    @$doc->loadHTML(@file_get_contents($url));
    // Create an array of all of the links we find on the page. 
    $linklist = $doc->getElementsByTagName("a");
    // Loop through all of the links we find.
    foreach ($linklist as $link) {
        echo $link->getAttribute("href")."\n";
    }
}
// Begin the crawling process by crawling the starting link first.
follow_links($start);
?>

This returns all URLs on the page, which is OK. To get only the URLs I wanted, I tried a few things, including amending the getAttribute part:

echo $link->getAttribute("href"."*advertid*")."\n";

I've tried a few things but can't get what I want. Can someone point me in the right direction? I'm a bit stuck.

Many thanks in advance.

  • You want to filter some urls? Commented Oct 11, 2018 at 19:59
  • Basically -- yes Commented Oct 11, 2018 at 20:01

4 Answers

foreach ($linklist as $link) {
   if (strpos($link->getAttribute("href"), 'advertid') !== false) {
       echo $link->getAttribute("href")."\n";
   }
}

2 Comments

Just beaten to it ;). Thank you for this.
This was kind of funny: the moment I posted my answer, I saw the other answers, which were almost identical, haha.

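You can check whether the href attribute contains the text you want, adapting the logic to your case:

foreach ($linklist as $link) {
    // strpos() returns false when there is no match, so compare strictly;
    // a ">= 0" test would also accept false and let every link through.
    if (strpos($link->getAttribute("href"), 'advertid') !== false) {
        echo $link->getAttribute("href")."\n";
    }
}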
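
If the capitalization of the URLs can vary, a case-insensitive check may be what "depending on the case" calls for. The following is only a sketch: the sample hrefs are made up for illustration, and stripos is swapped in for strpos as its case-insensitive counterpart.

```php
<?php
// Hypothetical hrefs with mixed capitalization, for illustration only.
$hrefs = [
    "https://example.com/AdvertID/7",
    "https://example.com/advertid/8",
    "https://example.com/contact",
];

$matches = [];
foreach ($hrefs as $href) {
    // stripos() works like strpos() but ignores case.
    if (stripos($href, 'advertid') !== false) {
        $matches[] = $href;
    }
}

print_r($matches); // the first two URLs match, the third does not
```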


$links = [];
foreach ($linklist as $link) {
    $href = $link->getAttribute("href");
    // Keep only the hrefs that contain "advertid".
    if (preg_match('/advertid/', $href)) {
        array_push($links, $href);
    }
}
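
To show the whole flow in one place, here is a minimal sketch that collects the matching links into an array and returns it instead of echoing. The function name filter_links and the inline sample HTML are assumptions made for illustration; in the question's script the HTML would come from file_get_contents($url).

```php
<?php
// Hypothetical helper: returns every href in $html that contains $needle.
function filter_links(string $html, string $needle): array {
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName("a") as $link) {
        $href = $link->getAttribute("href");
        if (strpos($href, $needle) !== false) {
            $links[] = $href; // same effect as array_push($links, $href)
        }
    }
    return $links;
}

// Made-up sample page, for demonstration only.
$html = '<html><body>'
      . '<a href="https://example.com/advertid/1">first</a>'
      . '<a href="https://example.com/about">about</a>'
      . '<a href="https://example.com/item?advertid=2">second</a>'
      . '</body></html>';

print_r(filter_links($html, 'advertid'));
```

Returning the array rather than printing inside the loop makes it easy to inspect the result with print_r or pass it on to further processing.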

1 Comment

This is useful. I note the array_push function; I needed that also! Many thanks :)

I would suggest using the PHP function strpos.

strpos takes at least two parameters: the first is the string you're searching in, and the second is what you're looking for in that string.

strpos returns the position of the string if it's found, or false if it's not found.

So your loop would look something like this:

foreach ($linklist as $link) {
    if (strpos($link->getAttribute("href"), 'advertid') !== false) {
        echo $link->getAttribute("href")."\n";
    }
}
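
The strict !== false comparison matters because strpos returns 0 when the match starts at the very first character, and 0 is treated as false in a loose comparison. A small sketch of the three possible outcomes (the sample URLs and variable names are made up for illustration):

```php
<?php
// Hypothetical sample strings, used only to illustrate strpos() return values.
$atStart = "advertid.example.com/page";
$inside  = "https://example.com/advertid/42";
$missing = "https://example.com/about";

var_dump(strpos($atStart, 'advertid')); // int(0): found at position 0
var_dump(strpos($inside, 'advertid'));  // int(20)
var_dump(strpos($missing, 'advertid')); // bool(false): not found

// A loose check wrongly treats position 0 as "not found".
$looseCheck  = (strpos($atStart, 'advertid') != false);  // false
$strictCheck = (strpos($atStart, 'advertid') !== false); // true
```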

3 Comments

Many thanks Jimmy, this worked. I've now put each $link into an array. Am I correct in thinking that the array should now only contain the URLs I want (i.e., those with 'advertid' in them)? I've tried print_r($array), but I can't see the URLs in the output.
I think I'm populating my array incorrectly. I note zkempel below used the array_push function, which I will try :).
Like a charm :)
