1

Here is a sample block of code I need to scrape:

<p>This paragraph contains <a href="http://twitter.com/chsweb" data-placement="below" rel="twipsy" target="_blank" data-original-title="Twitter">links to Twitter folks</a>, and <a href="http://twitter.com/blogcycle" data-placement="below" rel="twipsy" target="_blank" data-original-title="Twitter">more links to other Twitter folks</a>, but it also contains <a href="http://www.someOtherWebsiteHere.com">non-Twitter links too</a>.  How can I list only the Twitter links below?</p>

This script generates a list of every URL on the page:

<script>
var allLinks = document.links;
for (var i=0; i<allLinks.length; i++) {
  document.write(allLinks[i].href+"<BR/>");
}
</script>

How do I modify the script so that it only lists URLs that contain a certain domain, e.g. twitter.com?

Here is a demo page: http://chsweb.me/OucTum

1
  • 1
    Beware of document.write when you loop over DOM node collections: if it runs after the page has loaded, the loop will never get past the first node. Commented Aug 29, 2012 at 13:41

4 Answers

1

In modern browsers you can retrieve all the desired links with

var twitter_links = document.querySelectorAll('a[href*="twitter.com"]');

Using .querySelectorAll() is somewhat slower than a hand-written loop, but you probably won't notice any significant difference, and the code is shorter and easier to read than a for loop with a regular expression.
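Note that the *= attribute selector is a plain substring match, so it also selects links that merely mention twitter.com somewhere in the URL. If that matters, comparing the host name is stricter. A minimal sketch, assuming absolute URLs and the standard URL constructor; the function names matchesBySubstring and isTwitterUrl, and the second sample URL, are made up for illustration:

```javascript
// Substring test, similar in spirit to a[href*="twitter.com"].
function matchesBySubstring(href) {
  return href.indexOf("twitter.com") !== -1;
}

// Stricter test: the host must be twitter.com or one of its subdomains.
function isTwitterUrl(href) {
  var host;
  try {
    host = new URL(href).hostname.toLowerCase();
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return host === "twitter.com" || host.endsWith(".twitter.com");
}

var samples = [
  "http://twitter.com/chsweb",
  "http://www.someOtherWebsiteHere.com/?via=twitter.com" // hypothetical URL
];
var bySubstring = samples.filter(matchesBySubstring); // keeps both URLs
var byHost = samples.filter(isTwitterUrl);            // keeps only the first URL
```

For a private list of links the substring match is likely good enough; the host check only matters when unrelated URLs may contain the domain as text.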


2 Comments

Works beautifully! Here is the working demo: chsweb.me/NCeU6L Thank you Fabrizio.
Also, as a follow-up, this is only used locally for me to organize links to Twitter folks who commented on a presentation I did. I am using it to thank them and send goodies; it is not a real-life use case, but great that it can be done if needed.
0

The following will place all Twitter links in the twitter_links array:

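var twitter_links = [],
    links = document.getElementsByTagName('a');
// Use an indexed loop: for...in would also visit non-element keys such as "length".
for (var i = 0; i < links.length; i++)
{
    // Escape the dot so it matches a literal "." rather than any character.
    if (/twitter\.com/i.test(links[i].href))
    {
        twitter_links.push(links[i]);
    }
}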

Here's a jsFiddle for you > http://jsfiddle.net/Pv8DH/
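
A note on iterating collections such as the one returned by getElementsByTagName: a for...in loop visits every enumerable property name, not only the numeric indexes, so an indexed loop is the safer choice. A small sketch of the difference, using a plain array-like object (the hypothetical fakeCollection) as a stand-in for a real collection:

```javascript
// Stand-in for a DOM collection: numeric indexes plus an extra property.
var fakeCollection = {
  0: { href: "http://twitter.com/chsweb" },
  1: { href: "http://www.someOtherWebsiteHere.com/" },
  length: 2
};

// for...in visits every enumerable key, including "length".
var keysSeen = [];
for (var key in fakeCollection) {
  keysSeen.push(key);
}

// An indexed loop visits only the elements.
var hrefs = [];
for (var i = 0; i < fakeCollection.length; i++) {
  hrefs.push(fakeCollection[i].href);
}
```

With a real collection in a browser, for...in can additionally surface method names such as item, for which href is undefined.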

1 Comment

Confirmed to work - puts links into an alert where they can be copied if needed. Thank you.
0

Link elements expose the same URL properties as window.location (hostname, pathname, and so on), so you can test individual parts of the href. For example:

var link = allLinks[i];
if ( /twitter\.com/.test( link.hostname ) ) {
    document.write(link.href+"<BR/>");
}

Another issue with your code: if document.write is called after the page has finished loading, it implicitly opens a new document and discards the current one. The collection of links is live, meaning it only references the links present in the current document, so it is emptied and the loop never gets past the first link. Collecting the URLs in an array first and writing them in a single call avoids this:

var links = [];
for (var i=0; i<allLinks.length; i++) {
    var link = allLinks[i];
    if ( /twitter\.com/.test( link.hostname ) ) {
        links.push(link.href);
    }
}

document.write(links.join('<br>'));

Demo: http://jsfiddle.net/3xub6/
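
The collect-first approach can also be wrapped in a small function so that filtering is kept separate from the output step. This is only a sketch under stated assumptions: collectTwitterHrefs is a hypothetical name, the anchored pattern is a stricter variant of the regular expression above, and the function accepts document.links or any array-like of objects with hostname and href properties:

```javascript
// Hypothetical helper: returns the hrefs whose host is twitter.com
// or one of its subdomains, as a plain array.
function collectTwitterHrefs(links) {
  var result = [];
  for (var i = 0; i < links.length; i++) {
    var host = String(links[i].hostname);
    // Anchored so that a host like "nottwitter.com" is not matched.
    if (/(^|\.)twitter\.com$/i.test(host)) {
      result.push(links[i].href);
    }
  }
  return result;
}

// Plain objects standing in for link elements (illustrative data only).
var fakeLinks = [
  { hostname: "twitter.com", href: "http://twitter.com/chsweb" },
  { hostname: "www.someotherwebsitehere.com", href: "http://www.someOtherWebsiteHere.com/" },
  { hostname: "nottwitter.com", href: "http://nottwitter.com/" }
];
var twitterHrefs = collectTwitterHrefs(fakeLinks);
```

In a browser this would be called as collectTwitterHrefs(document.links). The returned array is a plain snapshot, so writing it to the page afterwards cannot affect the loop.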

1 Comment

OK, confirmed, this works too. Thank you! There are many similar posts on SO; I hope this helps those folks too. chsweb.me/O2Wtd8
0

ORIGINAL: Not working on demo page (Sample 6)

<script>
if (allLinks[i].href.match("twitter\.com"))
{
     document.write(allLinks[i].href+"<BR/>");
}
</script>

REVISED: Is working on demo page (Sample 7)

<script>
var allLinks = document.links;
for (var i=0; i<allLinks.length; i++) {
      if (allLinks[i].href.match("twitter.com")) {
            document.write(allLinks[i].href+"<BR/>");
      }
}
</script> 

3 Comments

Hmmm, does not seem to work, here is a demo page that uses the script above for reference: chsweb.me/PO2v4o
Make a jsFiddle out of it instead of trying to host it from Dropbox.
Here is another take on your approach which does work on the demo page, here: chsweb.me/NWHuWi
