I have some input with a link and I want to open that link. For instance, I have an HTML file and want to find all links in the file and open their contents in an Excel spreadsheet.
- Why oh why must each of your posts be formatted like that? Why? – innaM, May 27, 2009 at 11:50
- Are you asking how to get a list of links from some HTML file? Or are you asking how to follow the links? Or are you asking how to get something into an Excel spreadsheet? – innaM, May 27, 2009 at 12:14
- The way I read it, he/she wants to scrape data from pages that are linked from a given page and put the results in Excel documents. – Chas. Owens, May 27, 2009 at 13:17
- I want to open the links found in an HTML file and read their contents. – User1611, May 28, 2009 at 8:00
4 Answers
It sounds like you want the linktractor script from my HTML::SimpleLinkExtor module.
You might also be interested in my webreaper script. I wrote that a long, long time ago to do something close to this same task. I don't really recommend it because other tools are much better now, but you can at least look at the code.
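For the link-listing half of the question, here is a minimal sketch that uses the module directly rather than the linktractor script. The `a_links` helper and the sample HTML are made up for illustration; the sketch assumes the module's `parse` and `a` methods, and it does not resolve relative links to absolute URLs.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::SimpleLinkExtor;

# Hypothetical helper: return the href values of all <a> tags in an HTML string.
sub a_links {
    my ($html) = @_;
    my $extor = HTML::SimpleLinkExtor->new;
    $extor->parse($html);
    return $extor->a;    # <a> hrefs only; ->links would return every link
}

# Made-up sample document, for illustration only
my $html = <<'HTML';
<html><body>
<a href="http://example.com/one">One</a>
<img src="logo.png">
<a href="/two">Two</a>
</body></html>
HTML

# Print one link per line
print "$_\n" for a_links($html);
```

To work on a real file, read its contents into `$html` first, or fetch the page with an HTTP client and pass the response body in.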
CPAN and Google are your friends. :)
Mojo::UserAgent is quite nice for this, too:
use Mojo::UserAgent;
print Mojo::UserAgent
->new
->get( $ARGV[0] )
->res
->dom->find( "a" )
->map( attr => "href" )
->join( "\n" );
That sounds like a job for WWW::Mechanize. It provides a fairly high level interface to fetching and studying web pages.
Once you've read the docs, I think you'll have a good idea how to go about it.
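As a rough starting point, the sketch below fetches a page, collects its `<a>` links, and then fetches each linked page. The `fetch_linked_pages` helper name and the returned hash layout are my own assumptions for illustration, not something the module prescribes; getting the results into Excel is left out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $start_url, then fetch every <a href> target on it.
# Returns a hash ref mapping each absolute link URL to that page's content
# (undef when the fetch failed).
sub fetch_linked_pages {
    my ($start_url) = @_;

    # autocheck => 0 so a failed request does not die; check success() instead
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    $mech->get($start_url);
    die "Cannot fetch $start_url: ", $mech->status, "\n" unless $mech->success;

    # Collect absolute URLs before navigating away from the start page
    my @urls = map { $_->url_abs->as_string } $mech->find_all_links( tag => 'a' );

    my %content_for;
    for my $url (@urls) {
        next if exists $content_for{$url};    # skip duplicate links
        $mech->get($url);
        $content_for{$url} = $mech->success ? $mech->content : undef;
    }
    return \%content_for;
}

# Runs only when a URL is given on the command line
if ( my $start = shift @ARGV ) {
    my $pages = fetch_linked_pages($start);
    for my $url ( sort keys %$pages ) {
        my $body = $pages->{$url};
        printf "%s\t%s\n", $url,
            defined $body ? length($body) . " chars" : "FAILED";
    }
}
```

Each value in the returned hash is the raw page content, so whatever you want to extract for the spreadsheet can be pulled from there.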
There is also Web::Query:
#!/usr/bin/env perl
use 5.10.0;
use strict;
use warnings;
use Web::Query;
say for wq( shift )->find('a')->attr('href');
Or, from the cli:
$ perl -MWeb::Query -E'say for wq(shift)->find("a")->attr("href")' \
http://techblog.babyl.ca