
I have some input with a link and I want to open that link. For instance, I have an HTML file and want to find all links in the file and open their contents in an Excel spreadsheet.

  • Why oh why must each of your posts be formatted like that? Why? Commented May 27, 2009 at 11:50
  • Are you asking how to get a list of links from some HTML file? Or are you asking how to follow the links? Or are you asking how to get something into an Excel spreadsheet? Commented May 27, 2009 at 12:14
  • The way I read it he/she wants to scrape data from pages that are linked from a given page and put the results in Excel documents. Commented May 27, 2009 at 13:17
  • I want to open the links and read their contents from the HTML file. Commented May 28, 2009 at 8:00

4 Answers


It sounds like you want the linktractor script from my HTML::SimpleLinkExtor module.
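If you would rather call the module directly instead of the bundled linktractor script, a minimal sketch along these lines should work (the file name argument is just an example):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use HTML::SimpleLinkExtor;

# Parse a local HTML file and collect every link it contains.
my $extor = HTML::SimpleLinkExtor->new;
$extor->parse_file( $ARGV[0] );

# links() returns the values of all link-carrying attributes;
# a() would restrict that to the href values of <a> tags.
print "$_\n" for $extor->links;
```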

You might also be interested in my webreaper script. I wrote that a long, long time ago to do something close to this same task. I don't really recommend it because other tools are much better now, but you can at least look at the code.

CPAN and Google are your friends. :)

Mojo::UserAgent is quite nice for this, too:

use Mojo::UserAgent;

print Mojo::UserAgent
    ->new
    ->get( $ARGV[0] )
    ->res
    ->dom->find( "a" )
    ->map( attr => "href" )
    ->join( "\n" );



That sounds like a job for WWW::Mechanize. It provides a fairly high level interface to fetching and studying web pages.

Once you've read the docs, I think you'll have a good idea how to go about it.
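As a starting point, a short sketch of the Mechanize approach might look like this (the URL comes from the command line here; adapt as needed):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use WWW::Mechanize;

# autocheck => 1 makes any failed request die with a useful message.
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( $ARGV[0] );    # e.g. http://example.com/

# links() returns WWW::Mechanize::Link objects; url() gives the raw href.
for my $link ( $mech->links ) {
    print $link->url, "\n";
}
```

From there, $mech->get can follow each URL in turn if you want the linked pages' contents as well.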


use WWW::Mechanize; my $mech = WWW::Mechanize->new( autocheck => 1 ); $mech->get( "google.com" ); print $mech->content; — I'm getting this error: Error GETing google.com: Can't connect to www.google.com:80 (connect: Unknown error). What is wrong?
google.com is special. It doesn't like robots. However, it sounds like you have a network issue if you can't even connect.

There is also Web::Query:

#!/usr/bin/env perl 

use 5.10.0;

use strict;
use warnings;

use Web::Query;

say for wq( shift )->find('a')->attr('href');

Or, from the cli:

$ perl -MWeb::Query -E'say for wq(shift)->find("a")->attr("href")' \
       http://techblog.babyl.ca



I've used URI::Find for this in the past (for when the file is not HTML).
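A rough sketch of that approach, assuming the input comes from a file named on the command line:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use URI::Find;

# Slurp the (possibly non-HTML) input file.
my $text = do {
    local $/;
    open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    <$fh>;
};

# The callback runs once per URI found; returning the original
# text leaves the input unmodified.
my $finder = URI::Find->new( sub {
    my ( $uri, $orig_text ) = @_;
    print "$uri\n";
    return $orig_text;
} );
$finder->find( \$text );
```

This is handy for plain-text sources, since it spots bare URIs rather than relying on HTML markup.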

