I am trying to extract image src links with the Perl code below, but I can't see where I am making a mistake. The steps are:

  1. Open a file and read the URLs in it.

My text file looks like this

https://zzzzzz.com/
https://yyyyyyy.com/
https://xxxxxxxxxx.com/
https://stackoverflow.com/
https://www.google.com/
https://www.yahoo.com/
  2. For each URL in the text file, extract the img src links.
  3. Print the retrieved data into another file.
  4. Open that file again with a new file handle and read it into an array.
  5. When I dereference the array, the output is ARRAY(0x2e14a48) ARRAY(0x3125528) ARRAY(0x312e170) instead of the URLs.

The Perl code is:

#!/usr/bin/perl
print "Content-type: text/html\n\n";
use strict;
use warnings;
use DBI;
use LWP::Simple;
use HTML::LinkExtor;

my $filename = "/path/to/file";

open FILE, '<', $filename or print "cant open file: $!";
my @data = <FILE>;
close(FILE);

my $image = "/path/to/file";

open FILES, '>', $image or print "cant write to file: $!";

foreach my $urls (@data) {
   my $url = get("$urls");

   my $linkextor = HTML::LinkExtor->new( \&links );

   $linkextor->parse($url);

   my $key;

   sub links {
      ( my $tag, my %links ) = @_;
      if ( $tag eq "img" ) {
         foreach my $key ( keys %links ) {
            if ( $key eq "src" ) {
               foreach my $da ( @{$links{$key}} ) {
                  if ( $da =~ /^[a-zA-Z]/ ) {
                     print FILES "$da;\n";
                  } #if
               } #foreach
            }    #if
         }    #foreach
      }    #if
   }    #sub

   print FILES "\n";

}    #foreach
close(FILES);

Up to this point there is no problem; I get all the src links, like this:

https://zzzzzz.com/;https://yyyyyyy.com/;https://xxxxxxxxxx.com/;

https://zzzzzz.com/;https://yyyyyyy.com/;https://xxxxxxxxxx.com/;

https://zzzzzz.com/;https://yyyyyyy.com/;https://xxxxxxxxxx.com/;

https://zzzzzz.com/;https://yyyyyyy.com/;https://xxxxxxxxxx.com/;

This is the format of the output in the text file. All I need is to insert these URLs, in order, as $image1, $image2 and $image3 into the image column:

my $platform = "mysql";
my $database = "xxx";
my $host     = "xxxxx";
my $port     = "xxxx";
my $user     = "xxxxx";
my $pw       = "xxxxxxxxx";

my $dbh = DBI->connect( "DBI:$platform:$database:$host:$port", $user, $pw );

open FILED, '<', $image or die "cannot open file: $!";
my @img = <FILED>;
close(FILED);

foreach my $lin (@img) {
   chomp $lin;
   my @in     = split ';', $lin;
   my $image1 = $in[0];
   my $image2 = $in[1];
   my $image3 = $in[2];

   print "$image1 $image2 $image3 \n";

   $sth->execute( $li, $val, $parsed, $htmls, $image1, $image2, $image3 );

}

exit;

I think I am making a mistake in the foreach loop. Am I right? Thanks in advance.

... why do you have a sub definition embedded within your foreach loop? Commented Nov 1, 2017 at 10:45

2 Answers

Your problem is likely here:

foreach my $da ( $links{$key} ) {

Because it looks like you're assuming that $links{$key} is an array, when it cannot be: a hash value can only hold an array reference. Printing a reference gives the problem you described, output in the ARRAY(0xDEADBEEF) format, because that's how an array ref stringifies.

So you might find that changing it to:

foreach my $da ( @{$links{$key}} ) {

will do the trick.
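
To make the stringification point concrete, here is a minimal sketch. The %links hash is made up for illustration and is not the one HTML::LinkExtor passes to its callback; it only assumes a hash value that holds an array reference.

```perl
use strict;
use warnings;

# Hypothetical data: a hash whose 'src' value is an array reference.
my %links = ( src => [ 'a.png', 'b.png' ] );

# Interpolating the reference itself gives a string like ARRAY(0x...).
my $stringified = "$links{src}";

# Dereferencing with @{ ... } yields the elements the reference points to.
my @urls = @{ $links{src} };

print "reference: $stringified\n";
print "elements:  @urls\n";
```

The address inside ARRAY(0x...) changes from run to run, which is why the question shows three different values.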

But I'd also suggest

  • Don't embed a sub within a foreach loop; it's bad style.
  • Use 3-argument open with lexical file handles, e.g. open my $input, '<', 'file.name' or die $!.
  • Iterate over the file with a while loop, rather than reading it into an array that you don't then reuse.
  • You declare my $key twice; the first instance isn't used, and is misleading.
  • You write your output to $image as FILES, then open the same file and read it back in again. You don't seem to need the intermediate file though, so why not just stash the data in the @img array in the first place?
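
The suggestions above could be combined roughly as follows. This is only a sketch under assumptions: the URL list is read from an in-memory string rather than a real file so the example is self-contained, and fake_image_links is a hypothetical stand-in for the real fetch-and-extract step (LWP::Simple plus HTML::LinkExtor in the question).

```perl
use strict;
use warnings;

# Stand-in for the URL list file; an in-memory handle keeps the sketch self-contained.
my $url_list = "https://stackoverflow.com/\nhttps://www.google.com/\n";

# Three-argument open with a lexical handle; die rather than print on failure.
open my $input, '<', \$url_list or die "cannot open URL list: $!";

my @img;    # one array reference of image links per URL, replacing the intermediate file

while ( my $url = <$input> ) {
    chomp $url;                 # strip the trailing newline
    next unless length $url;    # skip blank lines

    my @srcs = fake_image_links($url);
    push @img, \@srcs;
}
close $input;

# Hypothetical helper: returns three made-up image links for a URL.
sub fake_image_links {
    my ($url) = @_;
    return map { "${url}img$_.png" } 1 .. 3;
}

# Each row is an array reference, so dereference it before use.
for my $row (@img) {
    my ( $image1, $image2, $image3 ) = @$row;
    print "$image1 $image2 $image3\n";
}
```

Keeping the rows in @img means the second open, the split on ';' and the temporary file all go away; the sub also lives outside the loop.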

1 Comment

It's also worth noting that @data should be chomped, although I think LWP::Simple ignores trailing whitespace.

Your problem lies here:

my @in     = split ';', $lin;
my $image1 = [0];
my $image2 = [1];
my $image3 = [2];

You are assigning an anonymous array reference (for example [0]) to each variable, instead of an element of @in. The lines above should look like this:

my $image1 = $in[0];
my $image2 = $in[1];
my $image3 = $in[2];
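
A short sketch of the difference, using one line of the output format shown in the question as sample input (the variable names $wrong and $right are only for illustration):

```perl
use strict;
use warnings;

my $lin = 'https://zzzzzz.com/;https://yyyyyyy.com/;https://xxxxxxxxxx.com/;';
my @in  = split ';', $lin;

my $wrong = [0];       # a new anonymous array containing the number 0
my $right = $in[0];    # the first element of @in

print "wrong: $wrong\n";    # prints something like ARRAY(0x...)
print "right: $right\n";
```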
