How can I extract numeric data from a text file?

Question

I want a Perl script to extract data from a text file and save it as another text file. Each line of the text file contains a URL to a jpg like "http://pics1.riyaj.com/thumbs/000/082/104//small.jpg". I want the script to extract the last 6 digits of each jpg URL (i.e. 082104) into a variable, and then write that variable into each line of the new text file in place of the URL, as shown below.

Input text:

text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text
text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text

Output text:

text php?id=82104 text
text php?id=569315 text

Thanks

Is there any specific problem you have with writing it yourself? Also, are the URLs always of the form you described, or do you need to be able to handle arbitrary URLs? – mweerden, commented Dec 12, 2008 at 7:37

brian d foy · Accepted Answer · 2008-12-12 07:40:01Z

What have you tried so far?

Here's a short program that gives you the meat of the problem, and you can add the rest of it:

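# Read lines from the files named on the command line, or from STDIN
while( <> )
    {
    # Replace the URL with php?id= plus the last two directory numbers
    s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
    print;
    }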

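This is very close to the command-line program that handles the looping and printing for you with the -p switch (see the perlrun documentation for the details):

perl -pe 's|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|' inputfile > outputfile

To edit the file in place instead, keeping the original as inputfile.old, add -i and drop the redirection:

perl -pi.old -e 's|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|' inputfile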
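To check what the substitution does without touching any files, here is a minimal sketch that runs it over the two sample lines from the question. `url_to_id` is a hypothetical helper name; the regex inside it is the one from this answer, unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the answer's substitution to a single line
sub url_to_id {
    my ($line) = @_;
    $line =~ s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
    return $line;
}

# The two sample input lines from the question
my @lines = (
    "text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text\n",
    "text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text\n",
);

print url_to_id($_) for @lines;
# text php?id=082104 text
# text php?id=569315 text
```

The greedy `.*` backtracks until the last three digit-only directories are found, so `$1` and `$2` end up holding the second and third of them.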

Axeman · Accepted Answer · 2008-12-12 08:04:12Z

I didn't know whether to answer according to what you described ("last 6 digits") or just assume that it all fits the pattern you showed. So I decided to answer both ways.

Here is a method that can handle lines more diverse than your examples.

use strict;
use warnings;
use FileHandle;

# Assumed: input and output file names are passed on the command line
my ( $file_in, $file_out ) = @ARGV;

my $jpeg_RE = qr{
    (.*?)           # Anything, watching out for patterns ahead
    \s+             # At least one space
    (?> http:// )   # Once we match "http://" we're onto the next section
    \S*?            # Any non-space, watching out for what follows
    ( (?: \d+ / )*  # One or more digits followed by a slash, any number of times
      \d+           # another group of digits
    )               # end group
    \D*?            # Any number of non-digits looking ahead
    \.jpg           # literal string '.jpg'
    \s+             # At least one space
    (.*)            # The rest of the line
}x;

my $infile  = FileHandle->new( "<$file_in" )  or die "Cannot open $file_in: $!";
my $outfile = FileHandle->new( ">$file_out" ) or die "Cannot open $file_out: $!";

while ( my $line = <$infile> ) {
    my ( $pre_text, $digits, $post_text ) = ( $line =~ m/$jpeg_RE/ )
        or do { $outfile->print( $line ); next };   # copy non-matching lines unchanged
    $digits =~ s/\D//g;    # keep only the digits, e.g. "000/082/104" -> "000082104"
    $outfile->printf( "%s php?id=%s %s\n", $pre_text, substr( $digits, -6 ), $post_text );
}
$infile->close();
$outfile->close();
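To check the "last 6 digits" step on its own, here is a minimal sketch that pulls it into a function. `last_six_digits` is a hypothetical name, and the sketch assumes the wanted number is spread over the digit-only directories right before the .jpg file name; the `/+` also accepts the doubled slash in the question's first example URL.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last six digits of the run of digit-only
# directories just before the .jpg file name, or nothing if there is no match
sub last_six_digits {
    my ($url) = @_;
    # Capture e.g. "000/082/104"; "/+" tolerates a doubled slash before the file name
    my ($dirs) = $url =~ m{ / ( \d+ (?: / \d+ )* ) /+ [^/]* \.jpg }x
        or return;
    ( my $digits = $dirs ) =~ s/\D//g;    # "000/082/104" -> "000082104"
    return substr( $digits, -6 );         # keep the last six: "082104"
}

for my $url (
    'http://pics1.riyaj.com/thumbs/000/082/104/small.jpg',
    'http://pics1.riyaj.com/thumbs/000/082/104//small.jpg',
    'http://pics1.riyaj.com/thumbs/000/569/315/small.jpg',
) {
    print last_six_digits($url), "\n";
}
# 082104
# 082104
# 569315
```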

However, if it's just as regular as you show, it gets a lot easier:

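use strict;
use warnings;
use FileHandle;

# Assumed: input and output file names are passed on the command line
my ( $file_in, $file_out ) = @ARGV;

my $jpeg_RE = qr{
    (?> \Qhttp://pics1.riyaj.com/thumbs/\E )  # fixed URL prefix, matched literally
    \d{3}                                     # first three-digit directory (ignored)
    /
    ( \d{3} )                                 # second directory, captured in $1
    /
    ( \d{3} )                                 # third directory, captured in $2
    \S*?                                      # rest of the path, e.g. "/small"
    \.jpg
}x;

my $infile  = FileHandle->new( "<$file_in" )  or die "Cannot open $file_in: $!";
my $outfile = FileHandle->new( ">$file_out" ) or die "Cannot open $file_out: $!";

while ( my $line = <$infile> ) {
    $line =~ s/$jpeg_RE/php?id=$1$2/g;    # URL -> php?id=082104
    $outfile->print( $line );
}
$infile->close();
$outfile->close();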
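Both answers keep the captured digits as a string, so the first sample line becomes php?id=082104, while the question's sample output shows php?id=82104. If the leading zero really should be dropped, one option (a sketch, not taken from either answer) is to build the replacement with the /e modifier and format it with sprintf '%d'. `url_to_numeric_id` is a hypothetical name, and the pattern assumes the URL ends in three digit-only directories followed by the file name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each thumbnail URL with php?id=NNN, where NNN is
# the last two directory numbers formatted as an integer (leading zeros dropped)
sub url_to_numeric_id {
    my ($line) = @_;
    $line =~ s{ http://\S*? / \d+ / (\d+) / (\d+) /+ [^/\s]*? \.jpg }
              { sprintf( 'php?id=%d', $1 . $2 ) }gex;
    return $line;
}

print url_to_numeric_id("text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text\n");
print url_to_numeric_id("text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text\n");
# text php?id=82104 text
# text php?id=569315 text
```

Perl converts the string "082104" to a number as decimal, so `%d` simply drops the leading zero; lines without a matching URL come back unchanged.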
