
Please explain why this issue occurs.
My data file:

DATA----1
DATA----2
DATA----3
DATA----4
DATA----5
DATA----6
DATA----7
SAMPLE----1
SAMPLE----12
SAMPLE----13
SAMPLE----2
SAMPLE----3
SAMPLE----4
SAMPLE----5
OTHER----1
OTHER----2
OTHER----3  

I need every line that starts with DATA or SAMPLE in one array, and every line that starts with SAMPLE and ends with a two-digit number in another array.

I got the expected output with the following script

use strict;
use warnings;

open(FH, "di.txt");
my @file = <FH>;
close(FH);
my @arr2 = grep { $_ =~ m/^SAMPLE.+\d\d$/g } @file;  ## this array prints
my @arr1 = grep { $_ =~ m/^DATA|^SAMPLE/g } @file;

print @arr1,"\n\t~~~~~~~~~~~\n\n",@arr2;

I first wrote it as

use strict;
use warnings;

open(FH, "di.txt");
my @file = <FH>;
close(FH);
my @arr1 = grep { $_ =~ m/^DATA|^SAMPLE/g } @file;   
my @arr2 = grep { $_ =~ m/^SAMPLE.+\d\d$/g } @file;  ## this doesn't print

print @arr1,"\n\t~~~~~~~~~~~\n\n",@arr2;

When I run this one, it prints only @arr1. What would be the reason @arr2 doesn't print?

  • For me, both of your scripts give the same result: neither prints @arr2. Commented Aug 14, 2015 at 9:39
  • @Sant: Just remove the $ after \d and you will get the desired output with the arrays in either order. Commented Aug 14, 2015 at 9:44
  • Always use the three-argument form of open, with a lexical filehandle and error handling: open my $fh, "<", "di.txt" or die "Unable to open: $!"; Commented Aug 14, 2015 at 9:50

1 Answer

The problem is caused by the behaviour of the global match option /g in scalar context

Every scalar variable has a marker that remembers where the most recent global match left off, and hence where the next one should start searching. It enables the use of the \G anchor in regex patterns, as well as while loops like this

my $s = 'aaabacad';

while ( $s =~ /a(.)/g ) {
  print "$1 ";
}

which prints

a b c d

In truth you're not interested in a global match in this case, you just want to discover whether or not the pattern can be found in the string. The grep operator applies scalar context to its first parameter, so in using the /g option in this statement

my @arr1 = grep { $_ =~ m/^DATA|^SAMPLE/g } @file;

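you have left every element of @file that matched with its marker set to just after DATA or SAMPLE. (grep aliases $_ to each element in turn, so the marker is stored on the array element itself.) That means the next match on the same element, m/^SAMPLE.+\d\d$/g, will start looking from there. Without the /m modifier the ^ anchor can match only at the very start of the string, so the match fails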
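
To see the marker at work, here is a minimal sketch that assumes a three-line stand-in for the data file (the @lines, @marks, @first and @second names are made up for the illustration). It reports pos for each element after the first grep and then counts what the second grep finds

```perl
use strict;
use warnings;

# Hypothetical stand-in for three lines read from di.txt
my @lines = ("DATA----1\n", "SAMPLE----12\n", "OTHER----1\n");

# First pass: /g in scalar context records a position on each element that matches
my @first = grep { $_ =~ m/^DATA|^SAMPLE/g } @lines;

# The marker is defined only on the elements that matched
my @marks = map { defined pos($_) ? pos($_) : 'undef' } @lines;
print "@marks\n";               # 4 6 undef

# Second pass: the search on "SAMPLE----12\n" resumes at offset 6, where ^ cannot match
my @second = grep { $_ =~ m/^SAMPLE.+\d\d$/g } @lines;
print scalar(@second), "\n";    # 0
```

The OTHER line never matched, so it has no marker. Only the lines that the first grep selected are affected, and those are exactly the ones the second grep needs.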

The pos function gives you access to the marker, and you can fix your original code by resetting it to the start of the string after the first grep call. If you write this instead

my @arr1 = grep { $_ =~ m/^DATA|^SAMPLE/g } @file;
pos($_) = 0 for @file;
my @arr2 = grep { $_ =~ m/^SAMPLE.+\d\d$/g } @file;  ## this now prints

then the output will be what you expected

The correct fix, however, is to write what you mean anyway, which means you should remove the /g option from the pattern matches. This code also works fine, and it's also more concise, more readable, and far less fragile

my @arr1 = grep /^DATA|^SAMPLE/, @file;
my @arr2 = grep /^SAMPLE.+\d\d$/, @file;
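
As a rough check, the sketch below runs the two /g-free greps against the sample data from the question. It assumes the data is held in a string and read through an in-memory filehandle instead of di.txt, purely so that the example is self-contained; the $text variable is made up for the illustration

```perl
use strict;
use warnings;

# The question's sample data, inlined here instead of being read from di.txt
my $text = <<'END';
DATA----1
DATA----2
DATA----3
DATA----4
DATA----5
DATA----6
DATA----7
SAMPLE----1
SAMPLE----12
SAMPLE----13
SAMPLE----2
SAMPLE----3
SAMPLE----4
SAMPLE----5
OTHER----1
OTHER----2
OTHER----3
END

# Three-argument open with a lexical filehandle, reading from the string
open my $fh, '<', \$text or die "Unable to open in-memory data: $!";
my @file = <$fh>;
close $fh;

# No /g, so no marker is left behind and the order of the greps is irrelevant
my @arr1 = grep /^DATA|^SAMPLE/, @file;
my @arr2 = grep /^SAMPLE.+\d\d$/, @file;

print scalar(@arr1), " ", scalar(@arr2), "\n";    # 14 2
print @arr2;                                      # SAMPLE----12 and SAMPLE----13
```

Swapping the two grep lines gives the same counts, which is the behaviour the question expected in the first place.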

2 Comments

Thanks @Borodin, delightful answer. I learned a new use of the pos function, "pos($_)".
@Sant: Like many other built-in operators, the default parameter for pos is $_ so just pos is the same as pos($_). But you can give it any scalar variable
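
A small sketch of that point, using made-up variable names ($s is borrowed from the answer's example, the others are only for illustration): pos with no argument reads the marker on $_, while pos($s) reads the marker on a named scalar

```perl
use strict;
use warnings;

my $s = 'aaabacad';
my $matched = ($s =~ /a(.)/g);    # scalar-context /g match consumes "aa"
my $named = pos($s);              # marker on a named scalar

my $line = 'SAMPLE----12';
my ($implicit, $explicit);
for ($line) {                     # for aliases $_ to $line
    if (/^SAMPLE/g) {
        $implicit = pos;          # no argument: uses $_
        $explicit = pos($_);      # the same marker, named explicitly
    }
}

print "$named $implicit $explicit\n";    # 2 6 6
```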
