On a first pass, I'd make a list of pre-compiled patterns to test against each line. The problem is likely to change, and I want to add and delete patterns without disturbing the meat of the code:
my @patterns = (
qr/\A [A] [FG] [0-9]{5} \Z/x,
qr/\A [A-Z] [0-9]{4} \Z/x,
qr/\A [0-9]{4} [A-Z] \Z/x,
);
while( my $line = <DATA> ) {
next if grep { $line =~ $_ } @patterns;
print $line;
}
__END__
D0832
G2565
ZDS97
FHM2547
JDH1464
R2918
4918K
AG01023
AG02997
The big improvement isn't the patterns though. It's checking things one line at a time and printing the lines I want to keep. I don't have the entire file in memory at the same time; it's only a line at a time.
There's a problem with this, though. It works, but grep tests every pattern against every line, even after one of them has already matched. That might not mean much if very few lines will ever match or there are only a few patterns. If you think it might matter, using first from List::Util instead of grep can help, since it only needs to find one match and stops as soon as it finds it:
use List::Util qw(first);
my @patterns = (
qr/\A [A] [FG] [0-9]{5} \Z/x,
qr/\A [A-Z] [0-9]{4} \Z/x,
qr/\A [0-9]{4} [A-Z] \Z/x,
);
while( my $line = <DATA> ) {
next if first { $line =~ $_ } @patterns;
print $line;
}
__END__
D0832
G2565
ZDS97
FHM2547
JDH1464
R2918
4918K
AG01023
AG02997
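One detail worth noting: first returns the matching element rather than a plain true or false value. That works here because a compiled pattern is always true, but any from List::Util states the intent more directly and also stops at the first match. The following is a minimal sketch, assuming List::Util 1.33 or later is installed; should_skip is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(any);    # any() was added in List::Util 1.33

my @patterns = (
    qr/\A [A] [FG] [0-9]{5} \Z/x,
    qr/\A [A-Z] [0-9]{4} \Z/x,
    qr/\A [0-9]{4} [A-Z] \Z/x,
);

# should_skip is a hypothetical helper name for this sketch.
# It returns true as soon as one pattern matches the line.
sub should_skip {
    my ($line) = @_;
    return any { $line =~ $_ } @patterns;
}

# R2918 matches the second pattern and is skipped; FHM2547 is printed.
for my $line ( "R2918\n", "FHM2547\n" ) {
    print $line unless should_skip($line);
}
```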
Or, I might make one giant pattern. Regexp::Assemble can put them together (but so can you if you watch out for the alternation precedence):
use v5.10;
use Regexp::Assemble;
my @patterns = (
'[A][FG][0-9]{5}',
'[A-Z][0-9]{4}',
'[0-9]{4}[A-Z]',
);
my $grand_pattern = do {
my $ra = Regexp::Assemble->new;
$ra->add( $_ ) for @patterns;
my $re = $ra->re;
qr/ \A (?: $re ) \Z /x;
};
say "Grand regex is $grand_pattern";
while( my $line = <DATA> ) {
next if $line =~ $grand_pattern;
print $line;
}
__END__
D0832
G2565
ZDS97
FHM2547
JDH1464
R2918
4918K
AG01023
AG02997
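Building the single pattern by hand, as mentioned above, might look like the following sketch. It assumes the same three pattern strings, and build_pattern is a hypothetical helper name. The non-capturing group is what takes care of the precedence issue: without it, \A would apply only to the first alternative and \Z only to the last.

```perl
use strict;
use warnings;

# build_pattern is a hypothetical helper name for this sketch.
# It joins the pattern strings with | and anchors the whole group.
sub build_pattern {
    my @patterns = @_;
    my $alternation = join '|', @patterns;

    # The (?: ... ) group makes \A and \Z apply to every alternative.
    # Because of /x, whitespace inside the joined strings would be ignored.
    return qr/ \A (?: $alternation ) \Z /x;
}

my $grand_pattern = build_pattern(
    '[A][FG][0-9]{5}',
    '[A-Z][0-9]{4}',
    '[0-9]{4}[A-Z]',
);

# D0832 and AG01023 match and are skipped; ZDS97 is printed.
for my $line ( "D0832\n", "ZDS97\n", "AG01023\n" ) {
    print $line unless $line =~ $grand_pattern;
}
```

Unlike Regexp::Assemble, this does nothing to merge common prefixes, so it is only a reasonable choice for a short list of patterns.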
The next step would be to take the patterns from the command line or a configuration file, but that's not so hard. The program shouldn't know the patterns at all. You'll have a much easier time changing the patterns if you don't have to change the code.
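As a rough sketch of that next step, the patterns could live in a plain text file, one per line. The file layout (blank lines and lines starting with # are ignored), the helper name load_patterns, and the file names in the usage comment are all assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Read one pattern per line from an open filehandle. Blank lines and
# lines starting with '#' are skipped (an assumed file layout).
# load_patterns is a hypothetical helper name.
sub load_patterns {
    my ($fh) = @_;
    my @patterns;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /\A \s* (?: \# .* )? \z/x;

        # No /x here, so whitespace in a pattern from the file stays
        # literal. An invalid pattern dies at this point.
        push @patterns, qr/\A(?:$line)\Z/;
    }
    return @patterns;
}

# Assumed usage: perl filter.pl patterns.txt input.txt
if ( my $pattern_file = shift @ARGV ) {
    open my $fh, '<', $pattern_file
        or die "Could not open $pattern_file: $!\n";
    my @patterns = load_patterns($fh);
    close $fh;

    # <> reads the remaining files in @ARGV, or STDIN if there are none.
    while ( my $line = <> ) {
        next if first { $line =~ $_ } @patterns;
        print $line;
    }
}
else {
    warn "Usage: $0 pattern_file [input_file ...]\n";
}
```

With this layout, adding or removing a pattern means editing the pattern file only; the program itself stays untouched.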
\n at the end of your find regexes. As for whether it is fast enough: it would definitely finish within a week, but you have to test and see whether it is satisfactory yourself :)

[(0-9)] also matches ( and ). Similarly, [F|G] also matches |.

grep -v 'regexp' will do the work better, I think. See the -v option in the manual page of the grep(1) utility. Grep is good at filtering lines of text. It was developed with that target in mind, and it is at least ten years older than Perl.