1

Input:

OUT :abc123: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT :abc123 : : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/prodperim/installer/abc123.fw is older than it should be (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/prodperim/installer/abc123.fw.schedule is older than it should be (not updated for 36 hours)
OUT bcd111: : Succeeded.

I want to keep only the hosts that have a matching "Warning" line.

Output:

abc123 
abc1234
bcd111

I tried the regex below, but it matched all of them, duplicates included.

([\w]+)\s+:\s+:\s+Warning

Is it possible to avoid duplicates using regex?

1
  • Probably better to iterate over the lines and populate a hash. Commented Oct 13, 2014 at 12:20

5 Answers

3

When you hear "unique" in Perl, think "hash":

#!/usr/bin/perl
use warnings;
use strict;

my %uniq;
while (<>) {
    /:?(\S+?)[:\s]+Warning/ and $uniq{$1} = 1;
}

print "$_\n" for keys %uniq;

BTW, your input and regex don't lead to the output you indicated. I changed the regex, but I'm not sure your input sample is correct. Is the placement of colons really so wild?
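One thing to keep in mind with the hash approach: keys returns the hosts in no particular order. If the order of first appearance matters, a minimal sketch is to record each host the first time the hash sees it. The unique_hosts name and the inline sample lines (modelled on the question's input, messages shortened) are illustrative, not part of the original answer:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Return the hosts that have a Warning line, in order of first appearance.
sub unique_hosts {
    my @lines = @_;
    my (%seen, @hosts);
    for my $line (@lines) {
        next unless $line =~ /:?(\S+?)[:\s]+Warning/;
        my $host = $1;
        # $seen{$host}++ is false only the first time a host shows up.
        push @hosts, $host unless $seen{$host}++;
    }
    return @hosts;
}

# Hypothetical sample data, shortened from the question's input.
my @sample = (
    'OUT :abc123: : Warning: abc123.fw is older than it should be',
    'OUT :abc123 : : Warning: abc123.fw.schedule is older than it should be',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT bcd111: : Warning: abc123.fw.schedule is older than it should be',
    'OUT bcd111: : Succeeded.',
);

print "$_\n" for unique_hosts(@sample);
```

With this sample it prints abc123, abc1234 and bcd111, one per line, in that order.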

1
OUT\s*:?([^:]*):(?=.*?\bWarning\b)(?:(?!OUT).)*(?!.*?\1[:\s]*Warning)

You can try this. See the demo and grab the capture.

http://regex101.com/r/sK8oK9/12

0

You can use this perl one-liner:

perl -lane 'if (/\bWarning\b/) { $F[1] =~ s/\W+//g; print $F[1] }' file
abc123
abc123
abc1234
abc1234
abc1234
bcd111
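The output above still repeats hosts. As a hedged sketch of one way to drop the repeats while keeping the same field-splitting idea, the script below takes the second whitespace-separated field, strips non-word characters, and skips hosts it has already seen. The warning_hosts name and the inline sample lines are assumptions made for illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Take the second whitespace-separated field of each Warning line,
# strip non-word characters, and keep only the first occurrence.
sub warning_hosts {
    my (%seen, @hosts);
    for my $line (@_) {
        next unless $line =~ /\bWarning\b/;
        my @fields = split ' ', $line;
        next unless defined $fields[1];      # skip lines with no host field
        (my $host = $fields[1]) =~ s/\W+//g; # ":abc123:" becomes "abc123"
        push @hosts, $host unless $seen{$host}++;
    }
    return @hosts;
}

# Hypothetical sample data, shortened from the question's input.
my @sample = (
    'OUT :abc123: : Warning: abc123.fw is older than it should be',
    'OUT :abc123 : : Warning: abc123.fw.schedule is older than it should be',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT bcd111: : Warning: abc123.fw.schedule is older than it should be',
    'OUT bcd111: : Succeeded.',
);

print "$_\n" for warning_hosts(@sample);
```

The same check fits in the one-liner form by printing only when $seen{$F[1]}++ is false.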

0

Use this pattern with the g and s options:

OUT\s*:?([^:]+):\s*:\s*Warning(?!.*?\1\s*:\s*:\s*Warning)  

Demo

0

This is more of a supplement/complement to @choroba's response above since he nailed it with "when you hear 'unique' think 'hash'". You should accept @choroba's answer :-)

Here I simplified the regex part of your question into a call to grep in order to focus on uniqueness, changed the data in your file a bit (so it could fit here) and saved it as dups.log:

# dups.log 
OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: / filesystem 100% full
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Succeeded.

This one-liner gives the output below:

perl -E '++$seen{$_} for grep{/Warning/} <>; print keys %seen' dups.log

OUT :abc123: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)
OUT abc1234: : Warning: / filesystem 100% full
OUT :abc123: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)
OUT abc1234: : Warning: /var/tmp/abc123.fw old (not updated for 36 hours)

This is pretty much the same output you'd get with uniq dups.log | grep Warning (uniq only collapses adjacent duplicates, which is the case in this file, and the line order may differ because hash keys are unordered). It works because perl uses each line it reads as a hash key, adding the key to the hash and incrementing its value (with ++$seen{$_}) each time it sees that line. For perl, "same key" here means a line that is a duplicate. Try printing values %seen, or use -MDDP and p %seen, to get a sense of what is going on.
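To make the counting visible, here is a small sketch that tallies how often each Warning line occurs and prints the count next to the line. The count_warning_lines name and the inline sample lines are illustrative assumptions, and the keys are sorted only so the output order is predictable:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Count how often each Warning line occurs. Returns a hash reference
# mapping the full line to its number of occurrences.
sub count_warning_lines {
    my %seen;
    ++$seen{$_} for grep { /Warning/ } @_;
    return \%seen;
}

# Hypothetical sample data in the style of dups.log.
my $seen = count_warning_lines(
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT abc1234: : Warning: / filesystem 100% full',
    'OUT bcd111: : Warning: /var/tmp/abc123.fw.sched old (not updated for 36 hours)',
    'OUT bcd111: : Succeeded.',
);

# One line per unique key, prefixed with how many times it was seen.
printf "%d  %s\n", $seen->{$_}, $_ for sort keys %$seen;
```

The duplicated "filesystem 100% full" line ends up as a single key with the value 2, and the "Succeeded." line never enters the hash.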

To get your output, @choroba's regex adds the capture (instead of the whole line) to the hash:

perl -nE '/:?(\S+?)[:\s]+Warning/ && ++$seen{$1} }{ say for keys %seen' dups.log


But, just as with the whole-line method above, the match creates only one copy of each key (from the capture) and then increments its value with ++, so in the end you get "unique" keys à la uniq in the %seen hash.


It's a neat perl trick you never forget :-)

References:

  • The SO question has some good explanations of the perl idiom for uniq using a hash as per @choroba.
  • This is touched on in perlfaq4 which describes the %seen{} hash trick.
  • Perlmaven shows how to make your own "home made" uniq using this approach.
  • ...
