Perl process and store unique values in array

Question

Below is the content of my log file. I'm reading the log file and grouping its records by the JIRA field.

JIRA: COM-1234
Program:Development
Reviewer:John Wick 
Description:Genral fix
rev:r345676
------------------------------------------
JIRA:COM-1234
Program:Development
Reviewer:None
Description:Updating Received 
rev:r909276
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Balise Mat
Description:Audited
rev:r876391
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Chan Joe
Description:SO hwat 
rev:r698392
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Chan Joe
Description:Paid the Due
rev:r327896
------------------------------------------

My requirement is to iterate through every unique JIRA value (COM-1234, COM-6789, etc.) and store the details that immediately follow it in individual arrays, like this:

(for COM-1234)

@prog = Development;
@Reviewer = John Wick;
@Description = Genral fix;
@rev = r345676;

(for COM-6789)

@prog = Testing;
@Reviewer = Balise Mat;
@Description = Audited;
@rev = r876391;

If a JIRA value repeats (say COM-1234 appears 2 times and COM-6789 appears 3 times), I still want to push only the details that immediately follow each occurrence (i.e. the values of the keys 'Program', 'Reviewer', ...) to the respective arrays, for example:

(COM-1234)

@prog = Development;
@Reviewer = None;
@Description = Updating Received ;
@rev = r909276;

I'm very new to Perl. I've only managed to get the unique values, and I'm not sure how to push the following values into individual arrays. Any input would be really helpful. Thanks.

My incomplete code:

#!/usr/bin/perl
use warnings;
use Data::Dumper;

$/ = "%%%%";
open (AFILE, "<", "D:\\mine\\out.txt") or die "Cannot open D:\\mine\\out.txt: $!";
    while (<AFILE>)
    {
     @temp = split(/-{20,}/, $_);
    }
close (AFILE);

my %jiraHash;
for ($i=0; $i<@temp; $i++) {   # < rather than <=, otherwise the last pass reads past the end of @temp
      if (($temp[$i] =~ /(((JIRA|SVN)\s{0,1}:(\s{0,2}[A-Za-z0-9-\s]{4,9}),{0,1}\s{0,2}){1,5})\nProgram\s{0,1}:\s{0,2}Development/) ||
          ($temp[$i] =~ /(((JIRA|SVN):(\s{0,2}[A-Za-z0-9-\s]{4,9}),{0,1}\s{0,2}){1,5})\nProgram\:\s{0,2}Testing/))    {

            $jiraId = $2;
            $jiraId =~ s/JIRA\s*\://;
            $temp[$i] =~ s/\w{3,}\s?:\s?//g;
            #print "==>$jiraId\n";
            $jiraHash{$jiraId}  = $temp[$i];

        } else {
            #print "NOT\n";
        }   
}
print Dumper(\%jiraHash);

I'm planning to display the results as an HTML report in the following format:

Program: Development
FOR ID:  COM-1234

Revision    Reviewer    Comment
r345676     John Wick   Genral fix

Revision    Reviewer    Comment
r909276     None        Updating Received 

Program: Testing
FOR ID: COM-6789

Revision    Reviewer    Comment
r876391     Balise Mat  Audited

Revision    Reviewer    Comment
r698392     Chan Joe    SO hwat 

Revision    Reviewer    Comment
r327896     Chan Joe    Paid the Due

What are you going to do with this later on? Please use strict and fix the resulting errors.
– simbabque, Commented Aug 18, 2016 at 11:22
Exactly what do you want to store for multiple records with the same ID? The first record? The last one? All of them in an array?
– Ilmari Karonen, Commented Aug 18, 2016 at 11:22
@Goku, your previous question already got a thorough answer for this. All you have to do is alter it a bit; the structure of the code is approximately the same.
– yoniyes, Commented Aug 18, 2016 at 11:29

Dave Cross · Accepted Answer · 2016-08-18 12:44:58Z

4

It sounds like this data should be in a database.

But it is relatively simple to parse it into a data structure. Here, I've gone for a hash where the key is the Jira identifier and the value is a reference to an array that contains hash references. Each of the referenced hashes contains the details from one of the records.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Data::Dumper;

my @records = do {
  local $/ = '------------------------------------------';
  <>;
};

chomp @records;

my %jira;

foreach (@records) {
  next unless /\S/;

  my %rec = /^(\w+):\s*(.+?)$/mg;
  push @{$jira{$rec{JIRA}}}, \%rec;
}

say Dumper \%jira;

When you run it on your given data, you get this output:

$VAR1 = {
          'COM-6789' => [
                          {
                            'Program' => 'Testing',
                            'JIRA' => 'COM-6789',
                            'rev' => 'r876391',
                            'Reviewer' => 'Balise Mat',
                            'Description' => 'Audited'
                          },
                          {
                            'Program' => 'Testing',
                            'JIRA' => 'COM-6789',
                            'rev' => 'r698392',
                            'Reviewer' => 'Chan Joe',
                            'Description' => 'SO hwat '
                          },
                          {
                            'Program' => 'Testing',
                            'JIRA' => 'COM-6789',
                            'rev' => 'r327896',
                            'Reviewer' => 'Chan Joe',
                            'Description' => 'Paid the Due'
                          }
                        ],
          'COM-1234' => [
                          {
                            'Program' => 'Development',
                            'JIRA' => 'COM-1234',
                            'rev' => 'r345676',
                            'Reviewer' => 'John Wick ',
                            'Description' => 'Genral fix'
                          },
                          {
                            'Program' => 'Development',
                            'JIRA' => 'COM-1234',
                            'rev' => 'r909276',
                            'Reviewer' => 'None',
                            'Description' => 'Updating Received '
                          }
                        ]
        };

From there, it's relatively simple to get a display of the data:

foreach my $j (keys %jira) {
  say "JIRA: $j";
  foreach (@{$jira{$j}}) {
    say "Program: $_->{Program}";
    say "Revision: $_->{rev}";
    # etc...
  }
}
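The display loop can be taken one step further toward the report layout shown in the question. The sketch below is self-contained: it parses a two-record excerpt of the log (embedded as a heredoc and split on the dashed separator lines instead of reading via <> with $/), builds the same %jira structure with the same key/value regex, and then lays out the Revision/Reviewer/Comment blocks with sprintf. The column width of 12 and the assumption that every record for one ID shares the same Program are choices made for this example, not part of the answer. An HTML version would wrap the same two loops in table markup and HTML-escape the values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two-record excerpt of the question's log, embedded so the sketch is self-contained.
my $log = <<'END';
JIRA: COM-1234
Program:Development
Reviewer:John Wick
Description:Genral fix
rev:r345676
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Balise Mat
Description:Audited
rev:r876391
------------------------------------------
END

# Same structure as the answer: JIRA ID => [ \%record, ... ]
my %jira;
for my $chunk (split /^-{20,}$/m, $log) {   # one chunk per record
  next unless $chunk =~ /\S/;               # skip the whitespace left after the last separator
  my %rec = $chunk =~ /^(\w+):\s*(.+?)$/mg;
  push @{ $jira{ $rec{JIRA} } }, \%rec;
}

# Report in the question's layout. Assumes all records for one ID share
# the same Program (true for the sample data).
my $report = '';
for my $id (sort keys %jira) {              # sorted for a stable order
  my $records = $jira{$id};
  $report .= "Program: $records->[0]{Program}\n";
  $report .= "FOR ID:  $id\n\n";
  for my $r (@$records) {
    $report .= sprintf "%-12s%-12s%s\n",   'Revision', 'Reviewer', 'Comment';
    $report .= sprintf "%-12s%-12s%s\n\n", $r->{rev}, $r->{Reviewer}, $r->{Description};
  }
}
print $report;
```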

edited Aug 18, 2016 at 12:44

answered Aug 18, 2016 at 12:22

Dave Cross


mkHun Over a year ago

I've noticed that most of your posts say use 5.010; rather than a newer version such as use 5.024;. Why is that?

Dave Cross Over a year ago

I consider 5.10 to be the minimum version acceptable for modern Perl programming so I regularly use 5.10 features in my posts. If I have a need for more modern features then I'll require a higher version of Perl, but I try to keep to 5.10 in order to keep my answers accessible to a larger number of people.

mkHun Over a year ago

Thank you for sharing. I'll keep your tip in mind from now on.
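For readers wondering what use 5.010 buys in practice: besides refusing to run on anything older than perl 5.10, it enables the say and state features, and 5.10 is also the release that introduced the // (defined-or) operator. A small illustration; the %rec hash and the next_id sub are made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;   # dies on perls older than 5.10 and enables the 'say' and 'state' features

# Defined-or (//), new in 5.10: use the fallback only when the left side is undef.
my %rec  = (Reviewer => 'None');
my $desc = $rec{Description} // '(no description)';

# state: a lexical variable that keeps its value between calls to the sub.
sub next_id {
    state $count = 0;
    return ++$count;
}

my @ids = map { next_id() } 1 .. 3;
say "$desc: @ids";   # say appends the newline itself
```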

Goku Over a year ago

@DaveCross: Thanks a lot for your input, it worked perfectly.

Goku Over a year ago

@DaveCross: I'm now trying to build a hash of hashes of arrays, with the Program (e.g. Development) as the first-level key, the JIRA ID as the second-level key, and the associated records as the values. I'm not getting the results I expect. Could you please help me out here?
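One possible sketch of the structure described in that comment, assuming the same record format as the answer: Program as the first-level key, JIRA ID as the second, and an array of record hashes at the bottom. The essential change from the answer's loop is the push line. The heredoc is a two-record excerpt of the question's log, records are split on the dashed lines rather than via $/, and %by_prog is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my $log = <<'END';
JIRA: COM-1234
Program:Development
Reviewer:None
Description:Updating Received
rev:r909276
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Chan Joe
Description:Paid the Due
rev:r327896
------------------------------------------
END

my %by_prog;   # Program => { JIRA ID => [ \%record, ... ] }
for my $chunk (split /^-{20,}$/m, $log) {
    next unless $chunk =~ /\S/;
    my %rec = $chunk =~ /^(\w+):\s*(.+?)$/mg;
    # Two hash levels, then an array of records at the bottom (autovivified).
    push @{ $by_prog{ $rec{Program} }{ $rec{JIRA} } }, \%rec;
}

for my $prog (sort keys %by_prog) {
    say "Program: $prog";
    for my $id (sort keys %{ $by_prog{$prog} }) {
        say "  FOR ID: $id";
        say "    $_->{rev}  $_->{Reviewer}  $_->{Description}" for @{ $by_prog{$prog}{$id} };
    }
}
```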


Ilmari Karonen · Accepted Answer · 2016-08-18 12:13:46Z

Since your data is nicely structured with one data item per line, and since Perl processes input line by line by default, I suggest doing that instead of messing with $/ or regexps to split the input records. This does require you to remember the JIRA issue ID from the first line of each record, but that's simple: just store it in a variable declared outside the loop, like this:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %records;
my $jiraID;
while (<>) {
    chomp;
    if (/^JIRA:\s*(.*)/ and not defined $jiraID) {
        $jiraID = $1;
        $records{$jiraID} = {};  # wipe out any old data for this ID
    } elsif (/^(Program|Reviewer|Description|rev):(.*)/ and defined $jiraID) {
        $records{$jiraID}{$1} = $2;
    } elsif (/^-{20,}$/) {
        undef $jiraID;  # end of record
    } else {
        die qq(Unexpected input line "$_");
    }
}

print Dumper(\%records);

The code above reads its input from any file(s) provided as command line arguments, or from standard input if there aren't any, using the <> default input operator. If you want to read from a specific file handle that you've opened yourself, you can of course provide one.
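As a minimal sketch of that last point, the loop can be moved into a sub that takes a file handle as its argument (read_records is a hypothetical name, not something from the answer). To keep the example runnable anywhere, it reads from an in-memory handle; with a real file you would open the path instead, as shown in the comment. The JIRA regex uses \s* so that lines written as JIRA:COM-1234, without a space, are accepted too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the answer's loop, reading from whatever handle it is given.
sub read_records {
    my ($fh) = @_;
    my (%records, $jiraID);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^JIRA:\s*(.*)/ and not defined $jiraID) {
            $jiraID = $1;
            $records{$jiraID} = {};            # keeps only the last record per ID
        } elsif ($line =~ /^(Program|Reviewer|Description|rev):(.*)/ and defined $jiraID) {
            $records{$jiraID}{$1} = $2;
        } elsif ($line =~ /^-{20,}$/) {
            undef $jiraID;                     # end of record
        } else {
            die qq(Unexpected input line "$line");
        }
    }
    return \%records;
}

# With a real file you would write something like:
#   open my $in, '<', 'D:\mine\out.txt' or die "Cannot open log: $!";
# Here an in-memory handle stands in for the file so the sketch runs anywhere.
my $sample = <<'END';
JIRA: COM-1234
Program:Development
Reviewer:None
Description:Updating Received
rev:r909276
------------------------------------------
END
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
my $records = read_records($in);
close $in;

print "$_: $records->{$_}{rev}\n" for sort keys %$records;
```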

Note that the code above stores only the last record for each ID. If you want to store all of them in an array, replace the line:

        $records{$jiraID} = {};  # wipe out any old data for this ID

with:

        push @{$records{$jiraID}}, {};  # start new record for this ID

and change the line:

        $records{$jiraID}{$1} = $2;

to:

        $records{$jiraID}[-1]{$1} = $2;
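Applying both replacements (and accepting an optional space after JIRA:, since one of the sample records is written JIRA:COM-1234), the all-records version of the loop looks roughly like this. It is a sketch that reads a two-record excerpt of the sample data from an in-memory file handle rather than from <>:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

# Two records for the same ID; the second JIRA line has no space after the colon.
my $input = <<'END';
JIRA: COM-1234
Program:Development
Reviewer:John Wick
Description:Genral fix
rev:r345676
------------------------------------------
JIRA:COM-1234
Program:Development
Reviewer:None
Description:Updating Received
rev:r909276
------------------------------------------
END

my %records;
my $jiraID;
open my $in, '<', \$input or die "Cannot open in-memory input: $!";
while (<$in>) {
    chomp;
    if (/^JIRA:\s*(.*)/ and not defined $jiraID) {
        $jiraID = $1;
        push @{$records{$jiraID}}, {};   # start new record for this ID
    } elsif (/^(Program|Reviewer|Description|rev):(.*)/ and defined $jiraID) {
        $records{$jiraID}[-1]{$1} = $2;  # fill in the newest record
    } elsif (/^-{20,}$/) {
        undef $jiraID;                   # end of record
    } else {
        die qq(Unexpected input line "$_");
    }
}
close $in;

print Dumper(\%records);
```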

Ps. The regexps in the code above are based on your sample data. If your real data has other types of lines in it (or variations e.g. in the amount of whitespace), you'll need to adjust them to match those lines too. I coded the script to die if it sees anything unexpected, so it's easy to tell if that happens.
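If whitespace does vary, one option is a single, more forgiving key/value pattern. The sketch below makes an assumption about what such variations might look like (spaces around the colon, trailing spaces such as the one after "John Wick" in the sample). parse_kv is a hypothetical helper that returns the key and the trimmed value, or an empty list for lines such as the dashed separator:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one "Key : value" line, tolerating extra
# whitespace around the colon and trimming trailing whitespace from the value.
# Returns (key, value), or an empty list for lines that are not key/value pairs.
sub parse_kv {
    my ($line) = @_;
    return $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/ ? ($1, $2) : ();
}

for my $line ('JIRA : COM-1234', 'Reviewer:John Wick ', '-' x 42) {
    my ($key, $value) = parse_kv($line);
    my $msg = defined $key ? "[$key] => [$value]\n" : "not a key/value line\n";
    print $msg;
}
```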

Update: Based on the sample output you posted while I was writing this answer, it looks like you want to group the data by both the JIRA and the Program lines. That's easy enough to do as well, e.g. like this:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %records;
my ($jiraID, $progID);
while (<>) {
    chomp;
    if (/^JIRA:\s*(.*)/ and not defined $jiraID) {
        $jiraID = $1;
    } elsif (/^Program:\s*(.*)/ and defined $jiraID and not defined $progID) {
        $progID = $1;
        push @{$records{$jiraID}{$progID}}, {};  # start new record for these IDs
    } elsif (/^(Reviewer|Description|rev):(.*)/ and defined $progID) {
        $records{$jiraID}{$progID}[-1]{$1} = $2;
    } elsif (/^-{20,}$/) {
        undef $jiraID; undef $progID;  # end of record
    } else {
        die qq(Unexpected input line "$_");
    }
}

print Dumper(\%records);

Note that I grouped the output data structure first by the JIRA ID and then by the program, but of course those would be easy to swap (or even combine into a single hash key, if you prefer).
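For the single-key option, a minimal sketch (using hard-coded records rather than the parsing loop) is to join Program and JIRA ID with a separator that cannot appear in either part. The "/" used here is an assumption about the data; Perl's built-in $hash{$a, $b} syntax, which joins the parts with $;, would work as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded sample records (a subset of the question's data).
my @rows = (
    { Program => 'Development', JIRA => 'COM-1234', rev => 'r345676' },
    { Program => 'Development', JIRA => 'COM-1234', rev => 'r909276' },
    { Program => 'Testing',     JIRA => 'COM-6789', rev => 'r876391' },
);

my %records;
for my $r (@rows) {
    # One combined key per Program/JIRA pair; assumes "/" never occurs in either part.
    push @{ $records{"$r->{Program}/$r->{JIRA}"} }, $r->{rev};
}

for my $key (sort keys %records) {
    my ($prog, $id) = split m{/}, $key, 2;   # recover the two parts for display
    print "$prog $id: @{ $records{$key} }\n";
}
```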

@Ilmari Karonen: Really appreciate your input. Thanks for your time.

Hambone · Accepted Answer · 2016-08-18 11:47:49Z

1

This doesn't produce your final output format, but it may be a simpler approach: it stores a list of lists for each hash element (ticket ID) and prints some sample output at the end. The output isn't formatted the way you want, but that should be easy enough to change:

use strict;
use warnings;

my (%jira, @values, $ticket_id);

open my $IN, '<', 'jira.txt' or die "Cannot open jira.txt: $!";
while (<$IN>) {
  chomp;
  my ($key, $val) = split /:\s*/, $_, 2;  # limit of 2 keeps any colons inside the value

  if ($key eq 'JIRA') {
    if (@values) {
      push @{$jira{$ticket_id}}, [ @values ];
      @values = ();
    }
    $ticket_id = $val;

  } elsif ($key eq 'Program') {
    $values[0] = $val;
  } elsif ($key eq 'Reviewer') {
    $values[1] = $val;
  } elsif ($key eq 'Description') {
    $values[2] = $val;
  } elsif ($key eq 'rev') {
    $values[3] = $val;
  }
}
close $IN;

push @{$jira{$ticket_id}}, [ @values ];

while (my ($ticket, $ref) = each %jira) {
  print "$ticket =>\n";
  foreach my $line_ref (@$ref) {
    print join("\t", @$line_ref), "\n";
  }
}

Sample output:

COM-1234 =>
Development     John Wick       Genral fix      r345676
Development     None    Updating Received       r909276
COM-6789 =>
Testing Balise Mat      Audited r876391
Testing Chan Joe        SO hwat         r698392
Testing Chan Joe        Paid the Due    r327896
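One thing to be aware of: each walks a hash in no particular order, so the ticket order in the sample output above is not guaranteed from run to run. A sketch that iterates over sorted keys instead, with %jira hard-coded in the same [Program, Reviewer, Description, rev] shape the answer builds; join is applied only to the fields, so no tab ends up in front of the newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded stand-in for the %jira built above:
# ticket => [ [Program, Reviewer, Description, rev], ... ]
my %jira = (
    'COM-6789' => [ [ 'Testing', 'Balise Mat', 'Audited', 'r876391' ] ],
    'COM-1234' => [
        [ 'Development', 'John Wick', 'Genral fix',        'r345676' ],
        [ 'Development', 'None',      'Updating Received', 'r909276' ],
    ],
);

my $out = '';
for my $ticket (sort keys %jira) {        # sorted, so the order is the same on every run
    $out .= "$ticket =>\n";
    for my $row (@{ $jira{$ticket} }) {
        $out .= join("\t", @$row) . "\n";  # tabs only between fields
    }
}
print $out;
```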

answered Aug 18, 2016 at 11:47

Hambone


Goku Over a year ago

Thanks much for an alternate solution :)
