
I have an array with the following elements:

my @array = ("\"Foo in Bar\" on Mon 09 Feb 2015 08:07:44 AM PST",
"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:47 AM MST",
"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
"\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:51 AM MST",
"\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST");

I want to filter this array so that elements whose quoted string (the part inside the "") repeats are removed, leaving one element per string. What makes this slightly tricky is that the timestamps on the duplicates differ slightly, so the elements themselves are not identical.

Here is what I want the output to look like:

"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
"\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
"\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST"

I don't really care about sorting by time, just about removing the repeats of the quoted string.

This was my thought process so far:

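    my @row;                                    # unique rows collected so far
    foreach my $row (@array) {
        my $name = $row;
        $name =~ s/\son.*//;                    # strip " on <date>", leaving the quoted name
        next if (grep {$_ =~ /($name)/} @row);  # $name is interpolated as a regex pattern here
        push(@row,$row);
    }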

There has to be a better way to do this. I'm also having trouble with my approach: the grep doesn't seem to work as intended, so the next statement never fires.

Comment: Using a hash to check for duplicates is the idiomatic way to do it. (Feb 9, 2015 at 20:09)

2 Answers


The following assigns a list without duplicates to @filtered:

my %seen;
my @filtered = grep { !$seen{$_}++ } @array;
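A self-contained sketch of that idiom, run on made-up sample data (the qw() word list is purely illustrative): $seen{$_}++ returns the old count, which is 0 (false) only the first time a string appears, so the grep keeps each string's first occurrence in its original position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data with exact (whole-string) duplicates.
my @array = qw(apple banana apple cherry banana apple);

# %seen counts how many times each string has been encountered.
# The post-increment returns the *old* value, so !$seen{$_}++ is
# true only on a string's first appearance.
my %seen;
my @filtered = grep { !$seen{$_}++ } @array;

print "@filtered\n";    # prints: apple banana cherry
```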

In your case, a minor tweak is needed. The substring between the quotes is what determines whether you've already seen an item, so it has to be used as the hash key in place of $_.

my %seen;
my @filtered = grep { /^"([^"]+)"/ && !$seen{$1}++ } @array;
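Applied to the array from the question, a runnable sketch might look like this. Note that it keeps the first occurrence of each quoted name (the 08:07:44 AM PST entry for "Foo in Bar"), not the 08:07:49 one in the question's sample output; the question says which timestamp survives doesn't matter. Elements that don't start with a quoted string fail the match and are dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:44 AM PST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:47 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
    "\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:51 AM MST",
    "\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST",
);

# $1 is the text between the leading pair of double quotes; it is the
# dedup key. If the match fails, && short-circuits and the element is dropped.
my %seen;
my @filtered = grep { /^"([^"]+)"/ && !$seen{$1}++ } @array;

print "$_\n" for @filtered;
```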

Comment: Bonus: this method preserves the input's order (iterating over keys %seen doesn't).

For duplicate detection, a hash is the tool for the job.

#!/usr/bin/perl

use strict;
use warnings;
my @array = (
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:44 AM PST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:47 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
    "\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:51 AM MST",
    "\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST"
);

my %seen;

foreach my $element (@array) {
    # Non-greedy capture: everything before the first " on ".
    my ($first_bit) = ( $element =~ m/^(.*?) on / );
    next unless defined $first_bit;    # skip lines that don't match
    # Later duplicates overwrite earlier ones, so the last occurrence wins.
    $seen{$first_bit} = $element;
}

# Hash order is arbitrary, so the output order may differ from the input.
foreach my $first_bit ( keys %seen ) {
    print $seen{$first_bit}, "\n";
}

We iterate over the array, extracting the 'first bit' of each string (in this example, everything in front of ' on '; you may want to match something different).

Using that as a hash key means later duplicates overwrite earlier ones, so only one element per key (the last occurrence) gets printed. You could test for the existence of $seen{$first_bit} if you want the first occurrence rather than the last. You could also use Time::Piece to parse the dates and sort by them if that matters to you.
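A sketch of the first-occurrence variant, using the same data. The exists check skips any key that is already stored, and the @order array (an addition for illustration, not part of the original answer) records keys as they are first seen, so the output follows input order rather than hash order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:44 AM PST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:47 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
    "\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:51 AM MST",
    "\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST",
);

my %seen;     # first_bit => first element seen with that key
my @order;    # keys in the order they first appeared

foreach my $element (@array) {
    my ($first_bit) = $element =~ m/^(.*?) on /;
    next unless defined $first_bit;
    # Keep only the first element for each key.
    next if exists $seen{$first_bit};
    $seen{$first_bit} = $element;
    push @order, $first_bit;
}

print "$seen{$_}\n" for @order;
```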

Comments:

I'm interested in using Time::Piece to parse the dates. Is there a quick and easy way to do this? I can't find any clear examples.
There's a section on it in the documentation (search.cpan.org/~rjbs/Time-Piece-1.29/Piece.pm#Date_Parsing): it supports strptime, which turns a formatted time string into a Time::Piece object whose epoch method gives you a numeric representation.
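A rough sketch of the strptime approach, assuming the only zone abbreviations in the data are PST and MST. Rather than relying on strptime to understand zone names, the abbreviation is stripped off and looked up in a hand-written offset table (%utc_offset is hypothetical; extend it for your own data), and the elements are sorted by UTC epoch. to_utc_epoch is a helper name made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

my @array = (
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:44 AM PST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:47 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
    "\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:51 AM MST",
    "\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST",
);

# Hypothetical lookup table: UTC offsets (in hours) for the zone
# abbreviations that appear in this data.
my %utc_offset = ( PST => -8, MST => -7 );

# Turn the timestamp after " on " into seconds since the epoch (UTC).
sub to_utc_epoch {
    my ($line) = @_;
    my ( $stamp, $zone ) = $line =~ m/ on (.+) ([A-Z]{3})$/
        or die "Unrecognised line: $line\n";
    die "Unknown zone '$zone'\n" unless exists $utc_offset{$zone};

    # Called as a class method, strptime treats the wall-clock time as UTC;
    # the zone was stripped off above and is applied by hand below.
    my $t = Time::Piece->strptime( $stamp, '%a %d %b %Y %I:%M:%S %p' );

    # Local time = UTC + offset, so UTC = local time - offset.
    return $t->epoch - $utc_offset{$zone} * 3600;
}

my @by_time = sort { to_utc_epoch($a) <=> to_utc_epoch($b) } @array;
print "$_\n" for @by_time;
```

This re-parses each string on every comparison, which is fine for a handful of lines; for larger inputs you would compute each epoch once (for example with a Schwartzian transform) before sorting.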
