Remove ALMOST repeats in a Perl array

Question

I have an array with the following elements:

my @array = ("\"Foo in Bar\" on Mon 09 Feb 2015 08:07:44 AM PST",
"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:47 AM MST",
"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
"\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:51 AM MST",
"\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST");

I want to filter this array so that all elements with a repeated string (inside the "") are removed. What makes this slightly unusual is that the time associated with each string differs a little, so the elements are not exact duplicates.

Here is what I want the output to look like:

"\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
"\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
"\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST"

I don't really care about sorting the time, just removing the repeats inside the "".

This was my thought process so far:

    my @row;
    foreach my $row (@array) {
        my $name = $row;
        $name =~ s/\son.*//;
        next if (grep {$_ =~ /($name)/} @row);
        push(@row,$row);
    }

There has to be a better way to do this. Also, I am having issues with my method (the grep doesn't seem to be working as intended, it won't go to the next statement).

Using a hash to check for duplicates is the idiomatic way to do it.
– TLP, Commented Feb 9, 2015 at 20:09

ikegami · Accepted Answer · 2015-02-09 20:47:53Z

5

The following assigns a list without duplicates to @filtered:

my %seen;
my @filtered = grep { !$seen{$_}++ } @array;

In your case, a minor tweak is needed. The substring between quotes is what determines whether you've already seen that item, so it needs to be used as the hash key in lieu of $_.

my %seen;
my @filtered = grep { /^"([^"]+)"/ && !$seen{$1}++ } @array;
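For illustration, here is a self-contained sketch that applies the second snippet to the sample data from the question. The single-quoted strings are just a more readable way of writing the same elements; nothing else is changed from the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the question.
my @array = (
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:44 AM PST',
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:47 AM MST',
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:49 AM MST',
    '"Apple in Pie" on Mon 09 Feb 2015 10:22:32 AM MST',
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:51 AM MST',
    '"Rock in Out" on Mon 09 Feb 2015 11:17:41 AM PST',
);

my %seen;

# Keep an element only the first time its quoted name is encountered.
# $1 holds the text between the leading pair of double quotes.
my @filtered = grep { /^"([^"]+)"/ && !$seen{$1}++ } @array;

print "$_\n" for @filtered;
```

Note that this keeps the first element seen for each name (08:07:44 for "Foo in Bar"), which differs from the 08:07:49 entry in the question's sample output. Elements that do not start with a quoted string are dropped, because the match fails for them.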

edited Feb 9, 2015 at 20:47

answered Feb 9, 2015 at 20:10

ikegami


1 Comment

ikegami Over a year ago

Bonus: This method preserves the input's order (while keys %seen doesn't).

Sobrique · Accepted Answer · 2015-02-09 20:17:18Z

2

For duplicate detection, a hash is the tool for the job.

#!/usr/bin/perl

use strict;
use warnings;
my @array = (
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:44 AM PST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:47 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:49 AM MST",
    "\"Apple in Pie\" on Mon 09 Feb 2015 10:22:32 AM MST",
    "\"Foo in Bar\" on Mon 09 Feb 2015 08:07:51 AM MST",
    "\"Rock in Out\" on Mon 09 Feb 2015 11:17:41 AM PST"
);

my %seen;

foreach my $element (@array) {
    my ($first_bit) = ( $element =~ m/^(.*) on/ );
    $seen{$first_bit} = $element;
}

foreach my $first_bit ( keys %seen ) {
    print $seen{$first_bit}, "\n";
}

We iterate the array, selecting the 'first bit' out of the string (I'm grabbing anything in front of 'on' in this example - you may want to match something different).

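Because that is used as the hash key, duplicates overwrite each other, so we print just one element per key: the last one seen. The output order is arbitrary, since keys %seen returns keys in no particular order. You could test for the existence of $seen{$first_bit} before assigning if you want the first occurrence rather than the last. You could use Time::Piece to parse the dates and sort if that was important to you.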
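A minimal sketch of the first-occurrence variant mentioned above, assuming you also want to keep the input order. The @order array is my addition for that purpose and is not part of the original code; the regex is the same one used in the loop above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:44 AM PST',
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:47 AM MST',
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:49 AM MST',
    '"Apple in Pie" on Mon 09 Feb 2015 10:22:32 AM MST',
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:51 AM MST',
    '"Rock in Out" on Mon 09 Feb 2015 11:17:41 AM PST',
);

my %seen;
my @order;    # hypothetical helper: keys in the order they were first seen

foreach my $element (@array) {
    my ($first_bit) = ( $element =~ m/^(.*) on/ );
    next unless defined $first_bit;      # skip elements that do not match
    next if exists $seen{$first_bit};    # already have one: keep the first
    $seen{$first_bit} = $element;
    push @order, $first_bit;
}

# Walk @order instead of keys %seen so the input order is preserved.
print $seen{$_}, "\n" for @order;
```

Dropping the exists check (and @order) gives back the last-occurrence behaviour of the original loop.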

answered Feb 9, 2015 at 20:17

Sobrique

2 Comments

Evan Miller Over a year ago

I am interested in adding the Time::Piece to parse dates. Is there a quick and easy method to doing this? I can't seem to find any clear examples around.

Sobrique Over a year ago

There's a section on it in the man page: search.cpan.org/~rjbs/Time-Piece-1.29/Piece.pm#Date_Parsing - it supports strptime which'll turn a formatted time into a numeric representation
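As a rough sketch of what that could look like for the strings in the question: the helper name parse_stamp, the format string, and the decision to strip the trailing zone abbreviation are all my assumptions, not something taken from the man page. The zone (PST/MST) is ignored here, so the sort is by wall-clock time only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: turn the timestamp part of one element into a
# Time::Piece object. The trailing zone abbreviation is discarded.
sub parse_stamp {
    my ($element) = @_;
    my ($stamp) = $element =~ /" on (.+?)\s+[A-Z]{2,4}$/;
    return unless defined $stamp;
    # e.g. "Mon 09 Feb 2015 08:07:44 AM"
    return Time::Piece->strptime( $stamp, '%a %d %b %Y %I:%M:%S %p' );
}

my @array = (
    '"Rock in Out" on Mon 09 Feb 2015 11:17:41 AM PST',
    '"Foo in Bar" on Mon 09 Feb 2015 08:07:44 AM PST',
    '"Apple in Pie" on Mon 09 Feb 2015 10:22:32 AM MST',
);

# Sort by the numeric epoch value of each parsed timestamp.
my @sorted =
    sort { parse_stamp($a)->epoch <=> parse_stamp($b)->epoch } @array;

print "$_\n" for @sorted;
```

For larger arrays you would probably want to parse each element once and cache the result rather than re-parsing inside the sort block.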
