I have a complex multi-level hash in which some values are arrays and others are not. How can I remove duplicate array elements at any level of such a hash?

Here is a simple example (in reality the hash is far more complex):

$VAR1 = {
  'alpha' => {
    'code' => [
      {
        'x' => 1,
        'y' => 2
      },
      {
        'x' => 1,
        'y' => 2
      }
    ],
    'data' => {
      'domestic' => [
        {
          'a' => 0,
          'b' => 5
        },
        {
          'a' => 0,
          'b' => 5
        }
      ]
    }
  }
}

The hash contains arrays at different levels; some of them have only unique elements, others contain duplicates. Sometimes an array element is itself a complex hash.

What is the right way to remove duplicates, whatever their size, at any level?

  • For the entries in alpha.code, do you know that the entries will always have 'x' and 'y' as elements, or that under alpha.data.domestic, they will always have 'a' and 'b'? Commented Jan 13, 2013 at 1:30
  • @Horus - The structure is the same for every element of a particular array. Commented Jan 13, 2013 at 12:55
  • Based on that, I wrote an OO answer below that you might want to look at. You could also use Data::Compare, as in the other answer. I prefer the OO version because I think objects of complex depth shouldn't be represented as schema-less hashrefs and arrayrefs. Commented Jan 14, 2013 at 1:36

3 Answers

This code uses the Data::Compare module and seems to do what you need.

It traverses the data structure recursively, and every array it comes to is examined for duplicates using the Compare function from the module. Duplicates are removed as they are found.

use strict;
use warnings;

use Data::Compare 'Compare';

my %data = (
  alpha => {
    code => [{ x => 1, y => 2 }, { x => 1, y => 2 }],
    data => { domestic => [{ a => 0, b => 5 }, { a => 0, b => 5 }] },
  },
);

process_node(\%data);

use Data::Dump;
dd \%data;

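sub process_node {

  my ($data) = @_;

  if (ref $data eq 'HASH') {
    process_node($_) for values %$data;
  }
  elsif (ref $data eq 'ARRAY') {

    # Clean up the elements first, so that two elements that differ only by
    # duplicates inside their own nested arrays compare as equal below
    process_node($_) for @$data;

    # Compare every pair of elements (i < j) and remove element j when it is
    # a deep duplicate of element i; j only advances when nothing was removed,
    # because splice shifts the next element into slot j
    my $i = 0;
    while ($i < @$data - 1) {
      my $j = $i + 1;
      while ($j < @$data) {
        if (Compare(@{$data}[$i,$j])) {
          splice @$data, $j, 1;
        }
        else {
          $j++;
        }
      }
      $i++;
    }
  }
}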

output

{
  alpha => {
    code => [{ x => 1, y => 2 }],
    data => { domestic => [{ a => 0, b => 5 }] },
  },
}
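If Data::Compare is not an option, the same traversal can be done without it. The sketch below is an alternative technique, not part of the answer above: it turns every element into a canonical string (hash keys sorted, array order kept, blessing ignored) and keeps only the first element for each string. `canon` and `dedupe` are hypothetical helper names, and the sketch assumes the data contains only hashes, arrays, plain scalars and blessed scalar refs such as JSON booleans.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Canonical string for a nested structure: equal structures (same hash keys,
# same array order, same scalar strings) map to the same string. Blessing is
# ignored, so a blessed scalar ref (e.g. a JSON boolean) is treated as the
# scalar it points to.
sub canon {
    my ($v) = @_;
    return 'u' unless defined $v;
    my $type = reftype($v);
    # Plain scalar: the length prefix keeps the encoding unambiguous
    return 's' . length($v) . ':' . $v unless defined $type;
    if ($type eq 'HASH') {
        return '{' . join(',', map { canon($_) . '=' . canon($v->{$_}) }
                                   sort keys %$v) . '}';
    }
    if ($type eq 'ARRAY') {
        return '[' . join(',', map { canon($_) } @$v) . ']';
    }
    if ($type eq 'SCALAR' || $type eq 'REF') {
        return '\\' . canon($$v);
    }
    die "canon: unsupported reference type $type\n";
}

# Remove duplicate elements from every array in the structure, in place.
# Children are processed first, so elements that only differed by duplicates
# inside their own nested arrays are recognized as equal afterwards.
sub dedupe {
    my ($node) = @_;
    my $type = reftype($node) // '';
    if ($type eq 'HASH') {
        dedupe($_) for values %$node;
    }
    elsif ($type eq 'ARRAY') {
        dedupe($_) for @$node;
        my %seen;
        @$node = grep { !$seen{ canon($_) }++ } @$node;   # keep first occurrence
    }
    return $node;
}

my %data = (
    alpha => {
        code => [{ x => 1, y => 2 }, { x => 1, y => 2 }],
        data => { domestic => [{ a => 0, b => 5 }, { a => 0, b => 5 }] },
    },
);
dedupe(\%data);
# $data{alpha}{code} and $data{alpha}{data}{domestic} now hold one element each
```

The `%seen` lookup makes each array pass roughly linear in the number of elements, instead of comparing every pair. Caveats: scalars are compared by their string form (so `1` and `'1'` count as duplicates), objects of different classes with the same contents compare equal, code refs make `canon` die, and circular structures would recurse forever.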

1 Comment

Even at a brief glance this code looks great, but the Data::Compare module crashes on values of type JSON::XS::Boolean (at Compare.pm line 189).

I'm not a fan of deep data structures that aren't turned into objects, and fortunately Moose has built-in coercion, so you can objectify a deep structure almost like magic.

I went a bit overboard, but I decided to write this up as practice for myself. I think I could have gotten much better results by turning a few of the classes into roles, or by forcing the coercion for Alpha::Keyed to build the result classes from a required field.

I don't fully like the way I coded this, and I didn't want to spend a ton of time on it, but it works for the structure you have above. You'd have to do a lot more work to make it handle a more complex structure, and you'll want to break the code up into separate class files:

Alpha.pm:

package Alpha;

use Moose;
use Moose::Util::TypeConstraints;

subtype 'AlphaCodes',
    as 'Alpha::Codes';

subtype 'AlphaData',
    as 'Alpha::Data';

coerce 'AlphaCodes',
    from 'ArrayRef[HashRef]',
    via { Alpha::Codes->new( data => $_ ) };

coerce 'AlphaData',
    from 'HashRef',
    via { Alpha::Data->new($_) };

has 'code' => (
    is => 'ro',
    isa => 'AlphaCodes',
    required => 1,
    coerce => 1);

has 'data' => (
    is => 'ro',
    isa => 'AlphaData',
    required => 1,
    coerce => 1);

package Alpha::Codes;

use Moose;
use Moose::Util::TypeConstraints;

extends 'Alpha::KeyedList';

subtype 'ArrayRefOfCodes',
    as 'ArrayRef[Alpha::Code]';

coerce 'ArrayRefOfCodes',
    from 'ArrayRef[HashRef]',
    via { [ map { Alpha::Code->new($_) } @$_ ] };

has 'data' => (
    is => 'ro',
    isa => 'ArrayRefOfCodes',
    required => 1,
    coerce => 1);

package Alpha::KeyedList;

use Moose;
use Moose::Util::TypeConstraints;

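# Returns the elements of ->data in their original order, keeping only the
# first element for each distinct ->key. Subclasses must provide the 'data'
# attribute; each element must provide a 'key' method (see Alpha::Keyed).
sub unique_list {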
    my $self = shift;
    my %seen = ();
    my @retval = ();
    foreach my $item ( @{$self->data} ) {
        unless ( $seen{$item->key} ) {
            push(@retval,$item);
            $seen{$item->key} = 1;
        }
    }
    return @retval;
}

package Alpha::Data;

use Moose;
use Moose::Util::TypeConstraints;

subtype 'AlphaDataDomestics',
    as 'Alpha::Data::Domestics';

coerce 'AlphaDataDomestics',
    from 'ArrayRef[HashRef]',
    via { Alpha::Data::Domestics->new(data => $_) };

has 'domestic' => (
    is => 'ro',
    isa => 'AlphaDataDomestics',
    required => 1,
    coerce => 1 );

package Alpha::Data::Domestics;

use Moose;
use Moose::Util::TypeConstraints;

extends 'Alpha::KeyedList';


subtype 'ArrayRefOfDomestics',
    as 'ArrayRef[Alpha::Data::Domestic]';

coerce 'ArrayRefOfDomestics',
    from 'ArrayRef[HashRef]',
    via { [ map { Alpha::Data::Domestic->new($_) } @$_ ] };

has 'data' => (
    is => 'ro',
    isa => 'ArrayRefOfDomestics',
    required => 1,
    coerce => 1);

package Alpha::Data::Domestic;

use Moose;

extends 'Alpha::Keyed';

has 'a' => ( is => 'ro' , isa => 'Str' , required => 1 );
has 'b' => ( is => 'ro' , isa => 'Str' , required => 1 );

# Builder for Alpha::Keyed's lazy 'key': the identifying fields joined by '__'
sub build_key {
    my $self = shift;
    return $self->a . '__' . $self->b;
}

package Alpha::Code;

use Moose;

extends 'Alpha::Keyed';

has 'x' => ( is => 'ro' , isa => 'Str' , required => 1 );
has 'y' => ( is => 'ro' , isa => 'Str' , required => 1 );

# Builder for Alpha::Keyed's lazy 'key': the identifying fields joined by '__'
sub build_key {
    my $self = shift;
    return $self->x . '__' . $self->y;
}

package Alpha::Keyed;

use Moose;

has 'key' => ( is => 'ro'
    , isa => 'Str'
    , builder => 'build_key'
    , lazy => 1 );

package main;

my $VAR1 = {
  'alpha' => {
    'code' => [
      {
        'x' => 1,
        'y' => 2
      },
      {
        'x' => 1,
        'y' => 2
      }
    ],
    'data' => {
      'domestic' => [
        {
          'a' => 0,
          'b' => 5
        },
        {
          'a' => 0,
          'b' => 5
        },
        {
          'a' => 1,
          'b' => 2
        },
      ]
    }
  }
};

my $alpha = Alpha->new($VAR1->{alpha});

use Data::Dumper;
warn Dumper([ $alpha->code->unique_list ]);
warn Dumper([ $alpha->data->domestic->unique_list ]);

1;

Now for the run:

$VAR1 = [
      bless( {
               'y' => 2,
               'x' => 1,
               'key' => '1__2'
             }, 'Alpha::Code' )
    ];
$VAR1 = [
      bless( {
               'a' => 0,
               'b' => 5,
               'key' => '0__5'
             }, 'Alpha::Data::Domestic' ),
      bless( {
               'a' => 1,
               'b' => 2,
               'key' => '1__2'
             }, 'Alpha::Data::Domestic' )
    ];
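The core idea here, building one key per element from the fields that identify it and keeping the first element for each key, does not itself depend on Moose. Below is a plain-Perl sketch of the same `unique_list` logic, assuming (as the comments on the question confirm) that every element of a given array has the same structure. `unique_by_key` is a hypothetical helper name, and `'a'`/`'b'` are assumed to be the identifying fields.

```perl
use strict;
use warnings;

# Return the elements of @items whose key (computed by $key_of) has not been
# seen before, preserving the original order; the first occurrence wins.
sub unique_by_key {
    my ($key_of, @items) = @_;
    my %seen;
    return grep { !$seen{ $key_of->($_) }++ } @items;
}

my @domestic = ({ a => 0, b => 5 }, { a => 0, b => 5 }, { a => 1, b => 2 });

# Joining with "\0" instead of '__' avoids collisions when a value itself
# contains the separator (e.g. a => '1__2', b => '3' vs a => '1', b => '2__3')
my @unique = unique_by_key(sub { join "\0", $_[0]{a}, $_[0]{b} }, @domestic);

print scalar(@unique), "\n";   # prints 2
```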

I would look at the answers to this question: How can I compare arrays in Perl?

Using those techniques, you should be able to iterate through all levels of your hash and, whenever you reach an array, compare its elements with each other. You would of course need to do that for every possible pair of elements.

If you could instead give your elements keys that uniquely identify them (that is, store them in a hash rather than an array), you wouldn't need to worry about this at all, because each hash key is unique.
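To illustrate the keyed layout: a minimal sketch, assuming that the `'a'` and `'b'` fields identify a `domestic` record (the id scheme and variable names are hypothetical). Because hash keys are unique, a repeated record simply cannot be stored twice. Note that a hash does not keep the original order of the records.

```perl
use strict;
use warnings;

my @incoming = ({ a => 0, b => 5 }, { a => 0, b => 5 }, { a => 1, b => 2 });

# Store the records in a hash keyed by an identifier derived from their fields
my %domestic;
for my $record (@incoming) {
    my $id = join "\0", $record->{a}, $record->{b};
    $domestic{$id} //= $record;   # keep the first record seen for this id
}

print scalar(keys %domestic), " unique records\n";   # prints "2 unique records"
```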
