I have a list of strings with the following format:

('group1-1', 'group1-2','group1-9', 'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' )

I need them sorted by group first and then by number, as below:

('group1-1', 'group1-2','group1-9','group1-10', 'group2-1','group2-2', 'group2-9', 'group2-10' )

I've written the following code, but it's not working as expected. It uses a comparator that sorts by group and, if the groups match, by number.

my @list = ('group1-1', 'group1-2','group1-9', 
'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' );
@list = sort compare @list;
for (@list){
    print($_."\n");
}

sub compare{
    my $first_group, $first_num = get_details($a);
    my $second_group, $second_num = get_details($b);
    if($first_group < $second_group){
      return -1;
   } elsif($first_group == $second_group){
      if ( $first_num < $second_num) {
         return -1;
      } elsif ( $first_num == $second_num ) {
         return 0;
      } else {
         return 1;
      }
   } else{
      return 1;                       
   }
}

sub get_details($){
   my $str= shift;
   my $group = (split /-/, $str)[0];
   $group =~ s/\D//g;
   my $num = (split /-/, $str)[1];
   $num =~ s/\D//g;
   return $group, $num;
}

4 Answers

5

You could use a Schwartzian transform:

use warnings;
use strict;

my @list = ('group1-1', 'group1-2','group1-9', 
    'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' );

@list = map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
        map  { [$_, split /-/] }
        @list;

for (@list) {
    print($_."\n");
}

Prints:

group1-1
group1-2
group1-9
group1-10
group2-1
group2-2
group2-9
group2-10
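
If you would rather keep a named comparator: in the question, `my $first_group, $first_num = get_details($a);` declares only `$first_group` and assigns to `$first_num` in scalar context, so the group is never set and only the numbers get compared. A minimal sketch with the list assignment parenthesized follows. The regex inside get_details is my own simplification rather than the original code, and it assumes every string contains a "digits-digits" part.

```perl
use strict;
use warnings;

my @list = ('group1-1', 'group1-2', 'group1-9',
    'group2-1', 'group2-2', 'group2-9', 'group1-10', 'group2-10');

# Return the group number and the item number, e.g. (1, 10) for 'group1-10'.
# Assumption: every string contains a "<digits>-<digits>" part.
sub get_details {
    my ($str) = @_;
    my ($group, $num) = $str =~ /(\d+)-(\d+)/;
    return ($group, $num);
}

sub compare {
    # Parentheses make these list assignments, so both variables are set.
    my ($first_group,  $first_num)  = get_details($a);
    my ($second_group, $second_num) = get_details($b);
    return $first_group <=> $second_group
        || $first_num   <=> $second_num;
}

my @sorted = sort compare @list;
print "$_\n" for @sorted;
```

With `use strict` and `use warnings` enabled, the original declaration would have been reported at compile time.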

4

There's a detail in the data here that can lead to a quiet bug. The substring before the hyphen (group1 etc.) contains both letters and digits, so sorting it lexicographically gives the wrong order once the numbers have more than one digit. For example

group1, group2, group10

is sorted (by the default cmp) into

group1
group10
group2

which is wrong, I presume.

So for sorting we need to break groupN into group and N, and sort numerically by N.

use warnings;
use strict;
use feature 'say';

my @list = ('group1-1', 'group1-2','group1-9',
    'group2-1','group2-2', 'group2-9',
    'group1-10', 'group2-10',
    'group10-2', 'group10-1'                    # Added some 'group10' data
);


# Break input string into:  group N - N   (and sort by first then second number)

@list = 
    map  { $_->[0] }
    sort { $a->[2] <=> $b->[2] or $a->[4] <=> $b->[4] }
    map  { [ $_, /[0-9]+|[a-zA-Z]+|\-/g ] } 
    @list;

say for @list;

The regex extracts the numbers, the words, and the hyphen from the string, so the two numbers land at indices 2 and 4 of each arrayref (index 0 holds the original string). If the leading word is always the same (group), we only ever sort by the numbers, so we can use /[0-9]+/g instead and numerically compare the arrayref elements at indices 1 and 2.

Prints

group1-1
group1-2
group1-9
group1-10
group2-1
group2-2
group2-9
group2-10
group10-1
group10-2
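
The numbers-only variant mentioned above can be sketched as follows. It assumes every string has the same leading word and exactly two numbers; the @sorted name is mine.

```perl
use strict;
use warnings;
use feature 'say';

my @list = ('group1-1', 'group1-2', 'group1-9',
    'group2-1', 'group2-2', 'group2-9',
    'group1-10', 'group2-10',
    'group10-2', 'group10-1');

# Each arrayref is [ original string, first number, second number ].
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }
    map  { [ $_, /[0-9]+/g ] }
    @list;

say for @sorted;
```

It should give the same order as the listing above.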

3

I'd make sure the strings in the list follow the pattern \S+\d+-\d+ and then use cmp for the string comparison part and <=> for the numbers:

sub compare {
    # \S+? is non-greedy so that (\d+) captures the whole group number
    if( $a =~ /(\S+?)(\d+)-(\d+)/ ) {
        my($A1,$A2,$A3) = ($1,$2,$3);
        if( $b =~ /(\S+?)(\d+)-(\d+)/ ) {
            my($B1,$B2,$B3) = ($1,$2,$3);
            # name first, then group number, then item number
            return ($A1 cmp $B1) || ($A2 <=> $B2) || ($A3 <=> $B3);
        }
    }
    return $a cmp $b; # fallback if a string doesn't follow the pattern
}
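
One thing to watch with patterns of this shape: a greedy `(\S+)` directly in front of `(\d+)` keeps all but the last digit of the group number for itself, which gives the wrong order once group numbers reach three digits (group100 would then sort before group20). A small sketch comparing it with the non-greedy form; the sample string is hypothetical:

```perl
use strict;
use warnings;

my $str = 'group100-1';

# Greedy: \S+ backtracks only far enough to leave a single digit for (\d+).
my @greedy = $str =~ /(\S+)(\d+)-(\d+)/;

# Non-greedy: \S+? stops as early as possible, so (\d+) gets the whole number.
my @lazy = $str =~ /(\S+?)(\d+)-(\d+)/;

print "greedy: @greedy\n";   # greedy: group10 0 1
print "lazy:   @lazy\n";     # lazy:   group 100 1
```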

1 Comment

You can use return ($A1 cmp $B1) || ($A2 <=> $B2) || ($A3 <=> $B3) for the inner thing instead of the ifs. || will short-circuit on -1 or 1 and fall through on 0. It's a common idiom.

2

Natural sort

What you want is called a "natural sort".

use Sort::Key::Natural qw( natsort );

my @sorted = natsort @unsorted;

It can also be performed in-place.

use Sort::Key::Natural qw( natsort_inplace );

natsort_inplace @array;

Key sort

For when you want more control.

use Sort::Key::Multi qw( uukeysort );

my @sorted = uukeysort { /(\d+)/g } @unsorted;

or

use Sort::Key::Multi qw( uukeysort_inplace );

uukeysort_inplace { /(\d+)/g } @array;

Without modules (Unoptimized)

my @sorted =
   sort {
      my ($ag, $an) = $a =~ /(\d+)/g;
      my ($bg, $bn) = $b =~ /(\d+)/g;
      $ag <=> $bg || $an <=> $bn
   }
      @unsorted;

Without modules (Schwartzian transform)

This avoids repeating the same work. Instead of extracting the info 2*N*log2(N) times, it only extracts it N times.

my @sorted =
   map $_->[0],
      sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
         map [ $_, /(\d+)/g ],
            @unsorted;
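
To see the difference in the amount of work, here is a rough sketch that counts the extractions in both approaches. The extract helper and the counters are hypothetical names added for illustration; the exact count for the plain sort depends on how many comparisons Perl's sort happens to make.

```perl
use strict;
use warnings;

my @unsorted = ('group1-1', 'group1-2', 'group1-9',
    'group2-1', 'group2-2', 'group2-9', 'group1-10', 'group2-10');

my ($naive_calls, $st_calls) = (0, 0);

# Hypothetical helper: extract the numbers and bump the given counter.
sub extract {
    my ($str, $counter) = @_;
    $$counter++;
    return $str =~ /(\d+)/g;
}

# Plain sort: two extractions per comparison.
my @naive =
    sort {
        my ($ag, $an) = extract($a, \$naive_calls);
        my ($bg, $bn) = extract($b, \$naive_calls);
        $ag <=> $bg || $an <=> $bn
    } @unsorted;

# Schwartzian transform: one extraction per element.
my @st =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
    map  { [ $_, extract($_, \$st_calls) ] }
    @unsorted;

print "plain sort: $naive_calls extractions\n";
print "ST:         $st_calls extractions\n";
```

For short lists the difference is negligible; it starts to matter when the list is long or the key extraction is expensive.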

Without modules (GRT)

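An optimization of ST (the Guttman-Rosler Transform). The keys are packed into a fixed-width prefix of each string, so the default string sort can be used and no array refs are needed.

my @sorted =
   map substr($_, 8),
      sort
         map pack('NNa*', /(\d+)/g, $_),
            @unsorted;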
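
The reason a plain string sort works here is that pack's N format produces a 4-byte big-endian unsigned integer, so comparing the packed bytes gives the same order as comparing the numbers. The two N keys occupy 8 bytes, which is why the original string starts at offset 8. A small sketch of the byte-order point, with made-up values:

```perl
use strict;
use warnings;

# As decimal strings, '10' sorts before '9' ...
my @as_text = sort qw(9 10);

# ... but as 4-byte big-endian integers, byte order matches numeric order.
my @packed_keys = map { pack('N', $_) } (9, 10);
my @as_packed   = map { unpack('N', $_) } sort @packed_keys;

print "text:   @as_text\n";     # text:   10 9
print "packed: @as_packed\n";   # packed: 9 10
```

Note that N only holds values up to 2**32 - 1, so this assumes the numbers fit in 32 bits.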
