
I'm trying to figure out how to sort an array alphabetically in Perl. Here is what I have, which works fine in English:

   # List of countries (kept like this to keep it clean, as it's re-used in other places)
    my $countries = {
        'AT' => "íAustria",
        'AU' => "Australia",
        'BE' => "Belgium",
        'BG' => "Bulgaria",
        'CA' => "Canada",
        'CY' => "Cyprus",
        'CZ' => "Czech Republic",
        'DK' => "Denmark",
        'EN' => "England",
        'EE' => "Estonia",
        'FI' => "Finland",
        'FR' => "France",
        'DE' => "Germany",
        'GB' => "Great Britain",
        'GR' => "Greece",
        'HU' => "Hungary",
        'IE' => "Ireland",
        'IT' => "Italy",
        'LV' => "Latvia",
        'LT' => "Lithuania",
        'LU' => "Luxembourg",
        'MT' => "Malta",
        'NZ' => "New Zealand",
        'NL' => "Netherlands",
        'PL' => "Poland",
        'PT' => "Portugal",
        'RO' => "Romania",
        'SK' => "Slovakia",
        'SI' => "Slovenia",
        'ES' => "Spain",
        'SE' => "Sweden",
        'CH' => "Switzerland",
        'SC' => "Scotland",
        'UK' => "United Kingdom",
        'US' => "USA",
        'TK' => "Turkey",
        'NO' => "Norway",
        'MX' => "Mexico",
        'IL' => "Israel",
        'IN' => "India",
        'IS' => "Iceland",
        'CN' => "China",
        'JP' => "Japan",
        'VN' => "áVietnamí"
    };
   # Populate the original loop with "name" and "code"
    my @country_loop_orig;
    print $IN->header;
    foreach (keys %{$countries}) {
      push @country_loop_orig, {
        name => $countries->{$_},
        code => $_
      }
    }

   # sort it alphabetically
   my @country_loop = sort { lc($a->{name}) cmp lc($b->{name})  } @country_loop_orig;

This works fine with the English versions:

Australia
Austria
Belgium
Bulgaria
Canada
China
Cyprus
Czech Republic
Denmark
England
Estonia
Finland
France
Germany
Great Britain
Greece
Hungary
Iceland
India
Ireland
Israel
Italy
Japan
Latvia
Lithuania
Luxembourg
Malta
Mexico
Netherlands
New Zealand
Norway
Poland
Portugal
Romania
Scotland
Slovakia
Slovenia
Spain
Sweden
Switzerland
Turkey
United Kingdom
USA
Vietnam

...but when the names contain UTF-8 characters such as í, é or ó, it doesn't work. The accented names end up at the bottom:

Australia
Belgium
Bulgaria
Canada
China
Cyprus
Czech Republic
Denmark
England
Estonia
Finland
France
Germany
Great Britain
Greece
Hungary
Iceland
India
Ireland
Israel
Italy
Japan
Latvia
Lithuania
Luxembourg
Malta
Mexico
Netherlands
New Zealand
Norway
Poland
Portugal
Romania
Scotland
Slovakia
Slovenia
Spain
Sweden
Switzerland
Turkey
United Kingdom
USA
áVietnam
íAustria

How do you achieve this? I found Sort::Naturally::XS, but couldn't get it to work.

    cmp doesn't know anything about character sets and encodings. It does straight up character (string element) by character (string element) comparisons. (Except possibly under use locale;, which you shouldn't use.) Commented Oct 7, 2017 at 7:10
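To illustrate the comment above, here is a minimal sketch (the three names are made up for the demonstration, not taken from the question) showing that plain cmp orders strings by the numeric value of each character, so anything starting with an accented letter lands after "Z":

```perl
use strict;
use warnings;
use utf8;                              # this source file contains UTF-8 literals
use open ':std', ':encoding(UTF-8)';   # encode printed output as UTF-8

# Illustrative names only
my @names  = ('Zypern', 'Ägypten', 'Belgien');

# Plain cmp compares character by character, by code point
my @sorted = sort { $a cmp $b } @names;

# Print each name with the code point of its first character
printf "%-8s U+%04X\n", $_, ord($_) for @sorted;
```

"Ä" is U+00C4, which is numerically greater than "Z" (U+005A), so "Ägypten" sorts last even though a reader would expect it near the top.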

1 Answer


The Unicode::Collate module should help with this.

A simple example that sorts your last list:

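use warnings;
use strict;
use feature 'say';

use Unicode::Collate;

# Decode input and encode output as UTF-8 (standard streams and opened files)
use open ":std", ":encoding(UTF-8)";

# One country name per line
open my $fh, '<', "country_list.txt"
    or die "Can't open country_list.txt: $!";
my @list = <$fh>;
chomp @list;
close $fh;

# Sort with the Unicode Collation Algorithm instead of by code point
my $uc  = Unicode::Collate->new();
my @sorted = $uc->sort(@list);

say for @sorted;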

However, in some languages non-ASCII characters have a specific accepted placement, and the question doesn't say which language is involved. In that case Unicode::Collate::Locale may help.
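As a rough sketch of the difference a locale can make, the following compares the default collation with a Swedish one, where "Ö" is treated as a separate letter after "Z". The locale 'sv' and the sample strings are assumptions chosen for illustration; pick whichever locale matches your data.

```perl
use strict;
use warnings;
use utf8;                              # this source file contains UTF-8 literals
use open ':std', ':encoding(UTF-8)';   # encode printed output as UTF-8
use feature 'say';

use Unicode::Collate;
use Unicode::Collate::Locale;

# Illustrative strings only
my @names = ('Östersund', 'Zorn', 'Oslo');

my $default = Unicode::Collate->new;
my $swedish = Unicode::Collate::Locale->new(locale => 'sv');

# Default rules treat Ö as a variant of O; Swedish rules place it after Z
say 'default: ', join ', ', $default->sort(@names);
say 'sv:      ', join ', ', $swedish->sort(@names);
```

A Unicode::Collate::Locale object offers the same sort and cmp methods, so it can be dropped in wherever a Unicode::Collate object is used.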

See (study) this perl.com article and this post (T. Christiansen), and this Effective Perler article.


If the data to be sorted is in a complex data structure, use the cmp method for the individual comparisons inside sort

my @sorted = sort { $uc->cmp($a, $b) } @list;

where for $a and $b you'd extract what needs to be compared from the complex data structure.
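For an array of hashes like the one in the question, that could look roughly like this. It is a sketch with a shortened, made-up country list (German names, to get some accented initials); the variable names follow the question.

```perl
use strict;
use warnings;
use utf8;                              # this source file contains UTF-8 literals
use open ':std', ':encoding(UTF-8)';   # encode printed output as UTF-8
use feature 'say';

use Unicode::Collate;

# Shortened, illustrative list: code => name
my $countries = {
    AT => 'Österreich',
    BE => 'Belgien',
    CY => 'Zypern',
    EG => 'Ägypten',
};

# Build the array of { name, code } hashes, as in the question
my @country_loop_orig =
    map { +{ name => $countries->{$_}, code => $_ } } keys %$countries;

# Compare the name fields with the collator instead of plain cmp
my $uc = Unicode::Collate->new;
my @country_loop =
    sort { $uc->cmp($a->{name}, $b->{name}) } @country_loop_orig;

say "$_->{code}  $_->{name}" for @country_loop;
```

With the default collation "Ägypten" sorts with the A's and "Österreich" with the O's, rather than both ending up after "Zypern".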


7 Comments

Awesome, thanks. How would you go about sorting a hash inside the array? For example I'm doing $a->{name} cmp $b->{name}?
BTW, it works perfectly when I sort just an array of the names (without having it as a hash structure). I guess I could re-work how I do the data storage, but I'll wait to see if there is a better way before I spend ages doing that :)
@AndrewNewby Use cmp method, @s = sort { $uc->cmp($a, $b) } @list;, for individual comparisons
You legend! Works like a charm: my $uc = Unicode::Collate->new(); my @country_loop = sort { $uc->cmp($a->{name}, $b->{name}) } @country_loop_orig;
@AndrewNewby Cool :) Just added a note for it