49

What is the smartest way of searching through an array of strings for a matching string in Perl?

One caveat: I would like the search to be case-insensitive,

so "aAa" would be found in ("aaa", "bbb").

3
  • 2
    how many times will you search the list? Commented May 28, 2010 at 2:54
  • It will only be searched once, actually. Runtime complexity isn't what I'm really worried about. Commented May 28, 2010 at 20:44
  • 1
    Not that it matters here, but if you kept your strings as the keys of a hash (all with the value 'whatever'), you could find out whether one exists much faster, although case insensitivity does pose a problem. Oh yeah, and the ~~ smartmatch is slow as can be. Otherwise, stick with Ether's well-documented answer, which shows that the simplest answer isn't always the best one. Commented May 31, 2015 at 10:19

7 Answers

160

It depends on what you want the search to do:

  • if you want to find all matches, use the built-in grep:

    my @matches = grep { /pattern/ } @list_of_strings;
    
  • if you want to find the first match, use first in List::Util:

    use List::Util 'first';  
    my $match = first { /pattern/ } @list_of_strings;
    
  • if you want to find the count of all matches, use true in List::MoreUtils:

    use List::MoreUtils 'true';
    my $count = true { /pattern/ } @list_of_strings;
    
  • if you want to know the index of the first match, use first_index in List::MoreUtils:

    use List::MoreUtils 'first_index'; 
    my $index = first_index { /pattern/ } @list_of_strings;
    
  • if you want to simply know if there was a match, but you don't care which element it was or its value, use any in List::Util:

    use List::Util 1.33 'any';
    my $match_found = any { /pattern/ } @list_of_strings;
    

All of these do similar things at their core, but the List::Util and List::MoreUtils functions are implemented in C (XS) where available, and first, first_index and any stop at the first match, so they will generally be faster than an equivalent pure-Perl for loop that you might write yourself.


Note that the algorithm for doing the looping is a separate issue from performing the individual matches. To match a string case-insensitively, you can simply use the i flag in the pattern: /pattern/i. You should definitely read through perldoc perlre if you have not previously done so.
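The question's own example is whole-string equality ignoring case rather than a substring match. A minimal sketch of two ways to express that with any, assuming List::Util 1.33 or later; the variable names and sample data are illustrative and follow the question:

```perl
use strict;
use warnings;
use List::Util 1.33 'any';

my @list_of_strings = ('aaa', 'bbb');   # sample data from the question
my $wanted          = 'aAa';

# Option 1: anchored, case-insensitive regex.
# \A and \z pin the match to the whole string; \Q...\E escapes metacharacters.
my $found_regex = any { /\A\Q$wanted\E\z/i } @list_of_strings;

# Option 2: normalize case on both sides and compare with eq.
my $found_eq = any { lc($_) eq lc($wanted) } @list_of_strings;

print "regex: ", ($found_regex ? "found" : "not found"), "\n";
print "eq:    ", ($found_eq    ? "found" : "not found"), "\n";
```

Both options stop at the first matching element. The eq form avoids the regex engine, while the regex form is easier to loosen later, for example by dropping the anchors.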

4 Comments

You are assuming "match" means regex match, but the example given is just (case insensitive) equality.
true is a bit overkill IMO. Is it faster than my $count = grep { /pattern/ } @list_of_strings; ?
@Zaid or even perldoc perlrequick first and then later perldoc perlretut
Zaid and Telemachus: perlrequick and perlretut are both mentioned specifically in the very first paragraph of the description for perlre. perlre is the best place to start because it presents the reader with a roadmap for how to proceed along the learning path (unless you are in a terminal with carpal tunnel syndrome or just hate clicking the mouse one extra time).
33

I guess

my @foo = ("aAa", "bbb");
my @bar = grep(/^aaa/i, @foo);   # keep the elements that start with "aaa", ignoring case
print join(",", @bar), "\n";

would do the trick.

3 Comments

Maybe: @bar = grep(/^aaa$/i, @foo); What you wrote matches every string beginning with "aaa" (case-insensitively), so it will also match "aaaa" and anything matched by /aaaa+/.
I think it would be more efficient to use grep { lc $_ eq 'aaa' } @foo, thus avoiding the need for regex processing.
All true, and very valid depending on the use case. But I guess the examples given by the OP are only slightly representative of the actual issue.
32

Perl 5.10+ contains the 'smart-match' operator ~~, which returns true if a certain element is contained in an array or hash, and false if it isn't (see perlfaq4).

The nice thing is that it also supports regexes, meaning that your case-insensitive requirement can easily be taken care of:

use strict;
use warnings;
use 5.010;

my @array  = qw/aaa bbb/;
my $wanted = 'aAa';

say "'$wanted' matches!" if /$wanted/i ~~ @array;   # Prints "'aAa' matches!"

4 Comments

Please be aware that the smart match feature is experimental (source)
Is it still experimental in 2020? There seem to be mixed opinions on whether or not to use it: curiousprogrammer.wordpress.com/2010/06/15/…
Strawberry Perl 5.30 still emits "Smartmatch is experimental"...
As of Perl 5.38, it's deprecated. Don't use it!
6

If you will be doing many searches of the array, AND matching always is defined as string equivalence, then you can normalize your data and use a hash.

my @strings = qw( aAa Bbb cCC DDD eee );

my %string_lut;

# Init via slice:
@string_lut{ map uc, @strings } = ();

# or use a for loop:
#    for my $string ( @strings ) {
#        $string_lut{ uc($string) } = undef;
#    }


#Look for a string:

my $search = 'AAa';

print "'$search' ",
    ( exists $string_lut{ uc $search } ? "IS" : "is NOT" ),
    " in the array\n";

Let me emphasize that doing a hash lookup is good if you are planning on doing many lookups on the array. Also, it will only work if matching means that $foo eq $bar, or other requirements that can be met through normalization (like case insensitivity).
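To illustrate the "many lookups" case, here is a minimal sketch that builds the table once and then tests several search strings against it. It normalizes with fc (Unicode case folding, which needs Perl 5.16 or later) instead of uc; that substitution, the name %seen and the search strings are assumptions made for this example, not part of the answer above.

```perl
use strict;
use warnings;
use feature 'fc';   # fc needs Perl 5.16 or later

my @strings = qw( aAa Bbb cCC DDD eee );

# Build the lookup table once, keyed on the case-folded string.
my %seen = map { ( fc($_) => 1 ) } @strings;

# Each later lookup is a single hash access, however long @strings is.
for my $search (qw( AAa bbb zzz )) {
    my $verdict = $seen{ fc($search) } ? "IS" : "is NOT";
    print "'$search' $verdict in the array\n";
}
```

The cost of building the hash is paid once, so this only wins over grep or any when the same list is searched repeatedly.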

6
#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

my @bar = qw(aaa bbb);
my @foo = grep {/aAa/i} @bar;

print Dumper \@foo;

4

Perl string match can also be used for a simple yes/no.

my @foo=("hello", "world", "foo", "bar");

if ("@foo" =~ /\bhello\b/){
    print "found";
}
else{
    print "not found";
}

2 Comments

This will cause false positives in certain situations, consider e.g. my @foo = ( "hello world hello bar" );
Good observation about the false positives. Being aware of that, I find this nice and simple for single-word testing. If necessary, one could always add delimiter characters using join - for instance using \x01 would work for most text-strings.
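The delimiter idea from the comment above could be sketched as follows. It assumes the separator "\x01" never occurs in the data; the names $sep and $haystack are made up for this example.

```perl
use strict;
use warnings;

my @foo = ("hello world hello bar", "foo");
my $sep = "\x01";   # assumption: this character never appears in the data

# Wrap every element in separators so a hit must cover a whole element.
my $haystack = $sep . join($sep, @foo) . $sep;

my %found;
for my $word ("hello", "foo") {
    # index returns -1 when the substring is absent
    $found{$word} = index($haystack, $sep . $word . $sep) >= 0;
    print "$word: ", ($found{$word} ? "found" : "not found"), "\n";
}
```

Here "hello" is reported as not found, because it only occurs inside a longer element, while "foo" is found as a whole element.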
1

For just a boolean match result or for a count of occurrences, you could use:

use 5.014; use strict; use warnings;
my @foo=('hello', 'world', 'foo', 'bar', 'hello world', 'HeLlo');
my $patterns=join(',',@foo);
for my $str (qw(quux world hello hEllO)) {
    my $count=map {m/^$str$/i} @foo;
    if ($count) {
        print "I found '$str' $count time(s) in '$patterns'\n";
    } else {
        print "I could not find '$str' in the pattern list\n"
    };
}

Output:

I could not find 'quux' in the pattern list
I found 'world' 1 time(s) in 'hello,world,foo,bar,hello world,HeLlo'
I found 'hello' 2 time(s) in 'hello,world,foo,bar,hello world,HeLlo'
I found 'hEllO' 2 time(s) in 'hello,world,foo,bar,hello world,HeLlo'

This does not require a module.
Of course, it's less "expandable" and versatile than some of the code above.
I use this to match interactive user answers against a predefined set of case-insensitive answers.
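Since the search strings here come from user input, any regex metacharacters in them would be interpreted as part of the pattern. A hedged variant of the counting step that escapes the input with \Q...\E and uses grep in scalar context; the sample data and the %count hash are assumptions made for this illustration:

```perl
use strict;
use warnings;

my @foo = ('hello', 'world', 'HeLlo', 'abc');

my %count;
for my $str ('hEllO', 'a.c') {
    # grep in scalar context returns the number of matching elements;
    # \Q...\E makes the "." in 'a.c' a literal dot rather than "any character".
    $count{$str} = grep { /\A\Q$str\E\z/i } @foo;
    print "'$str' matched $count{$str} time(s)\n";
}
```

Without the \Q...\E escaping, the input 'a.c' would also match the list element 'abc'.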
