13

I'm looping through an array, and I want to test if an element is found in another array.

In pseudo-code, what I'm trying to do is this:

foreach $term (@array1) {
    if ($term is found in @array2) { 
        #do something here
    }
}

I've got the "foreach" and the "do something here" parts down pat ... but everything I've tried for the "if term is found in array" test does NOT work ...

I've tried grep:

if grep {/$term/} @array2 { #do something }
# this test always succeeds for values of $term that ARE NOT in @array2

if (grep(/$term/, @array2)) { #do something }
# this test likewise succeeds for values NOT IN the array

I've tried a couple different flavors of "converting the array to a hash" which many previous posts have indicated are so simple and easy ... and none of them have worked.

I am a long-time low-level user of perl, I understand just the basics of perl, do not understand all the fancy obfuscated code that comprises 99% of the solutions I read on the interwebs ... I would really, truly, honestly appreciate any answers that are explicit in the code and provide a step-by-step explanation of what the code is doing ...

... I seriously don't grok $_ and any other kind or type of hidden, understood, or implied value, variable, or function. I would really appreciate it if any examples or samples have all variables and functions named with clear terms ($term as opposed to $_) ... and describe with comments what the code is doing so I, in all my mentally deficient glory, may hope to possibly understand it some day. Please. :-)

...

I have an existing script which uses 'grep' somewhat successfully:

$rc=grep(/$term/, @array);
if ($rc eq 0) { #something happens here }

but I applied that EXACT same code to my new script and it simply does NOT succeed properly ... i.e., it "succeeds" (rc = zero) when it tests a value of $term that I know is NOT present in the array being tested. I just don't get it.

The ONLY difference in my 'grep' approach between 'old' script and 'new' script is how I built the array ... in old script, I built array by reading in from a file:

  @array=`cat file`;

whereas in new script I put the array inside the script itself (coz it's small) ... like this:

  @array=("element1","element2","element3","element4");

How can that result in different output of the grep function? They're both bog-standard arrays! I don't get it!!!! :-(

########################################################################

addendum ... some clarifications or examples of my actual code:

########################################################################

The term I'm trying to match/find/grep is a word element, for example "word123".

This exercise was just intended to be a quick-n-dirty script to find some important info from a file full of junk, so I skip all the niceties (use strict, warnings, modules, subroutines) by choice ... this doesn't have to be elegant, just simple.

The term I'm searching for is stored in a variable which is instantiated via split:

foreach $line(@array1) {
  chomp($line);  # habit

  # every line has multiple elements that I want to capture
  ($term1,$term2,$term3,$term4)=split(/\t/,$line);  

  # if a particular one of those terms is found in my other array 'array2'
  if (grep(/$term2/, @array2) { 
    # then I'm storing a different element from the line into a 3rd array which eventually will be outputted
    push(@known, $term1) unless $seen{$term1}++;
  }
}

see that grep up there? It ain't workin right ... it is succeeding for all values of $term2 even if it is definitely NOT in array2 ... array1 is a file of a couple thousand lines. The element I'm calling $term2 here is a discrete term that may be in multiple lines, but is never repeated (or part of a larger string) within any given line. Array2 is about a couple dozen elements that I need to "filter in" for my output.

...

I just tried one of the below suggestions:

if (grep $_ eq $term2, @array2) 

And this grep failed for all values of $term2 ... I'm getting an all or nothing response from grep ... so I guess I need to stop using grep. Try one of those hash solutions ... but I really could use more explanation and clarification on those.

  • Can you provide a short script (on pastebin or equivalent) that recreates your problem? That would help us diagnose what's going on. Commented Jul 6, 2012 at 15:31
  • How can I tell whether a certain element is contained in a list or array? Commented Jul 6, 2012 at 15:32
  • stackoverflow.com/questions/2860226/… Commented Jul 6, 2012 at 15:32
  • Yup... using a hash is the right thing to do here; otherwise you're making a solution that won't perform for large arrays (since you're scanning array2 for every element of array1). Commented Jul 6, 2012 at 15:35
  • What is the value of $term? Provide examples of your search term and what you expect to match and not match. Are you seeking an exact match ("foo" matches only "foo") or a partial match ("foo" matches "food")? Commented Jul 6, 2012 at 15:36

8 Answers

9

This is in perlfaq. A quick way to do it is

my %seen;
$seen{$_}++ for @array1;
for my $item (@array2) {
    if ($seen{$item}) {
        # item is also in @array1, do something
    }
}

If letter case is not important, you can set the keys with $seen{ lc($_) } and check with if ($seen{ lc($item) }).
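For readers who prefer every variable spelled out, here is a minimal sketch of the same hash idea applied to the loop from the question. It assumes tab-separated lines with the term to test in the second field, and that letter case does not matter. The sub name known_terms and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first field of every line whose
# second field appears in the list of wanted words (ignoring case).
sub known_terms {
    my ($lines_ref, $wanted_ref) = @_;

    # Step 1: build a lookup table. Each wanted word becomes a hash key.
    my %is_wanted;
    foreach my $word (@$wanted_ref) {
        $is_wanted{ lc($word) } = 1;
    }

    my @known;    # results, in order of first appearance
    my %seen;     # first fields that were already stored

    foreach my $raw_line (@$lines_ref) {
        # Step 2: work on a copy and strip line endings ("\n" or "\r\n").
        my $line = $raw_line;
        $line =~ s/[\r\n]+\z//;

        # Step 3: cut the line into fields at each tab.
        my ($term1, $term2) = split(/\t/, $line);
        next unless defined $term2;    # skip lines without a second field

        # Step 4: one hash lookup replaces scanning the whole word list.
        if ($is_wanted{ lc($term2) }) {
            push(@known, $term1) unless $seen{$term1}++;
        }
    }
    return @known;
}

my @array1 = ("id1\tword123\n", "id2\tword999\n", "id3\tWORD123\n", "id1\tword123\n");
my @array2 = ("word123", "word456");
print join(",", known_terms(\@array1, \@array2)), "\n";    # prints "id1,id3"
```

The lookup table is built once, so each line costs a single hash access no matter how many words are in the second array.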

ETA:

With the changed question: If the task is to match single words in @array2 against whole lines in @array1, the task is more complicated. Trying to split the lines and match against hash keys will likely be unsafe, because of punctuation and other such things. So, a regex solution will likely be the safest.

Unless @array2 is very large, you might do something like this:

my $rx = join "|", @array2;
for my $line (@array1) {
    if ($line =~ /\b(?:$rx)\b/) {  # word boundaries avoid partial matches; (?:...) applies them to every alternative
        # do something
    }
}

If @array2 contains meta characters, such as *?+|, you have to make sure they are escaped, in which case you'd do something like:

my $rx = join "|", map quotemeta, @array2;
# etc
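Put together, the escaped version might look like the sketch below. The sub name build_word_regex and the sample lines are hypothetical, and the /i flag is only there on the assumption that case should be ignored. The alternation is wrapped in (?:...) so that the word boundaries apply to every word, not just the first and last.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one regex that matches any of the given
# words as a whole word, ignoring case.
sub build_word_regex {
    my @words = @_;

    # Escape regex metacharacters in each word so they match literally.
    my @escaped;
    foreach my $word (@words) {
        push @escaped, quotemeta($word);
    }

    my $alternation = join "|", @escaped;
    return qr/\b(?:$alternation)\b/i;
}

my $word_regex = build_word_regex("word123", "a.b");

# Only the first line matches: "word1234" is a longer word, and "axb"
# does not contain a literal dot.
foreach my $line ("id1\tword123\tx", "id2\tword1234\tx", "id3\taxb\tx") {
    if ($line =~ $word_regex) {
        print "match: $line\n";
    }
}
```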

10 Comments

The advantage of this is that it's O(N). The naive solution is O(N^2). choroba's and cdarke's are O(N^2).
I don't think this example will work as is, and I don't understand it well enough to see how it could be modified to suit. Array1 is the contents of a file, with each element of the array being an entire line from the file -- a row of data comprised of multiple elements. I have to chop that up to get the individual elements that I need to test against array2 which is a simpler array comprised of a plain list of single words. I can't compare an entire line from array1 against the single words in array2, that won't work.
@user1505587 You should have mentioned that in your question, it's a fairly important piece of information. So I take it then that you also want to ignore letter case. I will add the fix.
If you just want to find a list of words in a file, and you want a quick and dirty solution, why are you not just using grep with the -f option?
@TLP ... case is unimportant. And I don't think the source of the datasets is too important either since all data is reduced to arrays and elements (scalar vars). I have the element in hand, it is $term. I want to find IF $term exists in Array2. I don't need to pull anything out of Array2, this is an existence check only. If $term exists in Array2, then I have to do some work on the line that $term came out of (i.e., the original element from Array1). That also is already done, and not in question.
6

You could use the (infamous) "smart match" operator, provided you are on 5.10 or later:

#!/usr/bin/perl
use strict;
use warnings;

my @array1 = qw/a b c d e f g h/; 
my @array2 = qw/a c e g z/; 

print "a in \@array1\n" if 'a' ~~ @array1;
print "z in \@array1\n" if 'z' ~~ @array1;
print "z in \@array2\n" if 'z' ~~ @array2;

The example is very simple, but you can use an RE if you need to as well. I should add that not everyone likes ~~ because there are some ambiguities and, um, "undocumented features". Should be OK for this though.
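As an alternative that avoids ~~ (which has been flagged as experimental since Perl 5.18), the same existence test can be sketched with any from List::Util. This is a swapped-in technique, not part of the answer above. It assumes List::Util 1.33 or later, and the sub name is_in_list is made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical helper: true if $term is exactly equal to one of the
# elements of @list. any() stops at the first element that passes.
sub is_in_list {
    my ($term, @list) = @_;
    return any { $_ eq $term } @list;
}

my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;

print "a in \@array1\n" if is_in_list('a', @array1);
print "z in \@array1\n" if is_in_list('z', @array1);    # not printed
print "z in \@array2\n" if is_in_list('z', @array2);
```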

6 Comments

I tried this: if ($term1 ~~ @array2) { print "$term found in array2\n"; }
sorry, got distracted mid-comment and it timed out on me ... but I was trying to use that indicated 'smart match operator' but it did not work for me. I'm using perl 5.10.1 on Cygwin. No errors, just didn't provide expected results.
@MuleHeadJoe - RE -> regular expression. I prefer cdarke's smart match solution. Am unsure, though, why it didn't work for you. For example, using cdarke's arrays: do{print "$_ found in \@array2\n" if $_ ~~ @array2} for @array1; shows a, c, e, g in @array2
@MuleheadJoe - RE: either Religious Education or Regular Expression. I guess you need to figure out which is appropriate for the context.
@Kenosis: Your results for that test appear to be correct to me, what else would you expect? a,c,e,g are all in both @array1 and @array2 (z is in @array2 but not in @array1).
5

This should work.

#!/usr/bin/perl
use strict;
use warnings;

my @array1 = qw/a b c d e f g h/;
my @array2 = qw/a c e g z/;

for my $term (@array1) {
    if (grep $_ eq $term, @array2) {
        print "$term found.\n";
    }
}

Output:

a found.
c found.
e found.
g found.

5 Comments

The OP is doing regex matches, not exact matches. But it's not clear what type of match he/she really wants.
Is there a reason your example uses "for" and not "foreach"? Is it personal preference or is there a technical reason? In my script I use "foreach $line(@array1)" ... my array1 is a text file that I'm reading in (@array1=`cat myfile`;), chopping up each line so I can reorder elements and eventually output everything as a csv file that I can open & manipulate in Excel.
for and foreach are synonymous - use whichever you find to be more expressive.
@RickF ... thanks, I thought they had different features. I rarely if ever use 'for' so wasn't sure.
I think foreach was an accident in the design of an otherwise good language. What else could explain it? Many other languages use for, so I think for is clearer than foreach (in addition to being shorter).
2
#!/usr/bin/perl

@ar = ( '1','2','3','4','5','6','10' );
@arr = ( '1','2','3','4','5','6','7','8','9' ) ;

foreach $var ( @arr ){
    print "$var not found\n " if ( ! ( grep /$var/, @ar )) ;
}
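One caveat with /$var/: it is a regex match, so it succeeds whenever $var occurs anywhere inside an element ('1' is "found" in '10'). Perl also treats an empty pattern specially, so an empty or undefined $var is another thing worth ruling out. The sketch below contrasts the regex test with an exact eq test; the sub names and sample list are made up for illustration.

```perl
use strict;
use warnings;

# Counts the elements that contain $term somewhere (regex match).
sub count_regex_matches {
    my ($term, @list) = @_;
    my $count = grep { $_ =~ /$term/ } @list;
    return $count;
}

# Counts the elements that are exactly equal to $term.
sub count_exact_matches {
    my ($term, @list) = @_;
    my $count = grep { $_ eq $term } @list;
    return $count;
}

my @list = ('10', '20', '30');
print "regex: ", count_regex_matches('1', @list), "\n";    # 1, because '10' contains '1'
print "exact: ", count_exact_matches('1', @list), "\n";    # 0, no element equals '1'
```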


1

Pattern matching is the most efficient way of matching elements. This would do the trick. Cheers!

print "$element found in the array\n" if ("@array" =~ m/$element/);


0

Your 'actual code' shouldn't even compile:

if (grep(/$term2/, @array2) { 

should be:

if (grep (/$term2/, @array2)) { 

You have unbalanced parentheses in your code. You may also find it easier to use the block form of grep, where the test goes in braces and is run once for each element of the array. It helps keep the parentheses from blurring together. This is optional, though. It would be:

if (grep {/$term2/} @array2) { 

You may want to use strict; and use warnings; to catch issues like this.

1 Comment

my bad, I'm not cutting/pasting code ... code is on a physically separate machine ... the real deal had matched parens, and in that world I always check for syntax errors by doing "perl -cw [scriptname]" ...
0

The example below might be helpful, it tries to see if any element in @array_sp is present in @my_array:

#! /usr/bin/perl -w

@my_array = qw(20001 20003);

@array_sp = qw(20001 20002 20004);
print "@array_sp\n";

foreach $case (@my_array) {
    if ("@array_sp" =~ m/$case/) {
        print "My God!\n";
    }
}

Using pattern matching can solve this. Hope it helps. -QC
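Note that matching against the joined string is a substring test, so "2000" would also be reported as present. A stricter variant can be sketched by padding the joined string with spaces and quoting the search term. It assumes the elements themselves contain no spaces and that the list separator $" is left at its default, and the sub name found_in_joined is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) only if $case is a complete element of @list.
sub found_in_joined {
    my ($case, @list) = @_;

    # Interpolating an array joins its elements with single spaces;
    # the extra spaces at both ends delimit the first and last element.
    my $joined = " @list ";

    # \Q...\E makes any regex metacharacters in $case match literally.
    return ($joined =~ m/ \Q$case\E /) ? 1 : 0;
}

my @array_sp = qw(20001 20002 20004);
print "20001 found\n" if found_in_joined('20001', @array_sp);
print "2000 found\n"  if found_in_joined('2000', @array_sp);    # not printed
```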


0
1. grep with eq:
    if (grep {$_ eq $term2} @array2) {
        print "$term2 exists in the array";
    }

2. grep with regex:
    if (grep {/$term2/} @array2) {
        print "element with pattern $term2 exists in the array";
    }
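If the eq form never matches, invisible characters are a common suspect: lines read with backticks or from a filehandle keep their trailing "\n" (or "\r\n" for DOS-format files) until they are removed. Whether that is the cause in the question is only a guess. The sketch below shows the effect, with a made-up helper name and made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the string without trailing
# line-ending characters (Unix "\n" or Windows "\r\n").
sub strip_line_ending {
    my ($text) = @_;
    $text =~ s/[\r\n]+\z//;
    return $text;
}

# Sample data shaped like lines read from a DOS-format file.
my @array2 = ("element1\r\n", "element2\r\n");
my $term2  = "element1";

# Comparing against the raw lines fails because of the hidden "\r\n".
my $raw_hits = grep { $_ eq $term2 } @array2;

# Comparing against cleaned copies succeeds.
my $clean_hits = grep { strip_line_ending($_) eq $term2 } @array2;

print "raw: $raw_hits, cleaned: $clean_hits\n";    # prints "raw: 0, cleaned: 1"
```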

