
Suppose I have an array of indexes, A, and an array B where every entry is itself an array containing some indexes or numbers. I would like to know which entries of B contain at least one index appearing in A. For example (in PHP style):

A = [3,45,67,8]
B = [ 1 => [1,6,81],
      2 => [5,67,3,4,5,66,6],
      3 => [55,56,57,58],
      4 => [45,80,81,82]
    ]

The entries of B that contain some value of A are 2 and 4, so the function I want should behave like:

function solution = filter(A,B) // solution = [2,4]

Now, with a brute-force loop over the entries of B, the complexity is O(nm), where n is #B and m is the length of the longest row of B. Are there smarter solutions?
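A hash-set lookup doesn't avoid visiting every element of B, but it makes each membership test O(1) expected instead of an O(|A|) scan, which is usually the practical win in PHP. A minimal sketch (not from the original post) using the question's data:

```php
<?php

// Flip A so its values become keys, then each membership test is an
// O(1) expected hash lookup via isset(). Every element of B is still
// visited at most once, so total work is proportional to the size of B.
$A = [3, 45, 67, 8];
$B = [
    1 => [1, 6, 81],
    2 => [5, 67, 3, 4, 5, 66, 6],
    3 => [55, 56, 57, 58],
    4 => [45, 80, 81, 82],
];

$set = array_flip($A);          // [3 => 0, 45 => 1, 67 => 2, 8 => 3]
$result = [];
foreach ($B as $key => $row) {
    foreach ($row as $value) {
        if (isset($set[$value])) {
            $result[] = $key;   // first hit is enough for this row
            break;
        }
    }
}
var_export($result);            // array ( 0 => 2, 1 => 4, )
```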

  • Why isn't 1 a qualifying key? It contains 8 which is in the whitelist. Commented Oct 25, 2018 at 10:45
  • yes you're right... Commented Oct 25, 2018 at 11:27
  • You have to loop through every entry of B. It can't get better than that. Commented Oct 25, 2018 at 13:29

1 Answer


Edit #2:

By moving all values to be compared into key positions, PHP can more than double efficiency (by my demo's system-time metric).

Furthermore, if your subsets contain duplicate values, calling array_flip() will shrink them, because an array cannot hold duplicate keys.

Code: (Demo)

$A = array_flip($A);  // prepare for key comparisons    
$result = [];
foreach ($B as $key => $haystack) {
    if (array_intersect_key(array_flip($haystack), $A)) {
        $result[] = $key;
    }
}
var_export($result);
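The duplicate-removal effect of array_flip() mentioned above can be seen on one of the sample rows (a standalone illustration, not part of the original demo):

```php
<?php

// Flipping a row with a repeated value (5 appears twice) yields one key
// per distinct value, so the flipped array is strictly smaller.
$row = [5, 67, 3, 4, 5, 66, 6];
$flipped = array_flip($row);
var_export(count($row));      // 7
var_export(count($flipped));  // 6 -- the duplicate 5 collapsed into one key
```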

Edit:

Whenever you want to optimize array searching in PHP, it is often best to prepare your data in a way that lets PHP leverage its strength with hash tables: https://codedmemes.com/lib/best-performance-array-intersection/

Consider this preparation...

Code: (Demo)

foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        $C[$hay][] = $key;
    }
}

var_export(array_keys(array_flip(array_merge(...array_intersect_key($C, array_flip($A))))));

Output:

array (
  0 => 1,
  1 => 2,
  2 => 4,
)
  • The nested foreach() loops generate a collection of subarrays which have unique values from B's subarrays as keys and $B's original keys as new subarray values.
  • array_intersect_key() checks the keys which php does much faster than checking values. (see first hyperlink article)
  • Then array_merge(...) flattens the subarrays into a single 1-dim array.
  • Finally array_flip() and array_keys() removes duplicates and re-indexes the results.
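A quick trace of the preparation loop above may help: run on the demo's data, $C is an inverted index mapping each value to the B-keys that contain it (the specific lookups shown are just illustrative picks):

```php
<?php

$B = [
    1 => [1, 6, 8],
    2 => [5, 67, 3, 4, 5, 66, 6],
    3 => [55, 56, 57, 58],
    4 => [45, 80, 81, 82],
];

// Build the inverted index: value => list of B-keys containing it.
$C = [];
foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        $C[$hay][] = $key;
    }
}

var_export($C[8]);   // array ( 0 => 1, )        -- only row 1 contains 8
var_export($C[5]);   // array ( 0 => 2, 1 => 2, ) -- 5 appears twice in row 2
```

The repeated key in $C[5] is exactly the duplication that the final array_flip()/array_keys() step cleans up.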

I don't know exactly how array_intersect() works its magic in terms of efficiency, but this is probably how I'd go about it:

Code: (Demo)

$A = [3,45,67,8];
$B = [ 1 => [1,6,8],
      2 => [5,67,3,4,5,66,6],
      3 => [55,56,57,58],
      4 => [45,80,81,82]
    ];

$result = [];
foreach ($B as $key => $haystack) {
    if (array_intersect($haystack, $A)) {  // generate an array with co-existing values
        $result[] = $key;  // add key if array_intersect makes a non-empty array
    }
}
var_export($result);

Output:

array (
  0 => 1,
  1 => 2,
  2 => 4,
)

I suppose if there is a "downside" to using array_intersect() it would be that it will bother to make multiple matches, when only a single match per row is necessary. For this reason, array_search() or a breaking loop might have advantages.

Code: (Demo)

$result = [];
foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        if (in_array($hay, $A)) {
            $result[] = $key;
            break;
        }
    }
}
var_export($result);  // same result
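One way to read the array_search() suggestion (my interpretation, not the original author's code): it returns the position of the first match and stops scanning there, much like the breaking loop. Note the !== false comparison, because a match at position 0 is falsy:

```php
<?php

$A = [3, 45, 67, 8];
$B = [
    1 => [1, 6, 8],
    2 => [5, 67, 3, 4, 5, 66, 6],
    3 => [55, 56, 57, 58],
    4 => [45, 80, 81, 82],
];

$result = [];
foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        // array_search() stops at the first occurrence of $hay in $A.
        if (array_search($hay, $A) !== false) {
            $result[] = $key;
            break;  // one match per row is enough
        }
    }
}
var_export($result);  // array ( 0 => 1, 1 => 2, 2 => 4, )
```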

6 Comments

I think the complexity is still O(nm), because array_intersect() is linear in the size of the array. The second option is much better, but the worst case is still O(nm)...
break reduces m in qualifying cases.
@giuseppe I've added a new technique to the start of my answer.
Besides the fact that you can use isset() in a similar way, I think nothing changes with respect to complexity: the worst case is still O(nm), agreed?
More often than not, when I post answers involving array searching, I use isset() because it is the fastest way. As the hyperlinked article describes, searching keys is much faster; the cost in this task is in the data preparation. So you will need to decide for yourself whether the cost of preparation outweighs a brute-force approach -- benchmark with your actual project data.
