
Suppose I have an array of indexes, A, and an array B where every entry is itself an array containing some indexes or numbers. I would like to know which entries of B contain at least one index appearing in A. For example (in PHP style):

A = [3,45,67,8]
B = [ 1 => [1,6,81],
      2 => [5,67,3,4,5,66,6],
      3 => [55,56,57,58],
      4 => [45,80,81,82]
    ]

The entries of B that contain some value of A are 2 and 4, so the function I want should behave like:

function solution = filter(A,B) // solution = [2,4]

Now, with a brute-force loop over the entries of B, the complexity is O(nm), where n is #B and m is the length of the longest row of B. Are there smarter solutions?
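A hash-set lookup doesn't avoid visiting every element of B, but it makes each membership test O(1) expected instead of an O(|A|) scan, which is usually the practical win in PHP. A minimal sketch (not from the original post) using the question's data:

```php
<?php

// Flip A so its values become keys, then each membership test is an
// O(1) expected hash lookup via isset(). Every element of B is still
// visited at most once, so total work is proportional to the size of B.
$A = [3, 45, 67, 8];
$B = [
    1 => [1, 6, 81],
    2 => [5, 67, 3, 4, 5, 66, 6],
    3 => [55, 56, 57, 58],
    4 => [45, 80, 81, 82],
];

$set = array_flip($A);          // [3 => 0, 45 => 1, 67 => 2, 8 => 3]
$result = [];
foreach ($B as $key => $row) {
    foreach ($row as $value) {
        if (isset($set[$value])) {
            $result[] = $key;   // first hit is enough for this row
            break;
        }
    }
}
var_export($result);            // array ( 0 => 2, 1 => 4, )
```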

  • Why isn't 1 a qualifying key? It contains 8 which is in the whitelist. Commented Oct 25, 2018 at 10:45
  • yes you're right... Commented Oct 25, 2018 at 11:27
  • You have to loop through every entry of B. It can't get better than that. Commented Oct 25, 2018 at 13:29

1 Answer


Edit #2:

By moving all values to be compared into key positions, PHP can more than double efficiency (by my demo's system-time metric).

Furthermore, if your subsets contain duplicate values, calling array_flip() will shrink them, because an array cannot hold duplicate keys.

Code: (Demo)

$A = array_flip($A);  // prepare for key comparisons    
$result = [];
foreach ($B as $key => $haystack) {
    if (array_intersect_key(array_flip($haystack), $A)) {
        $result[] = $key;
    }
}
var_export($result);
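The duplicate-removal effect of array_flip() mentioned above can be seen on one of the sample rows (a standalone illustration, not part of the original demo):

```php
<?php

// Flipping a row with a repeated value (5 appears twice) yields one key
// per distinct value, so the flipped array is strictly smaller.
$row = [5, 67, 3, 4, 5, 66, 6];
$flipped = array_flip($row);
var_export(count($row));      // 7
var_export(count($flipped));  // 6 -- the duplicate 5 collapsed into one key
```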

Edit:

Whenever you want to optimize array searching in PHP, it is often best to prepare your data in a way that lets PHP leverage its strength with hash tables: https://codedmemes.com/lib/best-performance-array-intersection/

Consider this preparation...

Code: (Demo)

foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        $C[$hay][] = $key;
    }
}

var_export(array_keys(array_flip(array_merge(...array_intersect_key($C, array_flip($A))))));

Output:

array (
  0 => 1,
  1 => 2,
  2 => 4,
)
  • The nested foreach() loops generate a collection of subarrays which have unique values from B's subarrays as keys and $B's original keys as new subarray values.
  • array_intersect_key() checks the keys which php does much faster than checking values. (see first hyperlink article)
  • Then array_merge(...) flattens the subarrays into a single 1-dim array.
  • Finally array_flip() and array_keys() removes duplicates and re-indexes the results.
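A quick trace of the preparation loop above may help: run on the demo's data, $C is an inverted index mapping each value to the B-keys that contain it (the specific lookups shown are just illustrative picks):

```php
<?php

$B = [
    1 => [1, 6, 8],
    2 => [5, 67, 3, 4, 5, 66, 6],
    3 => [55, 56, 57, 58],
    4 => [45, 80, 81, 82],
];

// Build the inverted index: value => list of B-keys containing it.
$C = [];
foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        $C[$hay][] = $key;
    }
}

var_export($C[8]);   // array ( 0 => 1, )        -- only row 1 contains 8
var_export($C[5]);   // array ( 0 => 2, 1 => 2, ) -- 5 appears twice in row 2
```

The repeated key in $C[5] is exactly the duplication that the final array_flip()/array_keys() step cleans up.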

I don't know exactly how array_intersect() works its magic in terms of efficiency, but this is probably how I'd go about it:

Code: (Demo)

$A = [3,45,67,8];
$B = [ 1 => [1,6,8],
      2 => [5,67,3,4,5,66,6],
      3 => [55,56,57,58],
      4 => [45,80,81,82]
    ];

$result = [];
foreach ($B as $key => $haystack) {
    if (array_intersect($haystack, $A)) {  // generate an array with co-existing values
        $result[] = $key;  // add key if array_intersect makes a non-empty array
    }
}
var_export($result);

Output:

array (
  0 => 1,
  1 => 2,
  2 => 4,
)

I suppose if there is a "downside" to using array_intersect() it would be that it will bother to make multiple matches, when only a single match per row is necessary. For this reason, array_search() or a breaking loop might have advantages.

Code: (Demo)

$result = [];
foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        if (in_array($hay, $A)) {
            $result[] = $key;
            break;
        }
    }
}
var_export($result);  // same result
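One way to read the array_search() suggestion (my interpretation, not the original author's code): it returns the position of the first match and stops scanning there, much like the breaking loop. Note the !== false comparison, because a match at position 0 is falsy:

```php
<?php

$A = [3, 45, 67, 8];
$B = [
    1 => [1, 6, 8],
    2 => [5, 67, 3, 4, 5, 66, 6],
    3 => [55, 56, 57, 58],
    4 => [45, 80, 81, 82],
];

$result = [];
foreach ($B as $key => $haystack) {
    foreach ($haystack as $hay) {
        // array_search() stops at the first occurrence of $hay in $A.
        if (array_search($hay, $A) !== false) {
            $result[] = $key;
            break;  // one match per row is enough
        }
    }
}
var_export($result);  // array ( 0 => 1, 1 => 2, 2 => 4, )
```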

6 Comments

I think the complexity is still O(nm), because array_intersect() is linear in the size of the array. The second option is much better, but the worst case is still O(nm)...
break reduces m in qualifying cases.
@giuseppe I've added a new technique to the start of my answer.
Besides the fact that you can use isset() in a similar way, I think nothing changes with respect to complexity: the worst case is still O(nm), agreed?
More often than not, when I post answers involving array searching, I use isset() because it is the fastest way. As the hyperlinked article describes, searching keys is much faster; the cost in this task is in the data preparation. So you will need to decide for yourself whether the cost of preparation outweighs a brute-force approach -- benchmark with your actual project data.
