4

Let's continue. Why array_uintersect does not compare values of first array after sorting? By my humble opinion array_udiff and array_uintersect should have similar algorithms, but they have not. Why?

$compare = function($a, $b) use(&$iteration_count)
    {
    echo("$a : $b\n");
    $iteration_count++;
    return strcmp($a, $b);
    };

$a = array('a', 'b', 'c');
$b = array('x', 'y', 'z');

$iteration_count = 0;
echo "array_udiff:" . json_encode(array_udiff($a, $b, $compare)) . "\n";
echo "iterations: $iteration_count\n\n";

$iteration_count = 0;
echo "array_uintersect:" . json_encode(array_uintersect($a, $b, $compare)) . "\n";
echo "iterations: $iteration_count\n\n";

Output

b : a
c : b
y : x
z : y
a : x
a : b
b : x
b : c
c : x
array_udiff:["a","b","c"]
iterations: 9

b : a
c : b
y : x
z : y
a : x  // comparison started
b : x  // but there is no comparison to skip values
c : x
array_uintersect:[]
iterations: 7

3 Answers 3

3

The algorithm that's adopted by array_intersect() and friends is to first assume all values of the first array are present in the other arrays; during iteration it will remove elements that fail this assertion.

When an element is not found in one of the other arrays, the implementation can do two things:

  1. Compare the next element against the current element and remove from the final result while they're equal (this is what diff does)
  2. Compare the next element against the last inspected element of the other arrays and keep removing it from the final result until it's bigger (or the end of the array is reached).

In the case of PHP, the latter is chosen. This has a slight advantage, because it can skip values bigger than the current element but smaller than the last inspected element of the other arrays. For example:

$a = ['a.a0', 'a.a1', 'b.a2', 'c.a3'];
$b = ['a.c0', 'd.c1'];

function cmp_val($a, $b)
{
    echo "$a <=> $b\n";
    return strcmp($a[0], $b[0]);
}

print_r(array_uintersect($a, $b, 'cmp_val'));

Output:

...
-- intersect starts
a.a0 <=> a.c0
a.a0 <=> a.a1 <-- match
a.a1 <=> b.a2
b.a2 <=> d.c1 <-- no match
c.a3 <=> d.c1

As you can see, it uses the strategy of comparing within the first array if a value occurs in all other arrays, just like diff does; and the other strategy is used if the value isn't present in any of the other arrays.

Sign up to request clarification or add additional context in comments.

Comments

0

The initial sorting is in O(n log(n)) for each array, so optimizing the subsequent O(n) elimination pass would yield little gain.

This might be slower on 3 elements arrays, but will become neglectible for bigger sizes.

Comments

0

I was right and wrong at the same time. Those functions have similar algorithms.

$compare = function($a, $b) use(&$iteration_count)
    {
    echo("$a : $b\n");
    $iteration_count++;
    return strcmp($a[0], $b[0]);
    };

$a = array('a1', 'b1', 'c1');
$b = array('a2', 'b2', 'c2');

$iteration_count = 0;
echo "array_uintersect:" . json_encode(array_uintersect($a, $b, $compare)) . "\n";
echo "iterations: $iteration_count\n\n";

Output

b1 : a1 
c1 : b1
b2 : a2
c2 : b2
a1 : a2  // comparison started
a1 : b1  // it trying to skip values after it have been matched
b1 : b2
b1 : c1
c1 : c2
array_uintersect:["a1","b1","c1"]
iterations: 9

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.