Trying to understand array_uintersect behavior

Question

Let's continue. Why array_uintersect does not compare values of first array after sorting? By my humble opinion array_udiff and array_uintersect should have similar algorithms, but they have not. Why?

$compare = function($a, $b) use(&$iteration_count)
    {
    echo("$a : $b\n");
    $iteration_count++;
    return strcmp($a, $b);
    };

$a = array('a', 'b', 'c');
$b = array('x', 'y', 'z');

$iteration_count = 0;
echo "array_udiff:" . json_encode(array_udiff($a, $b, $compare)) . "\n";
echo "iterations: $iteration_count\n\n";

$iteration_count = 0;
echo "array_uintersect:" . json_encode(array_uintersect($a, $b, $compare)) . "\n";
echo "iterations: $iteration_count\n\n";

Output

b : a
c : b
y : x
z : y
a : x
a : b
b : x
b : c
c : x
array_udiff:["a","b","c"]
iterations: 9

b : a
c : b
y : x
z : y
a : x  // comparison started
b : x  // but there is no comparison to skip values
c : x
array_uintersect:[]
iterations: 7

Ja͢ck · Accepted Answer · 2015-03-05 09:34:07Z

The algorithm that's adopted by array_intersect() and friends is to first assume all values of the first array are present in the other arrays; during iteration it will remove elements that fail this assertion.

When an element is not found in one of the other arrays, the implementation can do two things:

Compare the next element against the current element and remove from the final result while they're equal (this is what diff does)
Compare the next element against the last inspected element of the other arrays and keep removing it from the final result until it's bigger (or the end of the array is reached).

In the case of PHP, the latter is chosen. This has a slight advantage, because it can skip values bigger than the current element but smaller than the last inspected element of the other arrays. For example:

$a = ['a.a0', 'a.a1', 'b.a2', 'c.a3'];
$b = ['a.c0', 'd.c1'];

function cmp_val($a, $b)
{
    echo "$a <=> $b\n";
    return strcmp($a[0], $b[0]);
}

print_r(array_uintersect($a, $b, 'cmp_val'));

Output:

...
-- intersect starts
a.a0 <=> a.c0
a.a0 <=> a.a1 <-- match
a.a1 <=> b.a2
b.a2 <=> d.c1 <-- no match
c.a3 <=> d.c1

As you can see, it uses the strategy of comparing within the first array if a value occurs in all other arrays, just like diff does; and the other strategy is used if the value isn't present in any of the other arrays.

kuroi neko · Accepted Answer · 2015-03-05 03:24:44Z

0

The initial sorting is in O(n log(n)) for each array, so optimizing the subsequent O(n) elimination pass would yield little gain.

This might be slower on 3 elements arrays, but will become neglectible for bigger sizes.

answered Mar 5, 2015 at 3:24

kuroi neko

8,6681 gold badge23 silver badges45 bronze badges

Comments

sectus · Accepted Answer · 2015-03-05 07:12:44Z

0

I was right and wrong at the same time. Those functions have similar algorithms.

$compare = function($a, $b) use(&$iteration_count)
    {
    echo("$a : $b\n");
    $iteration_count++;
    return strcmp($a[0], $b[0]);
    };

$a = array('a1', 'b1', 'c1');
$b = array('a2', 'b2', 'c2');

$iteration_count = 0;
echo "array_uintersect:" . json_encode(array_uintersect($a, $b, $compare)) . "\n";
echo "iterations: $iteration_count\n\n";

Output

b1 : a1 
c1 : b1
b2 : a2
c2 : b2
a1 : a2  // comparison started
a1 : b1  // it trying to skip values after it have been matched
b1 : b2
b1 : c1
c1 : c2
array_uintersect:["a1","b1","c1"]
iterations: 9

edited Mar 5, 2015 at 7:12

answered Mar 5, 2015 at 7:06

sectus

15.7k5 gold badges59 silver badges97 bronze badges

Collectives™ on Stack Overflow

Trying to understand array_uintersect behavior

3 Answers 3

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related