List union algorithm in Python

Question

I'm working on a list union algorithm with the following specification: if an element occurs more often in L1 than in L2, the union should contain it the maximum number of times, i.e., as many times as it occurs in L1; the roles of L1 and L2 are switched if the element occurs more often in L2. If L1 and L2 are disjoint, the union is just the regular set union. So far my thought process has been:

1. Iterate through L1.
2. Check if any element in L1 is also in L2.
3. If an element in L1 is also in L2, check which list has the greater count of the element.
4. If L1 and L2 are disjoint, return regular set union.
5. Repeat step 3 with L2 and L1 reversed.
6. Return the union.

I was thinking about using the max function to kind of tell Python to return the list where the multiplicity of each element in the union is the maximum number of occurrences of the element in both L1 and L2. Ideas?
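The steps above can be sketched with plain dictionaries and max. This is only a minimal sketch: the names count_items and multiset_union are hypothetical, and it assumes the elements are hashable.

```python
def count_items(items):
    """Return a dict mapping each element to its number of occurrences."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def multiset_union(list1, list2):
    """Union where each element appears max(count in list1, count in list2) times."""
    counts1 = count_items(list1)
    counts2 = count_items(list2)
    result = []
    # Visit every distinct element of either list, in first-seen order.
    for item in dict.fromkeys(list1 + list2):
        result.extend([item] * max(counts1.get(item, 0), counts2.get(item, 0)))
    return result


print(multiset_union([1, 1, 2], [1, 2, 2, 3]))  # [1, 1, 2, 2, 3]
```

When the lists are disjoint, every maximum is just the count in whichever list holds the element, so no separate disjoint case is needed.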

I'm aware of the collections.Counter implementation. Transforming this into a dictionary will make this much easier. Good idea.
– user2030052, Commented Feb 15, 2013 at 4:23
Do you need to know how Python solves this problem efficiently, or do you need to write an algorithm with only basic tools (homework, etc.)?
– Eric O. Lebigot, Commented Feb 17, 2013 at 3:15

Eric O. Lebigot · Accepted Answer · 2013-02-15 10:28:17Z

4

This is a perfect job for the collections standard module, which offers multisets:

from collections import Counter

result_list = list((Counter(list1)|Counter(list2)).elements())

Here, a Counter object represents a multiset (a set that can hold more than one copy of each element). The union operator | keeps the maximum count of each element, and elements() returns an iterator that yields each element as many times as its count.
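A small example may make this concrete. The sample lists are made up for illustration; the output is sorted so that it does not depend on iteration order.

```python
from collections import Counter

list1 = ["a", "a", "b"]
list2 = ["a", "b", "b", "c"]

union = Counter(list1) | Counter(list2)  # keeps the larger count per element
print(union["a"], union["b"], union["c"])  # 2 2 1

result_list = list(union.elements())
print(sorted(result_list))  # ['a', 'a', 'b', 'b', 'c']
```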

If you don't really need a list but can work with a multiset in the code, then Counter(list1) | Counter(list2) is the union multiset that you need.

edited Feb 15, 2013 at 10:28

answered Feb 15, 2013 at 7:09

Eric O. Lebigot


1 Comment

John La Rooy Over a year ago

Cool. This is much better than my answer

John La Rooy · Answer · 2013-02-15 07:26:30Z

1

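from collections import Counter

# Start from the counts in L1, then raise each count to the larger of the two.
counts = Counter(L1)
for value, count in Counter(L2).items():
    counts[value] = max(counts[value], count)
# Expand the counts back into a list, repeating each value count times.
newlist = [value for value, count in counts.items() for _ in range(count)]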

edited Feb 15, 2013 at 7:26

Eric O. Lebigot


answered Feb 15, 2013 at 4:28

John La Rooy


3 Comments

Eric O. Lebigot Over a year ago

[value for value, count in counts.items() for _ in range(count)] could also be sum(([value]*count for value, count in counts.items()), []): it has the advantage of showing a tad more directly the structure of the final list (repetition of values).

John La Rooy Over a year ago

@EOL, sum is only efficient for numeric types. For lists it has quadratic performance (O(n**2)). list with .elements() as you used is probably the best way

Eric O. Lebigot Over a year ago

I agree: using sum() is only good for its slightly improved legibility. :) Your list comprehension has linear performance, which is way better.
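To make the comparison concrete, the two constructions discussed in these comments can be checked side by side. The sample counts are made up, and itertools.chain.from_iterable is not mentioned in the thread; it is shown only as one more linear-time option.

```python
from collections import Counter
from itertools import chain

counts = Counter({"x": 2, "y": 3})  # made-up sample counts

# Linear: each value is appended once per occurrence.
by_comprehension = [value for value, count in counts.items() for _ in range(count)]

# Quadratic in the worst case: every + builds a brand-new list.
by_sum = sum(([value] * count for value, count in counts.items()), [])

# Also linear: lazily chains the repeated-value lists.
by_chain = list(chain.from_iterable([value] * count for value, count in counts.items()))

print(by_comprehension == by_sum == by_chain)  # True
```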

ElKamina · Answer · 2013-02-16 02:59:15Z

1

You can probably just use dicts with counts as values. Assuming L1 and L2 here are such dicts (mapping each element to its count), the union logic is:

counts = {i: max(L1.get(i,0), L2.get(i,0)) for i in set(L1)|set(L2) }

The final list is

newlist = [value for value, count in counts.items() for _ in range(count)]
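For completeness, here is a sketch of how the count dicts could be built from plain lists before applying the union logic above. The names list1, list2, counts1 and counts2 are illustrative and do not come from the answer.

```python
list1 = [1, 1, 2]
list2 = [1, 2, 2, 3]

# Build element -> count dicts without any imports.
counts1 = {}
for item in list1:
    counts1[item] = counts1.get(item, 0) + 1
counts2 = {}
for item in list2:
    counts2[item] = counts2.get(item, 0) + 1

# Union: keep the larger count for every key present in either dict.
counts = {i: max(counts1.get(i, 0), counts2.get(i, 0)) for i in set(counts1) | set(counts2)}
newlist = [value for value, count in counts.items() for _ in range(count)]

print(sorted(newlist))  # [1, 1, 2, 2, 3]
```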

edited Feb 16, 2013 at 2:59

Eric O. Lebigot


answered Feb 15, 2013 at 22:06

ElKamina


2 Comments

Eric O. Lebigot Over a year ago

I simplified the calculation of the union of keys. This is a good, "manual" way of obtaining the same thing as dict(Counter(L1)|Counter(L2)) (with from collections import Counter). However, the Counter class along with its union operator is meant to do just this, with the advantage that the code is more explicit.

Eric O. Lebigot Over a year ago

I also added the construction of the result list, since this is what the question asks for.

Ali-Akber Saifee · Answer · 2013-02-16 03:33:55Z

-1

A solution using only lists and their max/min properties (it assumes the elements are integers) could be:

union = []
for n in range(min(min(l1), min(l2)), max(max(l1), max(l2)) + 1):
    union.extend([n] * max(l1.count(n), l2.count(n)))

edited Feb 16, 2013 at 3:33

answered Feb 16, 2013 at 3:23

Ali-Akber Saifee


1 Comment

Eric O. Lebigot Over a year ago

This can be extremely inefficient, as this solution goes through every integer between the smallest and largest values (consider l1 = [0] and l2 = [2**1000]); also, count() has to scan each whole list for each candidate value, which takes a relatively long time. This solution also assumes that the elements in the lists are integers, which is unnecessarily restrictive. Dictionary- or Counter-based solutions are both much faster and more general.
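As a rough illustration of both points, a Counter-based union handles widely spaced integers and non-integer elements without scanning a range. The sample lists are invented for the example.

```python
from collections import Counter

# Widely spaced integers: only the values actually present are visited.
l1 = [0]
l2 = [2**1000]
sparse_union = list((Counter(l1) | Counter(l2)).elements())
print(len(sparse_union))  # 2

# Non-integer (but hashable) elements work the same way.
words = list((Counter(["a", "a", "b"]) | Counter(["b", "b", 2.5])).elements())
print(sorted(map(str, words)))  # ['2.5', 'a', 'a', 'b', 'b']
```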
