1

I'm trying to work on a list union algorithm now, with the following specifications: if an element in L1 occurs in L1 more than it occurs in L2, the union should return the maximum number of occurrences, i.e the amount it occurs in L1, with the roles of L1 and L2 switched if an element occurs in L2 more than it occurs in L1. If L1 and L2 are disjoint, the union just returns the regular set union. So far my thought process has been:

  1. Iterate through L1.
  2. Check if any element in L1 is also in L2.
  3. If an element in L1 is also in L2, check which list has the greater count of the element.
  4. If L1 and L2 are disjoint, return regular set union.
  5. Repeat step 3 with L2 and L1 reversed.
  6. Return the union.

I was thinking about using the max function to kind of tell Python to return the list where the multiplicity of each element in the union is the maximum number of occurrences of the element in both L1 and L2. Ideas?

5
  • Please post your current implementation. Commented Feb 15, 2013 at 4:21
  • 1
    Seems like a job for a collections.Counter ... Commented Feb 15, 2013 at 4:21
  • I'm aware of the collections.Counter implementation. Transforming this into a dictionary will make this much easier. Good idea. Commented Feb 15, 2013 at 4:23
  • Please also post your test cases with the expected output Commented Feb 15, 2013 at 4:23
  • Do you need to know how Python solves this problem efficiently, or do you need to write an algorithm with only basic tools (homework, etc.)? Commented Feb 17, 2013 at 3:15

4 Answers 4

4

This is a perfect job for the collections standard module, which offers multisets:

from collections import Counter

result_list = list((Counter(list1)|Counter(list2)).elements())

A Counter object represents here a multiset (set of generally more than 1 copy of its elements), the union operator | keeps the maximum count of each element, and elements() returns an iterator where each element is returned the number of times corresponding to its count.

If you don't really need a list but can work with a multiset in the code, then Counter(list1) | Counter(list2) is the union multiset that you need.

Sign up to request clarification or add additional context in comments.

1 Comment

Cool. This is much better than my answer
1
from collections import Counter

counts = Counter(L1)
for value, count in Counter(L2).items()
    counts[value] = max(counts[value], count)
newlist = [value for value, count in counts.items() for _ in range(count)]

3 Comments

[value for value, count in counts.items() for _ in range(count)] could also be sum([value]*count for value, count in count.items(), []): it has the advantage of showing a tad more directly the structure of the final list (repetition of values).
@EOL, sum is only efficient for numeric types. For lists it has quadratic performance (O(n**2)). list with .elements() as you used is probably the best way
I agree: using sum() is only good for its slightly improved legibility. :) Your list comprehension has linear performance, which is way better.
1

You can probably just use dicts with counts as values. Union logic is:

counts = {i: max(L1.get(i,0), L2.get(i,0)) for i in set(L1)|set(L2) }

The final list is

newlist = [value for value, count in counts.items() for _ in range(count)]

2 Comments

I simplified the calculation of the union of keys. This is a good, "manual" way of obtaining the same thing as dict(Counter(L1)|Counter(L2)) (with from collections import Counter). However, the Counter class along with its union operator is meant to do just this, with the advantage that the code is more explicit.
I also added the construction of the result list, since this is what the question asks for.
-1

a solution using only lists and their max/min properties could be

union = [] 
[union.extend([n] * max(l1.count(n), l2.count(n))) for n in range (min(min(l1),min(l2)), max(max(l1),max(l2))+1)]

1 Comment

This can be extremely inefficient, as this solution goes through all possible values (l1 = [0] and l2 = [2**1000]); also, count() requires each list to be gone through for each possible value, which also takes a relatively long time. This solution also supposes that the elements in the lists are integers: this is unnecessarily restrictive. Dictionary- or Counter-based solutions are both much faster and more general.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.