
I've read loads of examples but haven't quite found what I'm looking for. I've tried several ways of doing this but am looking for the best one.

So the idea is that given:

s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]

the results should be:

['c']
['potato','d']
['h']

because these items are unique across the whole list of lists.

Thank you for any suggestions :)

  • stackoverflow.com/questions/2213923/… Already answered, please check this link. Commented Apr 2, 2020 at 11:30
  • @LakshmiRam Not the same. Commented Apr 2, 2020 at 11:32
  • What should happen if s1 = ['a', 'b', 'c', 'c']? Commented Apr 2, 2020 at 13:57

5 Answers


As a general approach you can keep a counter of all items and then keep those that have appeared only once.

In [21]: from collections import Counter

In [23]: counts = Counter(s1 + s2 + s3)

In [24]: [i for i in s1 if counts[i] == 1]
Out[24]: ['c']

In [25]: [i for i in s2 if counts[i] == 1]
Out[25]: ['potato', 'd']

In [26]: [i for i in s3 if counts[i] == 1]
Out[26]: ['h']

And if you have a nested list you can do the following:

In [28]: s = [s1, s2, s3]

In [30]: from itertools import chain

In [31]: counts = Counter(chain.from_iterable(s))

In [32]: [[i for i in lst if counts[i] == 1] for lst in s]
Out[32]: [['c'], ['potato', 'd'], ['h']]


What a beautiful and elegant solution. I'm going to replace my own function for removing duplicates with this. Thank you.

How about:

[i for i in s1 if i not in s2+s3] #gives ['c']
[j for j in s2 if j not in s1+s3] #gives ['potato', 'd']
[k for k in s3 if k not in s1+s2] #gives ['h']

If you want all of them in a list:

uniq = [[i for i in s1 if i not in s2+s3],
[j for j in s2 if j not in s1+s3],
[k for k in s3 if k not in s1+s2]]

#output
[['c'], ['potato', 'd'], ['h']]
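The same idea scales better with sets. A small sketch (not from the answer above) that builds the union of the "other" lists once per list, so each membership test is O(1) instead of rescanning a concatenated list for every element:

```python
s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
lists = [s1, s2, s3]

uniq = []
for i, cur in enumerate(lists):
    # Union of every list except the current one.
    others = set().union(*(lst for j, lst in enumerate(lists) if j != i))
    uniq.append([x for x in cur if x not in others])

print(uniq)  # [['c'], ['potato', 'd'], ['h']]
```

This also generalizes to any number of lists without spelling out each `s1+s3`-style concatenation by hand.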



To find the unique elements across the 3 lists you can use the set symmetric difference (^) operation along with the union (|) operation, since you have 3 lists.

>>> s1 = ['a','b','c']
>>> s2 = ['a','potato','d']
>>> s3 = ['a','b','h']

>>> (set(s1) | set(s2)) ^ set(s3)


This doesn't work because symmetric_difference will return values that are present an odd number of times (e.g. 'a')
If we use the union along with the symmetric difference, it's possible.
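To illustrate the first comment's objection, consider a hypothetical element 'x' that appears in exactly two of the lists; the union/symmetric-difference expression wrongly reports it as unique:

```python
t1 = ['a', 'x']
t2 = ['x', 'b']
t3 = ['c']

# 'x' occurs in t1 and t2, so it should NOT be in the result,
# but the union hides the duplicate before the symmetric difference runs.
result = (set(t1) | set(t2)) ^ set(t3)
print(sorted(result))  # ['a', 'b', 'c', 'x']
```

Set operations only track membership, not multiplicity, so any counting-based definition of "unique" needs something like Counter instead.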

Counter (from collections) is the way to go for this:

from collections import Counter

s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]

counts  = Counter(s for sList in strings for s in sList)
uniques = [ [s for s in sList if counts[s]==1] for sList in strings ]

print(uniques) # [['c'], ['potato', 'd'], ['h']]

If you're not allowed to use an imported module, you could do it with the list's count() method but it would be much less efficient:

allStrings = [ s for sList in strings for s in sList ]
unique     = [[ s for s in sList if allStrings.count(s)==1] for sList in strings]

This can be made more efficient using a set to identify repeated values:

allStrings = ( s for sList in strings for s in sList )
seen       = set()
repeated   = set( s for s in allStrings if s in seen or seen.add(s))
unique     = [ [ s for s in sList if s not in repeated] for sList in strings ]
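As a quick check, running the set-based version above on the question's sample data (reusing the same variable names):

```python
strings = [['a', 'b', 'c'], ['a', 'potato', 'd'], ['a', 'b', 'h']]

allStrings = (s for sList in strings for s in sList)
seen = set()
# seen.add() returns None (falsy): a first sighting is recorded but not
# kept; a second sighting finds s in seen and lands in `repeated`.
repeated = set(s for s in allStrings if s in seen or seen.add(s))
unique = [[s for s in sList if s not in repeated] for sList in strings]

print(unique)  # [['c'], ['potato', 'd'], ['h']]
```

Unlike the plain `count()` version, this makes a single pass over the data, at the cost of one extra set.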



Assuming that you want this to work for an arbitrary number of sequences, a direct (but likely not the most efficient, since the others set could probably be updated incrementally between iterations instead of rebuilt) way to solve this would be:

def deep_unique_set(*seqs):
    for i, seq in enumerate(seqs):
        others = set(x for seq_ in (seqs[:i] + seqs[i + 1:]) for x in seq_)
        yield [x for x in seq if x not in others]

or the slightly faster but less memory efficient and otherwise identical:

def deep_unique_preset(*seqs):
    pile = list(x for seq in seqs for x in seq)
    k = 0
    for seq in seqs:
        num = len(seq)
        others = set(pile[:k] + pile[k + num:])
        yield [x for x in seq if x not in others]
        k += num

Testing it with the provided input:

s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

print(list(deep_unique_set(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]

Note that if the input contains duplicates within one of the lists, they are not removed, i.e.:

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

print(list(deep_unique_set(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]

If all duplicates should be removed, a better approach is to count the values. The method of choice for this is collections.Counter, as proposed in @Kasramvd's answer:

import collections
import itertools

def deep_unique_counter(*seqs):
    counts = collections.Counter(itertools.chain.from_iterable(seqs))
    for seq in seqs:
        yield [x for x in seq if counts[x] == 1]

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_counter(s1, s2, s3)))
# [[], ['potato', 'd'], ['h']]

Alternatively, one could keep track of repeats, e.g.:

def deep_unique_repeat(*seqs):
    seen = set()
    repeated = set(x for seq in seqs for x in seq if x in seen or seen.add(x))
    for seq in seqs:
        yield [x for x in seq if x not in repeated]

which will have the same behavior as the collections.Counter-based approach:

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_repeat(s1, s2, s3)))
# [[], ['potato', 'd'], ['h']]

but is slightly faster, since it does not need to keep track of unused counts.

Another, highly inefficient, approach makes use of list.count() for counting instead of a global counter:

def deep_unique_count(*seqs):
    pile = list(x for seq in seqs for x in seq)
    for seq in seqs:
        yield [x for x in seq if pile.count(x) == 1]

These last two approaches are also proposed in @AlainT.'s answer.


Some timings for these are provided below:

import random

funcs = [deep_unique_set, deep_unique_preset, deep_unique_count,
         deep_unique_repeat, deep_unique_counter]

n = 100
m = 100
s = tuple([random.randint(0, 10 * n * m) for _ in range(n)] for _ in range(m))
for func in funcs:
    print(func.__name__)
    %timeit list(func(*s))
    print()

# deep_unique_set
# 10 loops, best of 3: 86.2 ms per loop

# deep_unique_preset
# 10 loops, best of 3: 57.3 ms per loop

# deep_unique_count
# 1 loop, best of 3: 1.76 s per loop

# deep_unique_repeat
# 1000 loops, best of 3: 1.87 ms per loop

# deep_unique_counter
# 100 loops, best of 3: 2.32 ms per loop

