Python - duplicates in several lists

Question

I have code that produces 10 lists of numbers.

import random

def my_random_list(l: list):
    return sorted(random.sample(list(set(l)), 6))


for _ in range(10):
    print(sorted(my_random_list([i for i in range(1, 43)])))

I need to count how many duplicates there are across these 10 lists. How can I do it in a short and efficient way?

How about using collections.Counter? Counter is Python's bag (multiset) data structure. If you put all your lists into a Counter, you can see which elements are duplicated (a count of 2 or more) and how many times each one occurs.
– Boseong Choi, Commented Mar 1, 2020 at 17:16
@PedroLobito I'm just trying to improve my Python skills by solving interesting tasks.
– DoctorEXE, Commented Mar 1, 2020 at 18:09

Pedro Lobito · Accepted Answer · 2020-03-01 17:34:36Z

1

You can use:

import random
from collections import defaultdict

def my_random_list(l: list):
    return sorted(random.sample(list(set(l)), 6))

repeated = defaultdict(int)
for _ in range(10):
    rl = my_random_list([i for i in range(1, 43)])
    for x in rl:
        repeated[x] += 1
    print(sorted(rl))

repeated = {k:v for k,v in repeated.items() if v > 1}
print(repeated)
# {2: 2, 5: 3, 19: 4, 21: 4, 4: 3, 8: 2, 14: 2, 38: 3, 9: 3, 24: 2, 40: 3, 42: 2, 10: 2, 22: 3, 32: 2, 18: 3, 34: 2, 30: 2, 31: 3}
print(len(repeated))  # number of distinct values that appear more than once

edited Mar 1, 2020 at 17:34

answered Mar 1, 2020 at 17:29

Pedro Lobito

lucidbrot · Accepted Answer · 2020-03-01 17:16:14Z

1

Convert the list to a set, which automatically removes duplicates. Then compare the two sizes:

l = [1,2,3,4,5,6,7,7,6,5,4]
print(len(l) - len(set(l)))
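
Applied to several lists, the same idea needs the lists flattened into one sequence first. A minimal sketch, assuming the small fixed lists below stand in for the random output (the names `lists`, `flat` and `surplus` are illustrative and not from the original answer):

```python
from itertools import chain

# Hypothetical sample data standing in for the random lists.
lists = [[1, 2, 3], [2, 3, 4], [3, 5, 6]]

# Flatten the lists into a single sequence.
flat = list(chain.from_iterable(lists))

# Surplus occurrences: every appearance of a value beyond its first.
surplus = len(flat) - len(set(flat))
print(surplus)  # 3
```

Note that this counts extra occurrences (3 here), not the number of distinct repeated values (2 here, namely 2 and 3). If the latter is what is wanted, a per-value count is needed.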

answered Mar 1, 2020 at 17:16

lucidbrot

Kamehameha · Accepted Answer · 2020-03-01 17:29:09Z

If your intention is to find the duplicates across the 10 lists, you can try the following:

# Import random (needed by my_random_list) and Counter from collections
In [11]: import random
    ...: from collections import Counter

# Your definition of my_random_list
In [12]: def my_random_list(l: list):
    ...:     return sorted(random.sample(list(set(l)), 6))
    ...:

# Copying your version of creating 10 lists into a lists variable (calling the sorted() here is superfluous in my opinion)
In [13]: lists = [sorted(my_random_list([i for i in range(1, 43)])) for _ in range(10)]

# Count all the entries across all the 10 lists
In [14]: counter = Counter([])

# You can add multiple Counter instances to produce a "merged" Counter
In [15]: for l in lists:
    ...:     counter += Counter(l)

# Find the entries that occur more than once
In [16]: duplicates = [k for k,v in counter.items() if v > 1]

# Printing all the duplicate entries across the lists
In [17]: duplicates
Out[17]: [6, 16, 20, 37, 38, 2, 9, 29, 1, 18, 33, 3, 17, 19, 31, 15, 21, 42, 41, 11]

# Length of the duplicate list
In [18]: len(duplicates)
Out[18]: 20

You can read up on Counter in the Python documentation for the collections module.
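
Adding with `+=` creates a temporary Counter for each list. A possible variation, sketched here with small fixed lists rather than the random ones, is to feed each list to a single Counter through its `update` method (the sample data is an assumption for illustration):

```python
from collections import Counter

# Hypothetical sample data standing in for the 10 random lists.
lists = [[1, 2, 3], [2, 3, 4], [3, 5, 6]]

# update() adds the counts of each list to the same Counter in place.
counter = Counter()
for l in lists:
    counter.update(l)

# Values seen more than once across all lists, in ascending order.
duplicates = sorted(k for k, v in counter.items() if v > 1)
print(duplicates)       # [2, 3]
print(len(duplicates))  # 2
```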

mathfux · Accepted Answer · 2020-03-01 17:35:46Z

1

The problem statement is not clear; I assume you want to find the duplicates in the concatenation of these 10 arrays. In that case you could take advantage of numpy.unique:

import random
import numpy as np

# my_random_list is the function defined in the question
collection = [my_random_list(list(range(1, 43))) for i in range(10)]
conc = np.concatenate(collection)  # all 10 lists joined into one array
items, cnt = np.unique(conc, return_counts=True)  # sorted unique items and their counts
output = items[cnt > 1]  # items that appear more than once

answered Mar 1, 2020 at 17:35

mathfux

Boseong Choi · Accepted Answer · 2020-03-01 17:36:56Z

collections.Counter and itertools.chain will be helpful.

import random

source = [i for i in range(1, 43)]


def my_random_list():
    return sorted(random.sample(source, 6))


random_lists = [my_random_list() for _ in range(10)]
print(random_lists)

This gives 10 random lists of 6 numbers each.

>>> [[2, 4, 10, 18, 20, 30], [4, 12, 13, 19, 21, 27], [10, 11, 18, 26, 32, 33], [4, 11, 12, 17, 38, 42], [12, 22, 28, 38, 40, 41], [2, 11, 22, 30, 35, 36], [4, 6, 22, 24, 32, 34], [1, 3, 5, 25, 31, 33], [25, 29, 31, 32, 33, 35], [12, 16, 28, 31, 37, 41]]

Then you can count how often each number occurs.

from collections import Counter
from itertools import chain


counter = Counter(chain(*random_lists))
print(counter)

>>> Counter({4: 4, 12: 4, 11: 3, 32: 3, 33: 3, 22: 3, 31: 3, 2: 2, 10: 2, 18: 2, 30: 2, 38: 2, 28: 2, 41: 2, 35: 2, 25: 2, 20: 1, 13: 1, 19: 1, 21: 1, 27: 1, 26: 1, 17: 1, 42: 1, 40: 1, 36: 1, 6: 1, 24: 1, 34: 1, 1: 1, 3: 1, 5: 1, 29: 1, 16: 1, 37: 1})

Finally, filter the counter with a list comprehension.

results = [k for k, v in counter.items() if v >= 2]
print(results)

>>> [2, 4, 10, 18, 30, 12, 11, 32, 33, 38, 22, 28, 41, 35, 25, 31]
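
To check these numbers without randomness, the ten lists printed earlier can be hard-coded and counted again. This sketch uses `chain.from_iterable`, which behaves like `chain(*random_lists)` here; the names `repeated` and `surplus` are illustrative additions, not part of the answer.

```python
from collections import Counter
from itertools import chain

# The ten lists printed in the sample run of this answer.
random_lists = [
    [2, 4, 10, 18, 20, 30], [4, 12, 13, 19, 21, 27],
    [10, 11, 18, 26, 32, 33], [4, 11, 12, 17, 38, 42],
    [12, 22, 28, 38, 40, 41], [2, 11, 22, 30, 35, 36],
    [4, 6, 22, 24, 32, 34], [1, 3, 5, 25, 31, 33],
    [25, 29, 31, 32, 33, 35], [12, 16, 28, 31, 37, 41],
]

counter = Counter(chain.from_iterable(random_lists))

# Distinct values that occur at least twice across all lists.
repeated = sorted(k for k, v in counter.items() if v >= 2)
print(len(repeated))  # 16

# Extra occurrences beyond the first appearance of each value.
surplus = sum(v - 1 for v in counter.values())
print(surplus)  # 25
```

The two figures answer different readings of "how many duplicates": 16 distinct numbers are repeated, while 25 of the 60 drawn numbers are repeats of an earlier one.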
