I have an array that stores objects. I am trying to check whether this array contains duplicates, comparing only one attribute of each object (hexdigest).

How can I check for duplicates and record the entire object for each duplicate I find?

from operator import attrgetter

# class to store hashes
class customclass: 
    def __init__(self, value, hexdigest): 
        self.value = value 
        self.hexdigest = hexdigest

# array to store class data
hash_array = []
hash_array.append(customclass(value=299, hexdigest='927'))
hash_array.append(customclass(value=207, hexdigest='92b'))
hash_array.append(customclass(value=113, hexdigest='951'))
hash_array.append(customclass(value=187, hexdigest='951'))
hash_array.append(customclass(value=205, hexdigest='998'))

# sort array
sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))

# check for duplicate hexdigests
newlist = []
duplist = []
for obj in sorted_array:
    for jbo in newlist:
        if obj.hexdigest not in jbo:
            newlist.append(obj)
        else:
            duplist.append(obj) 

2 Answers

1
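# Track the hexdigests seen so far; a set gives fast membership tests.
seen = set()
duplist = []
for obj in sorted_array:
    if obj.hexdigest in seen:
        # This hexdigest was already seen, so record the whole object.
        duplist.append(obj)
    else:
        seen.add(obj.hexdigest)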

Use the block of code above, instead of the one below that you implemented, to build the list of duplicate objects:

newlist = []
duplist = []
for obj in sorted_array:
    for jbo in newlist:
        if obj.hexdigest not in jbo:
            newlist.append(obj)
        else:
            duplist.append(obj) 
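
Note that a single pass like the one recommended above records only the second and later objects for each repeated hexdigest. If every object sharing a repeated hexdigest should be recorded, one possible sketch is to count the digests first with collections.Counter and then filter. The helper name find_all_duplicates is hypothetical, and customclass is repeated here only so the snippet runs on its own.

```python
from collections import Counter


class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest


def find_all_duplicates(objects):
    """Return every object whose hexdigest occurs more than once (hypothetical helper)."""
    # First pass: count how often each hexdigest appears.
    counts = Counter(obj.hexdigest for obj in objects)
    # Second pass: keep the objects whose hexdigest is repeated.
    return [obj for obj in objects if counts[obj.hexdigest] > 1]


hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=207, hexdigest='92b'),
    customclass(value=113, hexdigest='951'),
    customclass(value=187, hexdigest='951'),
    customclass(value=205, hexdigest='998'),
]

duplist = find_all_duplicates(hash_array)
print([(obj.value, obj.hexdigest) for obj in duplist])
# [(113, '951'), (187, '951')]
```

This version does not need the list to be sorted first, and it keeps the objects in their original order.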

0

Well, newlist is empty, so the inner for loop never runs, so nothing gets appended to newlist or duplist.
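
A minimal illustration of this point, with a hypothetical counter added only to make the behaviour visible: the body of a for loop over an empty list is never executed, so the list can never grow from inside that loop.

```python
newlist = []
iterations = 0  # hypothetical counter, only for illustration

for jbo in newlist:
    # Never reached: newlist has no elements to iterate over.
    iterations += 1
    newlist.append(jbo)

print(iterations, newlist)
# 0 []
```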

You may wish to group by the hexdigest attribute using itertools.groupby and a dictionary comprehension. Because groupby only groups consecutive items, the list must first be sorted by the same key.

from operator import attrgetter
from itertools import groupby

class customclass: 
    def __init__(self, value, hexdigest): 
        self.value = value 
        self.hexdigest = hexdigest

hash_array = []
hash_array.append(customclass(value=299, hexdigest='927'))
hash_array.append(customclass(value=207, hexdigest='92b'))
hash_array.append(customclass(value=113, hexdigest='951'))
hash_array.append(customclass(value=187, hexdigest='951'))
hash_array.append(customclass(value=205, hexdigest='998'))

sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))
# [<__main__.customclass object at 0x7f488d1a2a20>, 
#  <__main__.customclass object at 0x7f488d1a29b0>, 
#  <__main__.customclass object at 0x7f488d1a2b00>, 
#  <__main__.customclass object at 0x7f488d1a2b70>, 
#  <__main__.customclass object at 0x7f488d1a2c18>]

groups = groupby(sorted_array, key=attrgetter('hexdigest'))

{k: list(v) for k, v in groups}
# {'927': [<__main__.customclass object at 0x7f488d1a2a20>], 
#  '92b': [<__main__.customclass object at 0x7f488d1a29b0>], 
#  '951': [<__main__.customclass object at 0x7f488d1a2b00>,  
#          <__main__.customclass object at 0x7f488d1a2b70>], 
#  '998': [<__main__.customclass object at 0x7f488d1a2c18>]}

From there it's relatively easy to retrieve the unique and duplicate values.
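
As a rough sketch of that last step, the grouped dictionary can be split with two comprehensions. The names grouped, uniques, and duplicates are chosen here for illustration and do not come from the original code; the class and data are repeated so the snippet runs on its own.

```python
from itertools import groupby
from operator import attrgetter


class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest


hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=207, hexdigest='92b'),
    customclass(value=113, hexdigest='951'),
    customclass(value=187, hexdigest='951'),
    customclass(value=205, hexdigest='998'),
]

# groupby only merges adjacent items, so sort by the same key first.
sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))
grouped = {k: list(v) for k, v in groupby(sorted_array, key=attrgetter('hexdigest'))}

# Objects whose hexdigest appears exactly once.
uniques = [objs[0] for objs in grouped.values() if len(objs) == 1]
# Whole groups of objects that share a hexdigest.
duplicates = {k: objs for k, objs in grouped.items() if len(objs) > 1}

print([obj.value for obj in uniques])
# [299, 207, 205]
print({k: [obj.value for obj in objs] for k, objs in duplicates.items()})
# {'951': [113, 187]}
```

Keeping duplicates as a dictionary preserves which objects collide on which hexdigest; flattening its values gives a plain list if that is all that is needed.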

It may be easier to visualize what's going on if you provide a more useful definition for __repr__.

class customclass: 
    def __init__(self, value, hexdigest): 
        self.value = value 
        self.hexdigest = hexdigest
    def __repr__(self):
        return f"<customclass value: {self.value}, hexdigest: {self.hexdigest}>"

Doing so, hash_array prints in the interactive interpreter as follows, with the exception of the newlines I added for sanity's sake.

[<customclass value: 299, hexdigest: 927>, 
 <customclass value: 207, hexdigest: 92b>, 
 <customclass value: 113, hexdigest: 951>, 
 <customclass value: 187, hexdigest: 951>, 
 <customclass value: 205, hexdigest: 998>]
