0

I'm fairly new to programming and want to compare two lists of lists in Python, where the floats in these lists may differ by a small error. Here is an example:

first_list = [['ATOM', 'N', 'SER', -1.081, -16.465,  17.224], 
              ['ATOM', 'C', 'SER', 2.805, -3.504,  6.222], 
              ['ATOM', 'O', 'SER', -17.749, 16.241,  -1.333]]

secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465,  17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914,  6.222], 
              ['ATOM', 'O', 'SER', -17.541, -16.241,  -1.334]]

Expected Output:

Differences = ['ATOM', 'C', 'SER', 2.805, -3.504,  6.222]

My attempts so far:

def aprox (x, y):
    if x == float and y == float:
        delta = 0.2 >= abs(x - y)
        return delta
    else: rest = x, y
    return rest

def compare (data1, data2):
    diff = [x for x,y in first_list if x not in secnd_list and aprox(x,y)] + [x for x,y in secnd_list if x not in first_list and aprox(x,y)]
    return diff

Or with the help of tuples, but there I don't know how to build in the approximation:

def compare (data1, data2):
    first_set = set(map(tuple, data1))
    secnd_set = set(map(tuple, data2))
    diff = first_set.symmetric_difference(secnd_set)
    return diff

Hope you can help me! :)

  • Your initial compare function has parameters data1 and data2, but then you reference the (global?) objects first_list and secnd_list and never use the parameters. Commented Mar 27, 2017 at 15:42
  • Try this stackoverflow.com/questions/6105777/… Commented Mar 27, 2017 at 15:43
  • fyi isinstance(x, float) is how you should check number type Commented Mar 27, 2017 at 15:43
  • Your expected output is wrong: it should be 2 rows, since there are 2 rows where the discrepancies are greater than 0.2 according to your code. Commented Mar 27, 2017 at 16:09

3 Answers

5

The line

if x == float and y == float

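is incorrect: it compares the values themselves to the type object float, which is never true for a number. To check the type of a variable, use the type() function or, as noted in the comments, isinstance(). Try replacing the above line with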

if type(x) is float and type(y) is float:
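
A minimal sketch of the whole check, assuming the intent is an absolute tolerance of 0.2 for floats and plain equality for everything else. It uses isinstance, as suggested in the comments on the question; the function name approx and the parameter tol are illustrative choices, not from the original post.

```python
def approx(x, y, tol=0.2):
    """Return True if x and y match: floats within tol, anything else exactly."""
    if isinstance(x, float) and isinstance(y, float):
        return abs(x - y) <= tol
    # Non-float values (here: the string labels) must be identical
    return x == y


print(1.5 == float)            # False: a number never equals the type object
print(approx(-1.081, -1.082))  # True: floats within 0.2
print(approx(2.805, 2.142))    # False: floats more than 0.2 apart
print(approx('ATOM', 'ATOM'))  # True: non-floats fall back to ==
```

Unlike the original aprox, this always returns a boolean, so it can be used directly in a condition.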

0

This is kind of clunky, but I did it on the fly and it should get you the desired results. As I mentioned in the comments, your code sets the threshold at 0.2, which means two rows should be returned, not one as you stated.

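def discrepancies(x, y):
    # Walk both lists row by row; assumes the rows are in the same order
    for row1, row2 in zip(x, y):
        # Compare only the float coordinates (index 3 onward)
        for item1, item2 in zip(row1[3:], row2[3:]):
            if abs(item1 - item2) >= 0.2:
                print(row1)
                break  # one discrepancy is enough to report the row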

discrepancies(first_list, secnd_list)
['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]
['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]

A couple of caveats: the run time grows with the total number of items, since every row and every coordinate may be visited. For large lists in Python 2, I would use itertools.izip, which builds the pairs lazily; in Python 3 the built-in zip is already lazy. Hope this helps!
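
The same idea can return the differing rows instead of printing them. This is only a sketch, assuming both lists have the same length and row order and that the coordinates start at index 3; the name differing_rows and the parameters tol and start are hypothetical, not from the question.

```python
def differing_rows(rows1, rows2, tol=0.2, start=3):
    """Return rows of rows1 whose labels differ or whose floats differ by >= tol."""
    result = []
    for row1, row2 in zip(rows1, rows2):
        # Labels (everything before index `start`) must match exactly
        labels_differ = row1[:start] != row2[:start]
        # Coordinates may differ by less than tol
        floats_differ = any(abs(a - b) >= tol
                            for a, b in zip(row1[start:], row2[start:]))
        if labels_differ or floats_differ:
            result.append(row1)
    return result


first_list = [['ATOM', 'N', 'SER', -1.081, -16.465, 17.224],
              ['ATOM', 'C', 'SER', 2.805, -3.504, 6.222],
              ['ATOM', 'O', 'SER', -17.749, 16.241, -1.333]]

secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465, 17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914, 6.222],
              ['ATOM', 'O', 'SER', -17.541, -16.241, -1.334]]

# With the data from the question this yields the second and third rows
print(differing_rows(first_list, secnd_list))
```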

0

Maybe you can iterate over the rows of both lists in parallel and compare their sub-elements. When a pair of sub-elements is not equal, the row is added to the results depending on the type: if the two are strings and differ, the row is added; if they are floats, math.isclose() can be used for the approximate comparison:

Note: A correction was made to match the expected output; there is a missing negative sign in the third element of first_list.

import math

first_list = [['ATOM', 'N', 'SER', -1.081, -16.465,  17.224], 
              ['ATOM', 'C', 'SER', 2.805, -3.504,  6.222], 
              ['ATOM', 'O', 'SER', -17.749, -16.241,  -1.333]] # changes made

secnd_list = [['ATOM', 'N', 'SER', -1.082, -16.465,  17.227],
              ['ATOM', 'C', 'SER', 2.142, -3.914,  6.222], 
              ['ATOM', 'O', 'SER', -17.541, -16.241,  -1.334]]

diff = []
for e1, e2 in zip(first_list, secnd_list):
    for e_sub1, e_sub2 in zip(e1, e2):
        # if sub-elements are not equal
        if e_sub1 != e_sub2:
            # if it is string and not equal
            if isinstance(e_sub1, str):
                diff.append(e1)
                break # one element not equal so no need to iterate other sub-elements
            else:  # is float and not equal
                # rel_tol=2e-1 is a relative tolerance (20% of the larger
                # magnitude), not an absolute difference of 0.2
                if not math.isclose(e_sub1, e_sub2, rel_tol=2e-1):
                    diff.append(e1)
                    break # one element not equal so no need to iterate other sub-elements
diff

Output:

[['ATOM', 'C', 'SER', 2.805, -3.504, 6.222]]
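
Note that rel_tol and abs_tol behave differently in math.isclose (available from Python 3.5). A short sketch comparing the two on the first coordinate of the third row; the variable names a and b are illustrative.

```python
import math

a, b = -17.749, -17.541  # differ by about 0.208

# Relative: the allowed difference is 20% of the larger magnitude (about 3.55 here)
print(math.isclose(a, b, rel_tol=0.2))               # True

# Absolute: the allowed difference is 0.2 (rel_tol=0.0 disables the relative part)
print(math.isclose(a, b, rel_tol=0.0, abs_tol=0.2))  # False
```

If an absolute threshold of 0.2 is what is wanted, abs_tol is the parameter to set; with it, the third row would be reported as well.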
