Vectorized method to find matching values between two columns

Question

I'm trying to find, for each row of my DataFrame, the nearest later row whose value in column B matches that row's value in column A.

Currently I'm doing this with a slow nested loop, but I suspect there's a clever way to use the apply method or some other vectorized function to do it faster. My current code:

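def enumerate_matching(df):
    a = list(df['A'])
    b = list(df['B'])
    matching = []

    # For each row i, scan the later rows j > i for the first one whose
    # B value equals row i's A value (O(n^2) comparisons in the worst case).
    for i in range(len(a) - 1):
        for j in range(i + 1, len(b)):
            if a[i] == b[j]:
                matching.append(i)
                matching.append(j)  # j is already an absolute row index
                break
    return matching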

Is there a faster method to do this?
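One possible vectorized alternative, not taken from this thread, is sketched below. It assumes the columns are named A and B, that rows are identified by 0-based position, and that the goal is to pair each row i with the first later row j whose B value equals A[i]. The function name next_matching_rows is hypothetical. It encodes every row as the sortable key code(B) * n + position, sorts the keys once, and then uses np.searchsorted to find each row's next match, so the cost is O(n log n) rather than the nested loop's O(n^2).

```python
import numpy as np
import pandas as pd


def next_matching_rows(df):
    """Pair each row i with the first later row j (j > i) where B[j] == A[i].

    Hypothetical helper (a sketch): returns a list of (i, j) tuples of 0-based
    row positions; rows whose A value never appears in a later B are skipped.
    """
    a = df['A'].to_numpy()
    b = df['B'].to_numpy()
    n = len(df)
    pos = np.arange(n, dtype=np.int64)

    # Encode B values as integer codes; NaN gets -1 and therefore never matches.
    b_codes, uniques = pd.factorize(b)
    # Give each A value the code of the equal B value, or -1 if there is none.
    a_codes = pd.Index(uniques).get_indexer(a)

    # One key per row: rows are grouped by B value, then ordered by position.
    keys = b_codes.astype(np.int64) * n + pos
    order = np.argsort(keys, kind='stable')
    sorted_keys = keys[order]

    # For row i, the smallest key >= code(A[i]) * n + (i + 1) belongs to the
    # first row after i whose B value has that code, if such a row exists.
    has_code = a_codes >= 0
    rows = pos[has_code]
    codes = a_codes[has_code].astype(np.int64)
    idx = np.searchsorted(sorted_keys, codes * n + rows + 1, side='left')

    # Keep only hits that are still inside the same value's block of keys.
    in_bounds = idx < n
    hit = np.zeros(len(rows), dtype=bool)
    hit[in_bounds] = sorted_keys[idx[in_bounds]] < (codes[in_bounds] + 1) * n

    return list(zip(rows[hit].tolist(), order[idx[hit]].tolist()))
```

It returns (i, j) pairs instead of the flat list the loop builds; for example, with A = [1, 2, 3, 1] and B = [3, 1, 2, 1] it pairs row 0 with row 1 and row 1 with row 2.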

Robin Nicole · Accepted Answer · 2018-12-14 20:37:02Z

0

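You could use sets to get the values that occur in both columns. Building each set takes O(n) time, and the intersection then takes O(min(len(a), len(b))) on average (linear, not logarithmic). Note that this gives only the shared values, not the rows they occur in or which match is the most recent: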

# Distinct values in each column
a = set(df['A'])
b = set(df['B'])
# Values present in both columns
a.intersection(b)
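To turn the shared values into row positions, one option (not part of the original answer) is Series.isin, shown here on a small hypothetical DataFrame. Unlike the question's loop, it does not require the matching B value to come from a later row; it only checks whether A[i] appears anywhere in column B.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': [1, 2, 3, 1], 'B': [3, 7, 2, 5]})

# Values that occur in both columns
common = set(df['A']).intersection(df['B'])

# Row labels whose A value appears anywhere in column B (no ordering constraint)
rows_with_match = df.index[df['A'].isin(df['B'])].tolist()
```

Here common is {2, 3} and rows_with_match is [1, 2].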

answered Dec 14, 2018 at 20:22 by Robin Nicole (edited Dec 14, 2018 at 20:37)


Robin Nicole · Accepted Answer · 2019-01-13 23:21:02Z

0

If you instead want to compare the two columns row by row (A and B within the same row), you can count the rows where they are equal with:

import numpy as np
np.sum(df['A'] == df['B'])  # number of rows where A equals B
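A small illustration on hypothetical sample data: df['A'] == df['B'] is an element-wise comparison that yields a boolean Series, so summing it counts the equal rows, and np.flatnonzero recovers their positions if those are needed as well.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1, 5, 3, 0]})

same_row = df['A'] == df['B']                              # True where A equals B in the same row
count = int(np.sum(same_row))                              # how many such rows
positions = np.flatnonzero(same_row.to_numpy()).tolist()   # their 0-based positions
```

For this data, count is 2 and positions is [0, 2].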

answered Jan 13, 2019 at 23:21 by Robin Nicole

