I am trying to efficiently remove duplicate rows in Pandas where the duplicates have their values swapped across two columns. For example, in this data frame:

import pandas as pd
key = pd.DataFrame({'p1':['a','b','a','a','b','d','c'],'p2':['b','a','c','d','c','a','b'],'value':[1,1,2,3,5,3,5]})
df = pd.DataFrame(key,columns=['p1','p2','value'])
print(df)

      p1 p2  value
    0  a  b      1
    1  b  a      1
    2  a  c      2
    3  a  d      3
    4  b  c      5
    5  d  a      3
    6  c  b      5

I want to remove rows 1, 5 and 6, leaving just:

      p1 p2  value
    0  a  b      1
    2  a  c      2
    3  a  d      3
    4  b  c      5

Thanks in advance for ideas on how to do this.

1 Answer

Reorder the p1 and p2 values so they appear in a canonical order:

mask = df['p1'] < df['p2']
df['first'] = df['p1'].where(mask, df['p2'])
df['second'] = df['p2'].where(mask, df['p1'])

yields

In [149]: df
Out[149]: 
  p1 p2  value first second
0  a  b      1     a      b
1  b  a      1     a      b
2  a  c      2     a      c
3  a  d      3     a      d
4  b  c      5     b      c
5  d  a      3     a      d
6  c  b      5     b      c
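
To see what `Series.where` is doing here, a minimal sketch on a two-row frame may help (the `demo` frame is made up for illustration and is not part of the question's data). Where the mask is True the original value is kept; elsewhere the value from the other column is substituted, so `first` ends up holding the smaller of the pair and `second` the larger.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show the effect of where()
demo = pd.DataFrame({'p1': ['a', 'b'], 'p2': ['b', 'a']})

# True where p1 already sorts before p2
mask = demo['p1'] < demo['p2']

# Keep p1 where the mask is True, otherwise take p2 (and vice versa)
first = demo['p1'].where(mask, demo['p2'])
second = demo['p2'].where(mask, demo['p1'])

print(first.tolist())   # ['a', 'a']
print(second.tolist())  # ['b', 'b']
```

Both rows now map to the same `('a', 'b')` pair, which is what lets an ordinary duplicate check treat them as equal.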

Then you can drop_duplicates:

df = df.drop_duplicates(subset=['value', 'first', 'second'])

Putting it all together:

import pandas as pd
key = pd.DataFrame({'p1':['a','b','a','a','b','d','c'],'p2':['b','a','c','d','c','a','b'],'value':[1,1,2,3,5,3,5]})
df = pd.DataFrame(key,columns=['p1','p2','value'])

mask = df['p1'] < df['p2']
df['first'] = df['p1'].where(mask, df['p2'])
df['second'] = df['p2'].where(mask, df['p1'])
df = df.drop_duplicates(subset=['value', 'first', 'second'])
df = df[['p1', 'p2', 'value']]

yields

In [151]: df
Out[151]: 
  p1 p2  value
0  a  b      1
2  a  c      2
3  a  d      3
4  b  c      5
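
If adding and then dropping helper columns on `df` is undesirable, the same idea can be sketched as a small function that builds the canonical columns in a temporary frame and filters with `duplicated()`. The function name `drop_inverted_duplicates` and its parameters are assumptions made for this example, not part of pandas or of the answer above.

```python
import pandas as pd

def drop_inverted_duplicates(df, a='p1', b='p2', extra=('value',)):
    """Return df without rows that repeat an earlier row up to swapping a and b.

    Hypothetical helper: a, b are the two swappable columns, extra lists
    the other columns that must also match for a row to count as a duplicate.
    """
    mask = df[a] < df[b]
    # Temporary frame holding the pair in canonical (sorted) order
    canon = pd.DataFrame({
        'first': df[a].where(mask, df[b]),
        'second': df[b].where(mask, df[a]),
    })
    for col in extra:
        canon[col] = df[col]
    # duplicated() marks every repeat after the first occurrence
    return df[~canon.duplicated()]

df = pd.DataFrame({'p1': ['a', 'b', 'a', 'a', 'b', 'd', 'c'],
                   'p2': ['b', 'a', 'c', 'd', 'c', 'a', 'b'],
                   'value': [1, 1, 2, 3, 5, 3, 5]})

result = drop_inverted_duplicates(df)
print(result)
```

On the question's data this keeps rows 0, 2, 3 and 4 and leaves the original `df` unmodified.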