2

Consider the dataframe df

   A  B  C   D  match?
0  x  y  1   1  true
1  x  y  1   2  false
2  x  y  2   1  false
3  x  y  2   2  true
4  x  y  3   4  false
5  x  y  5   6  false

I would like to drop the unmatched rows that are already matched somewhere else.

   A  B  C  D  match?
1  x  y  1  1  true
3  x  y  2  2  true
4  x  y  3  4  false
5  x  y  5  6  false

How can I do that with Pandas?

0

3 Answers 3

3

You could sort those two columns so that their order of positioning could be made same throughout. Then, drop off all such duplicated entries present by providing keep=False in DF.drop_duplicates() method.

df[['C','D']] = np.sort(df[['C','D']].values)
df.drop_duplicates(keep=False)

enter image description here

Sign up to request clarification or add additional context in comments.

2 Comments

This seems do to the trick, even though, you have to be careful because "C" and "D" value can be swap (if the D is greater than C, not the case here)
Yeah, that's why I had to sort them before so that they're uniform throughout.
2

you can compare the two columns with

df.C == df.D

0     True
1    False
2    False
3     True
4    False
dtype: bool

Then shift the series down.

0      NaN
1     True
2    False
3    False
4     True
dtype: object

Each True value indicates the start of a new group. We can use cumsum to create the groupings we need for groupby

(df.C == df.D).shift().fillna(False).cumsum()

0    0
1    1
2    1
3    1
4    2
dtype: int64

Then use groupy + last

df.groupby(df.C.eq(df.D).shift().fillna(False).cumsum()).last()

   A  B  C  D
0  x  y  1  1
1  x  y  2  2
2  x  y  3  4

6 Comments

Your solution makes assumption on the DataFrame values.
@fast_cen what assumption would that be?
I'm updating the question with a more complete dataframe. Thanks for the help though !
If two "unmatched" rows follow each others, you consider them as one group.
@fast_cen, you mean at the end? Yes... that's true. I'll update my answer.
|
0

If you would like to remove the rows where "C" and "D" matched, the method .ix will help you:

df = df.ix[(df['C'] != df['D'])]

Therefore, df['C'] != df['D'] generates a list of booleans and .ix allows you to extract the corresponding DataFrame :)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.