How can I drop rows based on whether another row satisfies some condition?

Question

Consider the dataframe df

   A  B  C   D  match?
0  x  y  1   1  true
1  x  y  1   2  false
2  x  y  2   1  false
3  x  y  2   2  true
4  x  y  3   4  false
5  x  y  5   6  false

I would like to drop the unmatched rows that are already matched somewhere else.

   A  B  C  D  match?
0  x  y  1  1  true
3  x  y  2  2  true
4  x  y  3  4  false
5  x  y  5  6  false

How can I do that with Pandas?

Nickil Maveli · Accepted Answer · 2017-01-21 06:54:01Z

3

You could sort the values of columns C and D within each row, so that pairs such as (1, 2) and (2, 1) become identical. Then drop every duplicated row by passing keep=False to DataFrame.drop_duplicates().

import numpy as np
# Sort C and D within each row so that swapped pairs compare equal
df[['C','D']] = np.sort(df[['C','D']].values, axis=1)
df.drop_duplicates(keep=False)
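As a minimal sketch of this approach on the question's data: the construction of df, the boolean match? column, and the names normalized and result below are assumptions made for illustration. Sorting a copy rather than df itself keeps the original C and D values intact.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; "match?" is taken to be C == D.
df = pd.DataFrame({
    "A": ["x"] * 6,
    "B": ["y"] * 6,
    "C": [1, 1, 2, 2, 3, 5],
    "D": [1, 2, 1, 2, 4, 6],
})
df["match?"] = df["C"] == df["D"]

# Sort C and D within each row on a copy, so (2, 1) becomes (1, 2).
normalized = df.copy()
normalized[["C", "D"]] = np.sort(df[["C", "D"]].to_numpy(), axis=1)

# keep=False flags every member of a duplicated set, not just the repeats.
is_duplicate = normalized.duplicated(keep=False)
result = df[~is_duplicate]

print(result)  # rows with index 0, 3, 4 and 5 remain
```

Note that duplicated() compares all columns here, including A, B and match?; passing subset=["C", "D"] would restrict the comparison if that is what you need.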

edited Jan 21, 2017 at 6:54

answered Jan 20, 2017 at 14:48

Nickil Maveli


2 Comments

benderv Over a year ago

This seems to do the trick, though you have to be careful because the "C" and "D" values can be swapped (if D is greater than C, which is not the case here).

Nickil Maveli Over a year ago

Yeah, that's why I had to sort them before so that they're uniform throughout.

piRSquared · Accepted Answer · 2017-01-20 14:55:35Z

2

You can compare the two columns with:

df.C == df.D

0     True
1    False
2    False
3     True
4    False
dtype: bool

Then shift the series down by one row with .shift():

0      NaN
1     True
2    False
3    False
4     True
dtype: object

Each True value indicates the start of a new group. We can use cumsum to create the groupings we need for groupby:

(df.C == df.D).shift().fillna(False).cumsum()

0    0
1    1
2    1
3    1
4    2
dtype: int64

Then use groupby + last:

df.groupby(df.C.eq(df.D).shift().fillna(False).cumsum()).last()

   A  B  C  D
0  x  y  1  1
1  x  y  2  2
2  x  y  3  4
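The steps above can be put together as a rough, self-contained sketch. The helper name keep_last_per_group and the two sample frames are hypothetical, and shift(fill_value=False) is used in place of shift().fillna(False) as an assumption that a reasonably recent pandas is available; it avoids the intermediate NaN and object dtype.

```python
import pandas as pd

def keep_last_per_group(frame):
    """Hypothetical helper: start a new group after each C == D row, keep each group's last row."""
    matched = frame["C"].eq(frame["D"])
    # Move the flags down one row; the first row is filled with False instead of NaN.
    starts = matched.shift(fill_value=False)
    # The running total of the flags gives one integer label per group.
    group_id = starts.cumsum()
    return frame.groupby(group_id).last()

# Five-row frame matching the outputs shown in this answer.
five_rows = pd.DataFrame({
    "A": ["x"] * 5, "B": ["y"] * 5,
    "C": [1, 1, 2, 2, 3], "D": [1, 2, 1, 2, 4],
})
result_five = keep_last_per_group(five_rows)

# Six-row frame from the question, with two trailing unmatched rows.
six_rows = pd.DataFrame({
    "A": ["x"] * 6, "B": ["y"] * 6,
    "C": [1, 1, 2, 2, 3, 5], "D": [1, 2, 1, 2, 4, 6],
})
result_six = keep_last_per_group(six_rows)

print(result_five)
print(result_six)
```

On the six-row frame the last two rows share one group label, so only (5, 6) survives and (3, 4) is lost; the sketch reproduces the technique as written rather than working around that.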

edited Jan 20, 2017 at 14:55

answered Jan 20, 2017 at 14:05

piRSquared


6 Comments

benderv Over a year ago

Your solution makes an assumption about the DataFrame values.

piRSquared Over a year ago

@fast_cen what assumption would that be?

benderv Over a year ago

I'm updating the question with a more complete dataframe. Thanks for the help though!

benderv Over a year ago

If two "unmatched" rows follow each other, you treat them as one group.

piRSquared Over a year ago

@fast_cen, you mean at the end? Yes... that's true. I'll update my answer.


Sacha Vakili · Accepted Answer · 2017-01-20 14:45:22Z

0

If you would like to remove the rows where "C" and "D" match, boolean indexing with .loc will help you (.loc replaces the older .ix indexer, which was removed in pandas 1.0):

df = df.loc[df['C'] != df['D']]

Here, df['C'] != df['D'] generates a boolean Series, and .loc extracts the corresponding rows of the DataFrame :)
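A minimal sketch of this boolean-mask filter, written with .loc since .ix is no longer available in pandas 1.0 and later. The construction of df and the names mask and unmatched are assumptions for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
df = pd.DataFrame({
    "A": ["x"] * 6,
    "B": ["y"] * 6,
    "C": [1, 1, 2, 2, 3, 5],
    "D": [1, 2, 1, 2, 4, 6],
})

# Boolean Series: True where the two columns differ.
mask = df["C"] != df["D"]

# .loc with a boolean Series keeps only the rows where the mask is True.
unmatched = df.loc[mask]

print(unmatched)  # rows with index 1, 2, 4 and 5
```

This keeps only the rows where C and D differ, so the matched rows 0 and 3 that the question's expected output retains are dropped here.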

answered Jan 20, 2017 at 14:45

Sacha Vakili

